Monitoring OpenResty Edge

This document describes how to monitor OpenResty Edge so that you can track system status in real time and identify and resolve potential issues promptly.

Monitoring Architecture Overview

The following aspects of OpenResty Edge need to be monitored, listed in order of priority:

  1. Configuration Synchronization Monitoring: Ensure configurations are correctly synchronized to all nodes
  2. Application-level Monitoring: Monitor the status of Edge Admin, Edge Node, Edge Log Server, and other components
  3. System-level Monitoring: CPU, memory, disk, network, and other basic resource monitoring
  4. Error Log Monitoring: Error log collection and analysis
  5. Business-level Monitoring: Request volume, response time, error rate, and other business metrics

Message Notifications in OpenResty Edge

OpenResty Edge has a powerful built-in notification mechanism that can push system status changes directly to external systems.

Configuration Documentation References

Supported Alert Types

OpenResty Edge supports message notifications for events including, but not limited to:

  • Gateway node CPU usage greater than 80%
  • Gateway node memory usage greater than 90%
  • Gateway node log disk usage greater than 90%
  • Gateway node heartbeat status changes
  • Gateway node health check status changes
  • New configuration deployment

Synchronization Status Monitoring

Configuration synchronization is a core function of OpenResty Edge. Monitoring the synchronization status is essential for ensuring normal system operation.

Prometheus Metrics Monitoring

  • Enable Prometheus Metrics: Enable Prometheus metrics in Edge Admin and configure the trusted IPs.

    Path: Edge Admin > Global Configuration > Global Metrics > “OpenResty Edge Admin” tab.

  • Metrics Output Example:

    # HELP service_status Service status
    # TYPE service_status gauge
    service_status{type="log_server_offline"} 0 1741597233000
    service_status{type="log_server_db_offline"} 0 1741597233000
    # HELP config_sync_delay Configuration sync delay
    # TYPE config_sync_delay gauge
    config_sync_delay{hostname="oredge-node-1",internal_ip="172.17.0.6",external_ip="172.17.0.6"} 6 1741597233000
    

    In this example, the Edge Node named “oredge-node-1” is 6 configuration changes behind.

  • Alert Rule Example:

    • config_sync_delay > 100: alert when the configuration sync delay stays above the threshold of 100 for 5 consecutive minutes.
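As a rough illustration of the alert condition, the following Python sketch parses metrics text in the format shown in the output example and reports nodes whose config_sync_delay exceeds a threshold. The function name lagging_nodes and the way the text is obtained are assumptions for illustration only; the sketch checks a single scrape and does not track the 5-minute duration.

```python
import re

# One sample line: name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)(?:\s+\d+)?$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def lagging_nodes(metrics_text, threshold=100):
    """Return {hostname: delay} for nodes whose config_sync_delay > threshold.

    Hypothetical helper: it evaluates one scrape only, so the caller
    (or the monitoring system) must handle the "5 consecutive minutes" part.
    """
    result = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match or match.group(1) != "config_sync_delay":
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        delay = float(match.group(3))
        if delay > threshold:
            result[labels.get("hostname", "unknown")] = delay
    return result


# Sample text taken from the metrics output example in this document.
SAMPLE = '''
# HELP config_sync_delay Configuration sync delay
# TYPE config_sync_delay gauge
config_sync_delay{hostname="oredge-node-1",internal_ip="172.17.0.6",external_ip="172.17.0.6"} 6 1741597233000
'''

if __name__ == "__main__":
    print(lagging_nodes(SAMPLE))               # nothing exceeds 100
    print(lagging_nodes(SAMPLE, threshold=5))  # node-1 is 6 changes behind
```

If the metrics are scraped by Prometheus itself, the duration condition is usually expressed with the `for` field of an alerting rule rather than in a script.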

Database Monitoring

OpenResty Edge relies on a PostgreSQL database. The following measures are recommended:

High Availability Configuration

Prioritize database high availability configuration. You can refer to the following two documents:

Key Monitoring Metrics

  • Connection Count: Monitor the ratio of active connections to maximum connections
  • System Resources: CPU usage, disk space usage, memory usage
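The connection ratio can be read from PostgreSQL's own statistics. Below is a minimal sketch that assumes a Python DB-API 2.0 cursor connected to the database; the function name connection_usage is hypothetical. Note that pg_stat_activity lists every session, including idle ones, so the figure is an upper bound on truly active connections.

```python
def connection_usage(cursor):
    """Return the ratio of current sessions to max_connections.

    `cursor` is assumed to be a DB-API 2.0 cursor (for example from psycopg2)
    connected to the PostgreSQL instance used by OpenResty Edge.
    """
    # Count every session PostgreSQL currently reports.
    cursor.execute("SELECT count(*) FROM pg_stat_activity")
    current = cursor.fetchone()[0]

    # SHOW returns the setting as text, so convert it explicitly.
    cursor.execute("SHOW max_connections")
    maximum = int(cursor.fetchone()[0])

    return current / maximum
```

A monitoring job could call this periodically and raise a warning when the ratio approaches a limit you choose, for example 0.8.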

Database Maintenance

  • Regular Backups: Take a daily full backup of the Edge Admin database and regularly test restoration in a test environment.
    • Note: The Edge Log Server database does not contain the main configuration, so cold backups may not be necessary.

System Load Monitoring

Monitor the system load of all components of OpenResty Edge to ensure timely detection of faulty services.

Basic Resource Monitoring

Monitoring Item          Warning Threshold   Alert Threshold   Monitoring Method
CPU Usage                80%                 90%               Cloud platform monitoring or custom scripts
Memory Usage             85%                 95%               Cloud platform monitoring or custom scripts
Disk Usage               80%                 90%               Cloud platform monitoring or custom scripts
Network Bandwidth Usage  80%                 90%               Cloud platform monitoring or custom scripts

You can use monitoring services provided by AWS, GCP, Alibaba Cloud, Azure, or other cloud providers, or write your own scripts for periodic monitoring.
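If you write your own script, the threshold logic might look like the sketch below. The thresholds come from the Basic Resource Monitoring table; the function names and the choice to treat a value equal to a threshold as crossing it are assumptions. Only disk usage is measured here, using the standard library; CPU, memory, and network figures would have to come from your own collector.

```python
import shutil

# (warning, alert) thresholds in percent, from the Basic Resource Monitoring table.
THRESHOLDS = {
    "cpu": (80, 90),
    "memory": (85, 95),
    "disk": (80, 90),
    "network": (80, 90),
}


def classify(item, percent):
    """Map a usage percentage to 'ok', 'warning', or 'alert'."""
    warn, alert = THRESHOLDS[item]
    if percent >= alert:
        return "alert"
    if percent >= warn:
        return "warning"
    return "ok"


def disk_usage_percent(path="/"):
    """Return the used percentage of the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    pct = disk_usage_percent("/")
    print(f"disk {pct:.1f}% -> {classify('disk', pct)}")
```

Such a script can be run from cron or a similar scheduler, with the result forwarded to whatever alerting channel you already use.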

Error Log Monitoring

OpenResty Edge collects its own error logs, which you can view through Edge Admin > Dashboard > Error Logs. However, OpenResty Edge currently does not provide an alert mechanism for its own errors, so you need to set up separate alerting for unusual errors in the log files.

Log Collection Paths

Use tools like Filebeat to collect logs from the following paths:

Component         Error Log Path
Edge Admin        /usr/local/oredge-admin/logs/error.log
Edge Node         /usr/local/oredge-node/logs/error.log
Edge Log Server   /usr/local/oredge-log-server/logs/error.log

Log Alert Strategy

  • Filter out logs generated by normal operations such as log rotation and configuration hot updates
  • Alert on exceptional situations such as stack traces
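The strategy above can be sketched as a small line filter. The patterns below are illustrative placeholders, not an authoritative list of OpenResty Edge messages: the benign patterns stand in for routine log rotation and reload notices, and the alert patterns assume that Lua stack traces contain the text "stack traceback:" and that nginx-style severity tags such as [crit] appear in the log. Adjust both lists to what your logs actually contain.

```python
import re

# Hypothetical examples of routine messages to ignore.
BENIGN_PATTERNS = [
    re.compile(r"reopening logs"),
    re.compile(r"signal process started"),
]

# Assumed markers of exceptional situations.
ALERT_PATTERNS = [
    re.compile(r"stack traceback:"),          # header of a Lua stack trace
    re.compile(r"\[(?:emerg|alert|crit)\]"),  # high-severity nginx-style levels
]


def should_alert(line):
    """Return True when a log line looks exceptional and is not known noise."""
    if any(p.search(line) for p in BENIGN_PATTERNS):
        return False
    return any(p.search(line) for p in ALERT_PATTERNS)


def scan(lines):
    """Yield only the lines that should trigger an alert."""
    for line in lines:
        if should_alert(line):
            yield line
```

A collector such as Filebeat can ship the raw lines, and a filter like this can then run in the downstream pipeline before alerts are sent.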

Business Monitoring

You can use the following two methods to monitor your business:

Access Log Analysis

Collect and analyze access logs from the following paths:

Component         Access Log Path
Edge Admin        /usr/local/oredge-admin/logs/access.log
Edge Node         /usr/local/oredge-node/logs/access.log
Edge Log Server   /usr/local/oredge-log-server/logs/access.log

Use the access logs to analyze QPS, request status distribution, URI distribution, and similar metrics.
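As a minimal sketch of such an analysis, the Python code below computes the peak requests per second, the status code distribution, and the URI distribution. It assumes the access log uses the nginx "combined" format; the actual OpenResty Edge access log format may differ, in which case the regular expression must be adapted. The function name analyze is hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes the nginx "combined" format:
# addr - user [time] "METHOD URI PROTO" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+)[^"]*" (?P<status>\d{3}) '
)


def analyze(lines):
    """Return (peak_qps, status_counts, uri_counts) for the given log lines."""
    per_second = Counter()
    statuses = Counter()
    uris = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        ts = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        per_second[ts] += 1
        statuses[match.group("status")] += 1
        # Drop the query string so /a?x=1 and /a are counted together.
        uris[match.group("uri").split("?", 1)[0]] += 1
    peak_qps = max(per_second.values(), default=0)
    return peak_qps, statuses, uris


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [10/Mar/2025:09:00:00 +0000] "GET /a?x=1 HTTP/1.1" 200 12 "-" "curl/8.0"',
        '10.0.0.2 - - [10/Mar/2025:09:00:00 +0000] "GET /a HTTP/1.1" 200 12 "-" "curl/8.0"',
        '10.0.0.3 - - [10/Mar/2025:09:00:01 +0000] "POST /b HTTP/1.1" 502 0 "-" "curl/8.0"',
    ]
    peak, statuses, uris = analyze(sample)
    print(peak, dict(statuses), dict(uris))
```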

Dynamic Metrics Monitoring

Enable dynamic metrics in OpenResty Edge as needed.

Note: Enabling dynamic metrics may cause the Edge Log Server database to consume a large amount of disk space. Please ensure that the device hosting the Edge Log Server database has sufficient disk space.

Common Troubleshooting Guide

Configuration Synchronization Issues

Symptom: Edge Node shows a large configuration synchronization delay

Possible Causes:

  • Unstable network connection between Edge Node and Edge Admin
  • High load on Edge Admin database
  • High load on Edge Node, which slows the processing of synchronization requests
  • Edge Node has been offline for too long and has lost synchronization
  • Configuration in Edge Admin changes so quickly that the incremental sync records are cleared, so Edge Node fails to perform incremental synchronization

Troubleshooting Steps:

  1. Check Edge Admin and Edge Node logs
  2. Check network connection status
  3. Check database performance metrics

Database Issues

Symptom: Edge Admin or Edge Log Server responds slowly

Possible Causes:

  • Large tables in the database, which degrade query performance
  • Poor network connectivity
  • Insufficient database space
  • Too many connections
  • PostgreSQL database is performing space optimization tasks

Troubleshooting Steps:

  1. Check Edge Admin and Edge Log Server logs
  2. Use psql to log into the database and analyze table space usage
  3. Check disk space usage
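For the table space analysis in step 2, the same query you would run in psql can also be scripted. The sketch below assumes a DB-API 2.0 cursor whose driver uses psycopg-style `%s` placeholders; the names largest_tables and human_size are hypothetical. It relies on PostgreSQL's pg_total_relation_size function and the pg_statio_user_tables view.

```python
# Lists the largest user tables, including their indexes and TOAST data.
TABLE_SIZE_SQL = """
SELECT relname, pg_total_relation_size(relid) AS total_bytes
FROM pg_catalog.pg_statio_user_tables
ORDER BY total_bytes DESC
LIMIT %s
"""


def human_size(num_bytes):
    """Format a byte count using binary units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def largest_tables(cursor, limit=10):
    """Return [(table_name, readable_size)] for the biggest tables.

    `cursor` is assumed to be a DB-API 2.0 cursor (for example from psycopg2)
    connected to the Edge Admin or Edge Log Server database.
    """
    cursor.execute(TABLE_SIZE_SQL, (limit,))
    return [(name, human_size(size)) for name, size in cursor.fetchall()]
```

The SQL text can also be pasted directly into psql after replacing `%s` with a number.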