# OpenCand Monitoring Setup

This monitoring stack provides log aggregation and visualization for the OpenCand project, with a special focus on the ETL service.

## Services Overview

### 🔍 **Grafana Loki** (Port 3100)

- **Purpose**: Log aggregation and storage
- **Access**: http://localhost:3100
- **Description**: Collects and stores all container logs in a structured format

### 📊 **Grafana** (Port 3000)

- **Purpose**: Log visualization and dashboards
- **Access**: http://localhost:3000
- **Credentials**:
  - Username: `admin`
  - Password: `admin`
- **Pre-configured Dashboards**: OpenCand ETL Monitoring dashboard

### 📈 **Prometheus** (Port 9090)

- **Purpose**: Metrics collection and storage
- **Access**: http://localhost:9090
- **Description**: Collects system and application metrics

### 🖥️ **Node Exporter** (Port 9100)

- **Purpose**: System metrics collection
- **Access**: http://localhost:9100/metrics
- **Description**: Provides host system metrics (CPU, memory, disk, etc.)

### 🚚 **Promtail**

- **Purpose**: Log collection agent
- **Description**: Automatically discovers Docker container logs and ships them to Loki

## Key Features

### ETL-Specific Monitoring

- ✅ Real-time ETL process logs
- ✅ Error tracking and alerting capabilities
- ✅ Performance metrics monitoring
- ✅ Data processing progress tracking

### Container Log Management

- ✅ Automatic log rotation (10MB max size, 3 files)
- ✅ Structured log labeling
- ✅ Multi-service log aggregation

### Pre-built Dashboards

- ✅ OpenCand ETL Logs viewer
- ✅ API logs monitoring
- ✅ Database logs tracking
- ✅ Container resource usage

## Getting Started

1. **Start the monitoring stack**:

   ```bash
   docker-compose up -d
   ```

2. **Access Grafana**:
   - Open http://localhost:3000
   - Log in with admin/admin
   - Navigate to "Dashboards" → "OpenCand ETL Monitoring"

3. **View ETL Logs in Real-time**:
   - In Grafana, go to "Explore"
   - Select "Loki" as the datasource
   - Use the query: `{container_name="opencand_etl"}`

4. **Monitor System Metrics**:
   - Access Prometheus at http://localhost:9090
   - View system metrics from Node Exporter

## Example Log Queries

### ETL Service Logs

```logql
{container_name="opencand_etl"}
```

### Error Logs Only

```logql
{container_name="opencand_etl"} |= "ERROR"
```

### API Logs with Filtering

```logql
{container_name="opencand_api"} |= "Microsoft.AspNetCore"
```

### Database Connection Logs

```logql
{container_name="opencand_db"} |= "connection"
```

## Configuration Files

- **Loki**: `./monitoring/loki-config.yaml`
- **Promtail**: `./monitoring/promtail-config.yaml`
- **Prometheus**: `./monitoring/prometheus.yml`
- **Grafana Datasources**: `./monitoring/grafana/provisioning/datasources/`
- **Grafana Dashboards**: `./monitoring/grafana/provisioning/dashboards/`

## Data Persistence

The following volumes are created for data persistence:

- `loki-data`: Loki log storage
- `prometheus-data`: Prometheus metrics storage
- `grafana-data`: Grafana dashboards and settings

## Troubleshooting

### ETL Logs Not Appearing

1. Check that the ETL container is running: `docker ps`
2. Verify that Promtail is collecting logs: `docker logs opencand_promtail`
3. Check Loki status: `curl http://localhost:3100/ready`

### Grafana Dashboard Issues

1. Verify that the datasources are configured correctly
2. Check that Loki is reachable from the Grafana container
3. Restart the Grafana container: `docker-compose restart grafana`

### Performance Issues

1. Monitor disk usage for log storage
2. Adjust log retention in `loki-config.yaml`
3. Increase resource limits if needed

## Customization

### Adding More Dashboards

1. Create JSON dashboard files in `./monitoring/grafana/provisioning/dashboards/`
2. Restart the Grafana container

### Log Retention Configuration

Edit `./monitoring/loki-config.yaml` to adjust retention policies:

```yaml
limits_config:
  retention_period: 168h  # 7 days
```

### Alert Configuration

Add alerting rules to the Prometheus configuration for ETL failure notifications.
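
As a starting point for such a rule, here is a minimal sketch of a Prometheus rule file. It rests on several assumptions that are not part of this repository: the file name `alert-rules.yml`, a scrape job called `opencand_etl` defined in `prometheus.yml`, and the 5-minute threshold. Adjust all of them to match your setup.

```yaml
# ./monitoring/alert-rules.yml (hypothetical file name, not shipped with this stack)
groups:
  - name: opencand-etl
    rules:
      - alert: EtlTargetDown
        # Assumes prometheus.yml defines a scrape job named "opencand_etl".
        # "up" is set to 0 by Prometheus when a scrape of the target fails.
        expr: up{job="opencand_etl"} == 0
        # Fire only if the condition holds for 5 minutes (illustrative value).
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "ETL scrape target has been unreachable for 5 minutes"
```

Prometheus only loads rule files that are listed under `rule_files` in `prometheus.yml`, and the file must also be mounted into the Prometheus container. Delivering notifications (email, chat, etc.) additionally requires an Alertmanager instance, which is not part of the services listed above.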
## Security Notes

- Change the default Grafana admin password in production
- Restrict network access to monitoring ports
- Consider using authentication for external access
- Regularly update monitoring stack images
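
The first two points could be addressed with a Compose override along these lines. This is a sketch under assumptions: the service is assumed to be named `grafana` in `docker-compose.yml` (as in the restart command above), and `GRAFANA_ADMIN_PASSWORD` is a hypothetical variable you would define in your shell or an `.env` file.

```yaml
# Fragment of a docker-compose file; merge into the existing grafana service.
services:
  grafana:
    environment:
      # Grafana reads this variable to set the admin password when its
      # database is first initialized. GRAFANA_ADMIN_PASSWORD is a
      # hypothetical variable name supplied by the operator.
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}
    ports:
      # Publish the port on the loopback interface only, so Grafana is
      # not reachable from other hosts.
      - "127.0.0.1:3000:3000"
```

If the `grafana-data` volume already exists, the environment variable will most likely not replace the stored password. In that case, change it through the Grafana UI instead.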