OpenCand Monitoring Setup
This monitoring stack provides log aggregation, metrics collection, and visualization for the OpenCand project, with a special focus on the ETL service.
Services Overview
🔍 Grafana Loki (Port 3100)
- Purpose: Log aggregation and storage
- Access: http://localhost:3100
- Description: Collects and stores all container logs in a structured format
📊 Grafana (Port 3000)
- Purpose: Log visualization and dashboards
- Access: http://localhost:3000
- Credentials:
  - Username: admin
  - Password: admin
- Pre-configured Dashboards: OpenCand ETL Monitoring dashboard
📈 Prometheus (Port 9090)
- Purpose: Metrics collection and storage
- Access: http://localhost:9090
- Description: Collects system and application metrics
🖥️ Node Exporter (Port 9100)
- Purpose: System metrics collection
- Access: http://localhost:9100/metrics
- Description: Provides host system metrics (CPU, memory, disk, etc.)
🚚 Promtail
- Purpose: Log collection agent
- Description: Automatically discovers and ships Docker container logs to Loki
Key Features
ETL-Specific Monitoring
- ✅ Real-time ETL process logs
- ✅ Error tracking and alerting capabilities
- ✅ Performance metrics monitoring
- ✅ Data processing progress tracking
Container Log Management
- ✅ Automatic log rotation (10MB max size, 3 files)
- ✅ Structured log labeling
- ✅ Multi-service log aggregation
Pre-built Dashboards
- ✅ OpenCand ETL Logs viewer
- ✅ API logs monitoring
- ✅ Database logs tracking
- ✅ Container resource usage
Getting Started
- Start the monitoring stack:
  docker-compose up -d
- Access Grafana:
  - Open http://localhost:3000
  - Log in with admin/admin
  - Navigate to "Dashboards" → "OpenCand ETL Monitoring"
- View ETL logs in real time:
  - In Grafana, go to "Explore"
  - Select "Loki" as the datasource
  - Use the query: {container_name="opencand_etl"}
- Monitor system metrics:
  - Access Prometheus at http://localhost:9090
  - View system metrics from Node Exporter
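Once the stack is up, a quick way to confirm that each service is reachable is to probe its HTTP endpoint. The sketch below is one possible approach, not part of the project: the function names `check_endpoint` and `check_stack` are hypothetical, and it assumes the default ports listed in this README. The Loki `/ready` and Node Exporter `/metrics` paths come from this document; Grafana's `/api/health` and Prometheus's `/-/healthy` are those services' standard health endpoints.

```shell
# Probe one HTTP endpoint and report whether it answered successfully.
check_endpoint() {
  local name="$1" url="$2"
  # -f: treat HTTP error statuses as failures; -s: silent; --max-time: do not hang
  if curl -fs --max-time 5 -o /dev/null "$url"; then
    echo "OK    $name ($url)"
  else
    echo "FAIL  $name ($url)"
    return 1
  fi
}

# Check every service in the stack; returns non-zero if any probe failed.
check_stack() {
  local failed=0
  check_endpoint "Loki"          "http://localhost:3100/ready"      || failed=1
  check_endpoint "Grafana"       "http://localhost:3000/api/health" || failed=1
  check_endpoint "Prometheus"    "http://localhost:9090/-/healthy"  || failed=1
  check_endpoint "Node Exporter" "http://localhost:9100/metrics"    || failed=1
  return "$failed"
}
```

After sourcing these functions, running `check_stack` prints one line per service. Loki can report not ready for a short time after startup, so a FAIL immediately after `docker-compose up -d` is not necessarily a problem.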
Log Query Examples
ETL Service Logs
{container_name="opencand_etl"}
Error Logs Only
{container_name="opencand_etl"} |= "ERROR"
API Logs with Filtering
{container_name="opencand_api"} |= "Microsoft.AspNetCore"
Database Connection Logs
{container_name="opencand_db"} |= "connection"
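The same queries can also be run outside Grafana through Loki's HTTP API, which may be handy for scripting. The following is a minimal sketch under a few assumptions: the helper names `logql_filter` and `loki_query` and the `LOKI_URL` variable are hypothetical, and Loki is assumed to be reachable on the port listed above. The `/loki/api/v1/query_range` endpoint is part of Loki's API and, when no time range is given, covers roughly the last hour.

```shell
# Build a LogQL query for one container, optionally with a line filter.
# Note: the filter text is inserted as-is, so it must not contain double quotes.
logql_filter() {
  local container="$1" text="${2:-}"
  if [ -n "$text" ]; then
    printf '{container_name="%s"} |= "%s"\n' "$container" "$text"
  else
    printf '{container_name="%s"}\n' "$container"
  fi
}

# Send a LogQL query to Loki and print the raw JSON response.
loki_query() {
  local query="$1" limit="${2:-20}"
  # -G sends the url-encoded parameters as a GET query string
  curl -sG "${LOKI_URL:-http://localhost:3100}/loki/api/v1/query_range" \
    --data-urlencode "query=${query}" \
    --data-urlencode "limit=${limit}"
}
```

For example, `loki_query "$(logql_filter opencand_etl ERROR)" 50` would request up to 50 recent ETL log lines containing "ERROR".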
Configuration Files
- Loki: ./monitoring/loki-config.yaml
- Promtail: ./monitoring/promtail-config.yaml
- Prometheus: ./monitoring/prometheus.yml
- Grafana Datasources: ./monitoring/grafana/provisioning/datasources/
- Grafana Dashboards: ./monitoring/grafana/provisioning/dashboards/
Data Persistence
The following volumes are created for data persistence:
- loki-data: Loki log storage
- prometheus-data: Prometheus metrics storage
- grafana-data: Grafana dashboards and settings
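Because these are named Docker volumes, they can be archived with a throwaway container. The sketch below shows one common pattern and is not something the project ships: the function name `backup_volume` is hypothetical, and it assumes the `alpine` image is available. Docker Compose usually prefixes volume names with the project name, so check `docker volume ls` for the actual name before running it.

```shell
# Archive a named Docker volume into <dest_dir>/<volume>.tar.gz.
# dest_dir must be an absolute path so Docker treats it as a bind mount.
backup_volume() {
  local volume="$1" dest_dir="${2:-$PWD}"
  docker run --rm \
    -v "${volume}:/data:ro" \
    -v "${dest_dir}:/backup" \
    alpine tar czf "/backup/${volume}.tar.gz" -C /data .
}
```

For a consistent snapshot it is safer to stop the service that owns the volume before archiving it.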
Troubleshooting
ETL Logs Not Appearing
- Check if the ETL container is running: docker ps
- Verify Promtail is collecting logs: docker logs opencand_promtail
- Check Loki status: curl http://localhost:3100/ready
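If all three checks pass but logs are still missing, it can help to ask Loki which `container_name` values it has actually received. The sketch below is an illustrative helper rather than part of the project: the function name `loki_has_container` and the `LOKI_URL` variable are hypothetical, while `/loki/api/v1/label/<name>/values` is Loki's label-values endpoint. It only reflects labels seen within Loki's default lookback window.

```shell
# Succeeds if Loki reports the given container_name among its recent label values.
loki_has_container() {
  local container="$1"
  curl -s --max-time 5 \
    "${LOKI_URL:-http://localhost:3100}/loki/api/v1/label/container_name/values" \
    | grep -q "\"${container}\""
}
```

A call such as `loki_has_container opencand_etl && echo "logs are arriving"` distinguishes a shipping problem (label missing) from a query or dashboard problem (label present).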
Grafana Dashboard Issues
- Verify datasources are configured correctly
- Check if Loki is accessible from the Grafana container
- Restart the Grafana container: docker-compose restart grafana
Performance Issues
- Monitor disk usage for log storage
- Adjust log retention in loki-config.yaml
- Increase resource limits if needed
Customization
Adding More Dashboards
- Create JSON dashboard files in ./monitoring/grafana/provisioning/dashboards/
- Restart the Grafana container
Log Retention Configuration
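Edit ./monitoring/loki-config.yaml to adjust retention policies:
limits_config:
  retention_period: 168h # 7 days
Loki enforces this setting through its compactor, so retention must also be enabled there (retention_enabled: true) for old logs to be deleted.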
Alert Configuration
Add alerting rules to Prometheus configuration for ETL failure notifications.
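As a starting point, the sketch below writes an example rules file. Everything specific in it is an assumption: the function name `write_etl_alert_rules`, the default path `./monitoring/etl-alerts.yml`, the alert name, and above all the scrape job name `opencand_etl`, since this README does not say whether the ETL service exposes metrics to Prometheus. The rule fires when that scrape target has been unreachable for five minutes.

```shell
# Write an example Prometheus alerting rules file to the given path.
write_etl_alert_rules() {
  local path="${1:-./monitoring/etl-alerts.yml}"   # hypothetical location
  cat > "$path" <<'EOF'
groups:
  - name: opencand_etl
    rules:
      - alert: EtlTargetDown
        # "opencand_etl" is a hypothetical scrape job name; adjust to your setup
        expr: up{job="opencand_etl"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "OpenCand ETL scrape target has been down for 5 minutes"
EOF
}
```

For Prometheus to load the file, it has to be listed under `rule_files` in prometheus.yml and be visible inside the Prometheus container. `promtool check rules <file>` can validate the syntax. Delivering notifications additionally requires an Alertmanager, which is not among the services described here.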
Security Notes
- Change default Grafana admin password in production
- Restrict network access to monitoring ports
- Consider using authentication for external access
- Regularly update monitoring stack images