OpenCand Monitoring Setup

This monitoring stack aggregates and visualizes logs and collects metrics for the OpenCand project, with a particular focus on the ETL service.

Services Overview

🔍 Grafana Loki (Port 3100)

  • Purpose: Log aggregation and storage
  • Access: http://localhost:3100
  • Description: Stores the container logs shipped by Promtail and indexes them by label for querying

📊 Grafana (Port 3000)

  • Purpose: Log visualization and dashboards
  • Access: http://localhost:3000
  • Credentials:
    • Username: admin
    • Password: admin
  • Pre-configured Dashboards: OpenCand ETL Monitoring dashboard

📈 Prometheus (Port 9090)

  • Purpose: Metrics collection and storage
  • Access: http://localhost:9090
  • Description: Collects system and application metrics

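🖥️ Node Exporter (Port 9100)

  • Purpose: Host-level metrics exporter
  • Access: http://localhost:9100/metrics
  • Description: Exposes hardware and operating system metrics for Prometheus to scrape
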
🚚 Promtail

  • Purpose: Log collection agent
  • Description: Automatically discovers and ships Docker container logs to Loki
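
Once the stack is running, a quick way to confirm that the HTTP services listed above are reachable is to probe each one with curl. This is a minimal sketch, assuming the default ports from this README; the function name check_stack is illustrative and not part of the project. The paths used are the readiness and health endpoints that Loki, Grafana, and Prometheus expose, plus the Node Exporter metrics page.

```shell
# Probe each monitoring service and report whether it answers over HTTP.
# Ports are the defaults listed above; adjust them if your compose file remaps ports.
check_stack() {
  failed=0
  for target in \
    "loki=http://localhost:3100/ready" \
    "grafana=http://localhost:3000/api/health" \
    "prometheus=http://localhost:9090/-/ready" \
    "node-exporter=http://localhost:9100/metrics"
  do
    name=${target%%=*}   # text before the first "="
    url=${target#*=}     # text after the first "="
    # -f: treat HTTP errors as failures, -s: quiet, -o /dev/null: discard the body
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "OK    $name"
    else
      echo "FAIL  $name ($url)"
      failed=1
    fi
  done
  return "$failed"
}
```

Run check_stack after docker-compose up -d; it returns a non-zero status if any service did not respond. Promtail is left out because this README does not list a port for it.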

Key Features

ETL-Specific Monitoring

  • Real-time ETL process logs
  • Error tracking and alerting capabilities
  • Performance metrics monitoring
  • Data processing progress tracking

Container Log Management

  • Automatic log rotation (10 MB maximum per log file, 3 files kept)
  • Structured log labeling
  • Multi-service log aggregation
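
To confirm which rotation settings Docker actually applied to a container, you can inspect its logging configuration. The sketch below assumes rotation is implemented with the max-size and max-file options of Docker's json-file logging driver, and that the ETL container is named opencand_etl (the name used in the log queries in this README); the helper name show_log_config is illustrative.

```shell
# Print the logging driver and rotation options Docker applied to a container.
# $1: container name (defaults to opencand_etl, an assumption based on this README).
show_log_config() {
  docker inspect --format '{{json .HostConfig.LogConfig}}' "${1:-opencand_etl}"
}
```

If rotation is configured as described, the printed JSON should name the driver and include the max-size and max-file values.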

Pre-built Dashboards

  • OpenCand ETL Logs viewer
  • API logs monitoring
  • Database logs tracking
  • Container resource usage

Getting Started

  1. Start the monitoring stack:

    docker-compose up -d
    
  2. Access Grafana:

    • Open http://localhost:3000
    • Log in with admin/admin
    • Navigate to "Dashboards" → "OpenCand ETL Monitoring"
  3. View ETL Logs in Real-time:

    • In Grafana, go to "Explore"
    • Select "Loki" as datasource
    • Use query: {container_name="opencand_etl"}
  4. Monitor System Metrics:

    • Open http://localhost:9090
    • Query the system and application metrics collected by Prometheus

Log Query Examples

ETL Service Logs

{container_name="opencand_etl"}

Error Logs Only

{container_name="opencand_etl"} |= "ERROR"

API Logs with Filtering

{container_name="opencand_api"} |= "Microsoft.AspNetCore"

Database Connection Logs

{container_name="opencand_db"} |= "connection"
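
The same queries can be run outside Grafana through Loki's HTTP API, which is handy for scripts. This is a minimal sketch, assuming Loki is reachable on the port listed above; the function name loki_query and the LOKI_URL variable are illustrative. It calls the query_range endpoint, which covers the last hour when no start and end parameters are given.

```shell
# Run a LogQL query against Loki's query_range endpoint and print the raw JSON.
# $1: LogQL query, $2: maximum number of log lines (defaults to 20).
# LOKI_URL is a hypothetical override for the base URL.
loki_query() {
  # -G sends the url-encoded parameters as a GET query string
  curl -G -s "${LOKI_URL:-http://localhost:3100}/loki/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "limit=${2:-20}"
}
```

For example, loki_query '{container_name="opencand_etl"} |= "ERROR"' 50 fetches up to 50 recent ETL error lines.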

Configuration Files

  • Loki: ./monitoring/loki-config.yaml
  • Promtail: ./monitoring/promtail-config.yaml
  • Prometheus: ./monitoring/prometheus.yml
  • Grafana Datasources: ./monitoring/grafana/provisioning/datasources/
  • Grafana Dashboards: ./monitoring/grafana/provisioning/dashboards/

Data Persistence

The following volumes are created for data persistence:

  • loki-data: Loki log storage
  • prometheus-data: Prometheus metrics storage
  • grafana-data: Grafana dashboards and settings

Troubleshooting

ETL Logs Not Appearing

  1. Check if ETL container is running: docker ps
  2. Verify Promtail is collecting logs: docker logs opencand_promtail
  3. Check Loki status: curl http://localhost:3100/ready
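
The three checks above can be chained into one helper that stops at the first failure. This is a sketch under the assumption that the containers are named opencand_etl and opencand_promtail, as the commands and queries in this README suggest; the function name diagnose_etl_logs is illustrative.

```shell
# Walk through the three checks above and stop at the first one that fails.
diagnose_etl_logs() {
  # 1. Is the ETL container running? (the name filter is a substring match)
  if [ -z "$(docker ps --filter 'name=opencand_etl' --format '{{.Names}}')" ]; then
    echo "ETL container is not running"
    return 1
  fi
  # 2. Show recent Promtail output; look for push or discovery errors.
  docker logs --tail 20 opencand_promtail
  # 3. Loki's /ready endpoint succeeds once it can serve requests.
  if ! curl -fs http://localhost:3100/ready >/dev/null; then
    echo "Loki is not ready"
    return 1
  fi
  echo "Container, Promtail, and Loki checks passed"
}
```

If all three pass and logs are still missing, the container_name label in the query may not match the label Promtail assigns.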

Grafana Dashboard Issues

  1. Verify datasources are configured correctly
  2. Check if Loki is accessible from Grafana container
  3. Restart Grafana container: docker-compose restart grafana
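
Before restarting, it can help to ask Grafana itself what it sees. The sketch below uses Grafana's HTTP API to check its health and list the configured datasources. It assumes the default URL and the admin/admin credentials documented above; the function name check_grafana and the GRAFANA_URL and GRAFANA_AUTH variables are illustrative.

```shell
# Query Grafana's HTTP API: overall health first, then the configured datasources.
# GRAFANA_URL and GRAFANA_AUTH are hypothetical overrides for the defaults below.
check_grafana() {
  base="${GRAFANA_URL:-http://localhost:3000}"
  auth="${GRAFANA_AUTH:-admin:admin}"
  curl -fs "$base/api/health" || { echo "Grafana is not responding"; return 1; }
  echo
  # A Loki entry should appear in this list if provisioning worked.
  curl -fs -u "$auth" "$base/api/datasources"
}
```

To test connectivity from inside the Grafana container, a command such as docker-compose exec grafana wget -qO- http://loki:3100/ready may work, assuming the Loki service is named loki in the compose file and the Grafana image ships wget.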

Performance Issues

  1. Monitor disk usage for log storage
  2. Adjust log retention in loki-config.yaml
  3. Increase resource limits if needed
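
Docker's built-in reporting gives a first look at disk and resource usage. This is a small sketch using two standard Docker commands; the function name show_usage is illustrative.

```shell
# Snapshot of disk and resource usage for the monitoring stack.
show_usage() {
  # Per-volume disk usage; look for the Loki and Prometheus data volumes.
  docker system df -v
  # One-shot CPU and memory reading for every running container.
  docker stats --no-stream
}
```

Volume names in the output may carry a Compose project prefix in front of loki-data and prometheus-data.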

Customization

Adding More Dashboards

  1. Create JSON dashboard files in ./monitoring/grafana/provisioning/dashboards/
  2. Restart Grafana container

Log Retention Configuration

Edit ./monitoring/loki-config.yaml to adjust retention policies:

limits_config:
  retention_period: 168h  # 7 days
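
The value is expressed in hours, so 7 days becomes 168h. In recent Loki versions, retention_period is only enforced when the compactor runs with retention_enabled: true; whether this project's config sets that is not shown here. The sketch below converts days to hours and checks a config file for both settings. The function names are illustrative, and the default path is the one listed under Configuration Files.

```shell
# Convert a retention window in days to the hour-based value used above.
retention_hours() {
  echo "$(( $1 * 24 ))h"
}

# Report the configured retention_period and whether compactor retention is on.
# $1: path to the Loki config (defaults to the path listed in this README).
check_retention() {
  config="${1:-./monitoring/loki-config.yaml}"
  grep -n 'retention_period' "$config" || echo "no retention_period set"
  if grep -q 'retention_enabled: *true' "$config"; then
    echo "compactor retention is enabled"
  else
    echo "retention_enabled: true not found; retention_period may not be enforced"
  fi
}
```

For example, retention_hours 30 gives the value for a 30-day window, and check_retention with no arguments inspects the project's Loki config.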

Alert Configuration

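Add alerting rules to the Prometheus configuration to detect ETL failures. Delivering the resulting notifications also requires an Alertmanager instance, which is not among the services listed above.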
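
As a starting point, the sketch below writes a minimal rule file that fires when a scrape target has been down for five minutes. Everything specific in it is an assumption: the output path, the alert name, and especially the job label opencand-etl, which must match a job_name in prometheus.yml and only works if the ETL service exposes a metrics endpoint that Prometheus scrapes.

```shell
# Write a minimal Prometheus alerting rule file.
# $1: output path (the default is a hypothetical location).
# The job label "opencand-etl" is hypothetical; use the job_name from prometheus.yml.
write_etl_alert_rule() {
  cat > "${1:-./monitoring/etl-alerts.yml}" <<'EOF'
groups:
  - name: opencand-etl
    rules:
      - alert: EtlTargetDown
        expr: up{job="opencand-etl"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "ETL scrape target has been down for 5 minutes"
EOF
}
```

The file only takes effect once it is listed under rule_files in prometheus.yml and Prometheus is restarted or reloaded. It can be validated beforehand with promtool check rules followed by the file path.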

Security Notes

  • Change the default Grafana admin password (admin/admin) before running in production
  • Restrict network access to the monitoring ports (3000, 3100, 9090, 9100)
  • Require authentication if the services must be reachable from outside the host
  • Regularly update monitoring stack images
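
Two of these items map to short commands. This is a sketch, assuming the Compose service is named grafana (as in the restart command earlier in this README) and that the Grafana image still ships the grafana-cli tool; the function names are illustrative.

```shell
# Pull newer images for the stack and recreate the containers that changed.
update_stack() {
  docker-compose pull && docker-compose up -d
}

# Reset the Grafana admin password inside the running container.
# $1: the new password. Assumes the Compose service is named "grafana".
reset_grafana_password() {
  docker-compose exec grafana grafana-cli admin reset-admin-password "$1"
}
```

Note that a password passed on the command line may end up in your shell history.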