2025-06-07 14:09:21 -03:00

# OpenCand Monitoring Setup
This monitoring stack provides comprehensive log aggregation and visualization for the OpenCand project, with special focus on the ETL service.
## Services Overview
### 🔍 **Grafana Loki** (Port 3100)
- **Purpose**: Log aggregation and storage
- **Access**: http://localhost:3100
- **Description**: Collects and stores all container logs in a structured format
### 📊 **Grafana** (Port 3000)
- **Purpose**: Log visualization and dashboards
- **Access**: http://localhost:3000
- **Credentials**:
  - Username: `admin`
  - Password: `admin`
- **Pre-configured Dashboards**: OpenCand ETL Monitoring dashboard
### 📈 **Prometheus** (Port 9090)
- **Purpose**: Metrics collection and storage
- **Access**: http://localhost:9090
- **Description**: Collects system and application metrics
### 🖥️ **Node Exporter** (Port 9100)
- **Purpose**: System metrics collection
- **Access**: http://localhost:9100/metrics
- **Description**: Provides host system metrics (CPU, memory, disk, etc.)
### 🚚 **Promtail**
- **Purpose**: Log collection agent
- **Description**: Automatically discovers and ships Docker container logs to Loki
## Key Features
### ETL-Specific Monitoring
- ✅ Real-time ETL process logs
- ✅ Error tracking and alerting capabilities
- ✅ Performance metrics monitoring
- ✅ Data processing progress tracking
### Container Log Management
- ✅ Automatic log rotation (10MB max size, 3 files)
- ✅ Structured log labeling
- ✅ Multi-service log aggregation
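
To confirm the rotation settings on a running container, one option is to ask Docker which logging options it applied. The sketch below assumes that the container names match the `container_name` labels used in the queries in this README (for example `opencand_etl`) and that rotation is configured through Docker's logging driver options. `log_config` and `show_log_configs` are hypothetical helper names.

```bash
# Print the logging driver and options Docker applied to one container.
# Both helpers are illustrative and not part of this repository.
log_config() {
  local container="$1"
  docker inspect --format '{{json .HostConfig.LogConfig}}' "$container"
}

# Print one line per container name passed as an argument.
show_log_configs() {
  local c
  for c in "$@"; do
    printf '%s: ' "$c"
    log_config "$c" || echo "(not found)"
  done
}
```

For example, `show_log_configs opencand_etl opencand_api opencand_db` prints one line per container. With the limits described above, the `max-size` and `max-file` options would be expected to appear in that output.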
### Pre-built Dashboards
- ✅ OpenCand ETL Logs viewer
- ✅ API logs monitoring
- ✅ Database logs tracking
- ✅ Container resource usage
## Getting Started
1. **Start the monitoring stack**:
   ```bash
   docker-compose up -d
   ```
2. **Access Grafana**:
   - Open http://localhost:3000
   - Log in with `admin` / `admin`
   - Navigate to "Dashboards" → "OpenCand ETL Monitoring"
3. **View ETL Logs in Real Time**:
   - In Grafana, go to "Explore"
   - Select "Loki" as the datasource
   - Use the query: `{container_name="opencand_etl"}`
4. **Monitor System Metrics**:
   - Access Prometheus at http://localhost:9090
   - View system metrics from Node Exporter
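
After starting the stack, it can help to confirm that each service answers on its port before opening the dashboards. The following is a minimal sketch, not a script shipped with this project. The ports come from the Services Overview, the endpoint paths are the standard readiness or health paths of each tool, and `probe` and `check_stack` are hypothetical helper names.

```bash
# Report whether a single HTTP endpoint answers with a success status.
probe() {
  local name="$1" url="$2"
  # -f makes curl fail on HTTP error codes; -s hides the progress meter.
  if curl -fs --max-time 5 -o /dev/null "$url" 2>/dev/null; then
    echo "UP    $name"
  else
    echo "DOWN  $name"
    return 1
  fi
}

# Probe every service that exposes an HTTP port in this stack.
# Returns 0 only if all of them answered.
check_stack() {
  local failed=0
  probe "loki"          "http://localhost:3100/ready"      || failed=1
  probe "grafana"       "http://localhost:3000/api/health" || failed=1
  probe "prometheus"    "http://localhost:9090/-/ready"    || failed=1
  probe "node-exporter" "http://localhost:9100/metrics"    || failed=1
  return "$failed"
}
```

Running `check_stack` prints one `UP` or `DOWN` line per service. Loki may report as down for a short while after startup, so a retry after a few seconds is reasonable.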
## Log Query Examples
### ETL Service Logs
```logql
{container_name="opencand_etl"}
```
### Error Logs Only
```logql
{container_name="opencand_etl"} |= "ERROR"
```
### API Logs with Filtering
```logql
{container_name="opencand_api"} |= "Microsoft.AspNetCore"
```
### Database Connection Logs
```logql
{container_name="opencand_db"} |= "connection"
```
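
The same queries can also be run outside Grafana through Loki's HTTP API, which is convenient for scripts. This is a sketch under the assumption that Loki is reachable on the port listed above. `LOKI_URL` and `loki_query` are hypothetical names introduced for illustration.

```bash
# Run a LogQL query against Loki and print the raw JSON response.
# LOKI_URL is an illustrative override; the default matches this README.
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

loki_query() {
  local query="$1" limit="${2:-20}"
  # -G sends the URL-encoded parameters as a GET query string.
  # Without explicit start/end parameters, Loki uses its default time range.
  curl -fsS -G "$LOKI_URL/loki/api/v1/query_range" \
    --data-urlencode "query=$query" \
    --data-urlencode "limit=$limit"
}
```

For example, `loki_query '{container_name="opencand_etl"} |= "ERROR"' 50` requests up to 50 matching ETL log lines. Single quotes keep the shell from interpreting the braces and double quotes inside the LogQL expression.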
## Configuration Files
- **Loki**: `./monitoring/loki-config.yaml`
- **Promtail**: `./monitoring/promtail-config.yaml`
- **Prometheus**: `./monitoring/prometheus.yml`
- **Grafana Datasources**: `./monitoring/grafana/provisioning/datasources/`
- **Grafana Dashboards**: `./monitoring/grafana/provisioning/dashboards/`
## Data Persistence
The following volumes are created for data persistence:
- `loki-data`: Loki log storage
- `prometheus-data`: Prometheus metrics storage
- `grafana-data`: Grafana dashboards and settings
## Troubleshooting
### ETL Logs Not Appearing
1. Check if ETL container is running: `docker ps`
2. Verify Promtail is collecting logs: `docker logs opencand_promtail`
3. Check Loki status: `curl http://localhost:3100/ready`
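
If all three checks pass and logs are still missing, a further step is to ask Loki which `container_name` values it has actually indexed. The sketch below assumes the label name used in the queries in this README. `loki_containers` and `has_container_logs` are hypothetical helpers, and the match is a plain text search on the JSON response rather than a real JSON parse.

```bash
# List the container_name label values Loki has seen recently (raw JSON).
loki_containers() {
  curl -fsS "${LOKI_URL:-http://localhost:3100}/loki/api/v1/label/container_name/values"
}

# Succeed if the given container name appears among those label values.
has_container_logs() {
  local name="$1"
  loki_containers | grep -q "\"$name\""
}
```

For example, `has_container_logs opencand_etl && echo "ETL logs are reaching Loki"` prints the message only when the ETL container shows up in the label values. If it does not, the Promtail output from step 2 is the next place to look.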
### Grafana Dashboard Issues
1. Verify datasources are configured correctly
2. Check if Loki is accessible from Grafana container
3. Restart Grafana container: `docker-compose restart grafana`
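
For the first step, the provisioned datasources can be listed through Grafana's HTTP API instead of the UI. This is a minimal sketch that assumes the default credentials from this README are still in place. `GRAFANA_URL`, `GRAFANA_USER`, `GRAFANA_PASSWORD`, and `grafana_datasources` are hypothetical names.

```bash
# Print the datasources Grafana knows about as raw JSON.
# The environment variable overrides are illustrative; the defaults
# match the URL and credentials documented in this README.
grafana_datasources() {
  curl -fsS \
    -u "${GRAFANA_USER:-admin}:${GRAFANA_PASSWORD:-admin}" \
    "${GRAFANA_URL:-http://localhost:3000}/api/datasources"
}
```

Running `grafana_datasources` should return a JSON array. Each entry includes the datasource name, type, and URL, so a wrong Loki address can be spotted there.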
### Performance Issues
1. Monitor disk usage for log storage
2. Adjust log retention in `loki-config.yaml`
3. Increase resource limits if needed
## Customization
### Adding More Dashboards
1. Create JSON dashboard files in `./monitoring/grafana/provisioning/dashboards/`
2. Restart Grafana container
### Log Retention Configuration
Edit `./monitoring/loki-config.yaml` to adjust retention policies:
```yaml
limits_config:
  retention_period: 168h # 7 days
```
Loki applies `retention_period` through its compactor, so retention also has to be enabled there (`retention_enabled: true` in the `compactor` block). Check the existing config before relying on this value.
### Alert Configuration
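Define alerting rules in a rules file referenced from `rule_files` in `./monitoring/prometheus.yml` to detect ETL failures. Delivering notifications additionally requires an Alertmanager, which is not among the services listed above.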
## Security Notes
- Change default Grafana admin password in production
- Restrict network access to monitoring ports
- Consider using authentication for external access
- Regularly update monitoring stack images
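
One way to change the default admin password is Grafana's bundled CLI, run inside the container. The sketch below is an assumption-heavy example. The container name `opencand_grafana` is a guess by analogy with `opencand_promtail`, the helper name is hypothetical, and newer Grafana images may prefer `grafana cli` over `grafana-cli`.

```bash
# Reset the Grafana admin password inside a running container.
# Arguments: <container-name> <new-password>
reset_grafana_admin_password() {
  local container="$1" new_password="$2"
  if [ -z "$container" ] || [ -z "$new_password" ]; then
    echo "usage: reset_grafana_admin_password <container> <new-password>" >&2
    return 2
  fi
  docker exec "$container" grafana-cli admin reset-admin-password "$new_password"
}
```

To keep the password out of shell history, it can be read interactively first, for example `read -rs NEW_PW` followed by `reset_grafana_admin_password opencand_grafana "$NEW_PW"`.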