# OpenCand Monitoring Setup

This monitoring stack provides log aggregation, metrics collection, and visualization for the OpenCand project, with a particular focus on the ETL service.

## Services Overview

### 🔍 **Grafana Loki** (Port 3100)
- **Purpose**: Log aggregation and storage
- **Access**: http://localhost:3100 (HTTP API only; Loki has no web UI of its own)
- **Description**: Stores the container logs shipped by Promtail and indexes them by label

### 📊 **Grafana** (Port 3000)
- **Purpose**: Log visualization and dashboards
- **Access**: http://localhost:3000
- **Credentials**:
  - Username: `admin`
  - Password: `admin`
- **Pre-configured Dashboards**: OpenCand ETL Monitoring dashboard

### 📈 **Prometheus** (Port 9090)
- **Purpose**: Metrics collection and storage
- **Access**: http://localhost:9090
- **Description**: Collects system and application metrics

### 🖥️ **Node Exporter** (Port 9100)
- **Purpose**: System metrics collection
- **Access**: http://localhost:9100/metrics
- **Description**: Provides host system metrics (CPU, memory, disk, etc.)

### 🚚 **Promtail**
- **Purpose**: Log collection agent
- **Description**: Automatically discovers and ships Docker container logs to Loki

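
For reference, Docker discovery in Promtail is usually configured with `docker_sd_configs`. The fragment below is a sketch of what `promtail-config.yaml` might contain, not a copy of the project's file: the job name and refresh interval are assumptions. A relabel rule of this kind is what would produce the `container_name` label used in the queries in this README.

```yaml
scrape_configs:
  - job_name: docker                       # assumed job name
    docker_sd_configs:
      - host: unix:///var/run/docker.sock  # the Docker socket must be mounted into the Promtail container
        refresh_interval: 5s               # assumed value
    relabel_configs:
      # Docker reports names as "/opencand_etl"; strip the leading slash
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container_name'
```
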
## Key Features

### ETL-Specific Monitoring
- ✅ Real-time ETL process logs
- ✅ Error tracking and alerting capabilities
- ✅ Performance metrics monitoring
- ✅ Data processing progress tracking

### Container Log Management
- ✅ Automatic log rotation (10 MB max size, 3 files)
- ✅ Structured log labeling
- ✅ Multi-service log aggregation

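
Rotation limits like these are typically set per service through Docker's `json-file` logging driver. The fragment below is a sketch of how that might look in `docker-compose.yml`; the service name `etl` is hypothetical, and the actual compose file may structure this differently.

```yaml
services:
  etl:                     # hypothetical service name
    logging:
      driver: json-file
      options:
        max-size: "10m"    # rotate once a log file reaches 10 MB
        max-file: "3"      # keep at most 3 files per container
```
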
### Pre-built Dashboards
- ✅ OpenCand ETL Logs viewer
- ✅ API logs monitoring
- ✅ Database logs tracking
- ✅ Container resource usage

## Getting Started

1. **Start the monitoring stack**:
   ```bash
   docker-compose up -d
   ```

2. **Access Grafana**:
   - Open http://localhost:3000
   - Log in with `admin`/`admin`
   - Navigate to "Dashboards" → "OpenCand ETL Monitoring"

3. **View ETL Logs in Real-time**:
   - In Grafana, go to "Explore"
   - Select "Loki" as the datasource
   - Use query: `{container_name="opencand_etl"}`

4. **Monitor System Metrics**:
   - Access Prometheus at http://localhost:9090
   - View system metrics from Node Exporter

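
Once the stack is up, a quick way to confirm that every service is reachable is to probe its HTTP endpoint. The sketch below assumes the default ports listed in this README and that `curl` is installed on the host; `check` is a hypothetical helper name. `/ready` (Loki) and `/metrics` (Node Exporter) appear elsewhere in this document, while `/api/health` (Grafana) and `/-/ready` (Prometheus) are the standard health endpoints of those tools.

```bash
# Probe one endpoint and print OK or FAIL.
# $1 = display name, $2 = URL. -f makes curl return non-zero on HTTP errors.
check() {
  if curl -fsS --max-time 5 -o /dev/null "$2" 2>/dev/null; then
    echo "OK   $1"
  else
    echo "FAIL $1 ($2)"
  fi
}

check "Loki"          http://localhost:3100/ready
check "Grafana"       http://localhost:3000/api/health
check "Prometheus"    http://localhost:9090/-/ready
check "Node Exporter" http://localhost:9100/metrics
```

Loki may report `FAIL` for a short while after startup, since `/ready` only succeeds once its components have finished initializing.
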
## Log Query Examples

### ETL Service Logs
```logql
{container_name="opencand_etl"}
```

### Error Logs Only
```logql
{container_name="opencand_etl"} |= "ERROR"
```

### API Logs with Filtering
```logql
{container_name="opencand_api"} |= "Microsoft.AspNetCore"
```

### Database Connection Logs
```logql
{container_name="opencand_db"} |= "connection"
```

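
The same queries can also be run outside Grafana against Loki's HTTP API, which is handy for scripting. This is a minimal sketch, assuming `curl` on the host and Loki on its default port; `loki_query` is a hypothetical helper name. Without explicit `start` and `end` parameters, the `query_range` endpoint covers the last hour.

```bash
# Run a LogQL query against Loki's query_range endpoint.
# $1 = LogQL query, $2 = maximum number of log lines (defaults to 20 here)
loki_query() {
  curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "limit=${2:-20}"
}

# ETL error lines from the last hour, returned as JSON
loki_query '{container_name="opencand_etl"} |= "ERROR"'
```
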
## Configuration Files

- **Loki**: `./monitoring/loki-config.yaml`
- **Promtail**: `./monitoring/promtail-config.yaml`
- **Prometheus**: `./monitoring/prometheus.yml`
- **Grafana Datasources**: `./monitoring/grafana/provisioning/datasources/`
- **Grafana Dashboards**: `./monitoring/grafana/provisioning/dashboards/`

## Data Persistence

The following volumes are created for data persistence:
- `loki-data`: Loki log storage
- `prometheus-data`: Prometheus metrics storage
- `grafana-data`: Grafana dashboards and settings

## Troubleshooting

### ETL Logs Not Appearing
1. Check that the ETL container is running: `docker ps`
2. Verify that Promtail is collecting logs: `docker logs opencand_promtail`
3. Check Loki status: `curl http://localhost:3100/ready`

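
If those checks pass but queries still return nothing, it can help to see which `container_name` values Loki has actually indexed. The sketch below uses Loki's label-values API and assumes `curl` on the host; `loki_label_values` is a hypothetical helper name.

```bash
# List the values Loki has indexed for a given label.
# $1 = label name
loki_label_values() {
  curl -s "http://localhost:3100/loki/api/v1/label/$1/values"
}

# If opencand_etl is missing from the returned list, Promtail is probably
# not shipping that container's logs (or is labeling them differently).
loki_label_values container_name
```
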
### Grafana Dashboard Issues
1. Verify that the datasources are configured correctly
2. Check that Loki is reachable from the Grafana container
3. Restart the Grafana container: `docker-compose restart grafana`

### Performance Issues
1. Monitor disk usage for log storage
2. Adjust log retention in `loki-config.yaml`
3. Increase resource limits if needed

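
Disk usage can be checked from the metrics the stack already collects. The sketch below queries the Prometheus HTTP API for a standard Node Exporter metric; it assumes `curl` on the host and that Prometheus is configured to scrape Node Exporter. `prom_query` is a hypothetical helper name.

```bash
# Run an instant PromQL query against the Prometheus HTTP API.
# $1 = PromQL expression
prom_query() {
  curl -G -s "http://localhost:9090/api/v1/query" --data-urlencode "query=$1"
}

# Bytes still available on each filesystem, as reported by Node Exporter
prom_query 'node_filesystem_avail_bytes'
```
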
## Customization

### Adding More Dashboards
1. Create JSON dashboard files in `./monitoring/grafana/provisioning/dashboards/`
2. Restart Grafana container

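
Grafana picks up dashboard JSON files through a provider definition in its provisioning directory. The fragment below is a sketch of such a definition, not the project's actual file: the provider name and the in-container path are assumptions that depend on how `./monitoring/grafana/provisioning` is mounted.

```yaml
apiVersion: 1
providers:
  - name: opencand                 # assumed provider name
    type: file
    updateIntervalSeconds: 30      # how often Grafana rescans the folder
    options:
      path: /etc/grafana/provisioning/dashboards   # assumed mount point inside the container
```

With `updateIntervalSeconds` set, Grafana rescans the folder periodically, so new dashboards may appear without a restart.
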
### Log Retention Configuration
Edit `./monitoring/loki-config.yaml` to adjust retention policies:
```yaml
limits_config:
  retention_period: 168h # 7 days
```

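
In Loki, `retention_period` is only enforced when the compactor runs with retention enabled; otherwise old chunks are not deleted. The fragment below is a sketch assuming filesystem storage. The working directory is an assumed path, and the compactor fields required vary between Loki versions, so check it against the version in use.

```yaml
compactor:
  working_directory: /loki/compactor   # assumed path inside the Loki container
  retention_enabled: true              # without this, retention_period has no effect
  delete_request_store: filesystem     # expected by Loki 3.x when retention is enabled; omit on 2.x

limits_config:
  retention_period: 168h               # 7 days
```
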
### Alert Configuration
Add alerting rules to the Prometheus configuration to get notified of ETL failures.

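
As a starting point, a rule file might look like the sketch below. It assumes the ETL service exposes a metrics endpoint that Prometheus scrapes under a job named `opencand-etl`; that job name, the alert name, and the thresholds are all hypothetical. The file has to be listed under `rule_files` in `prometheus.yml`, and delivering notifications additionally requires an Alertmanager, which is not part of the stack described here.

```yaml
groups:
  - name: opencand-etl                      # hypothetical group name
    rules:
      - alert: EtlTargetDown                # hypothetical alert name
        expr: up{job="opencand-etl"} == 0   # assumes a scrape job with this name exists
        for: 5m                             # fire only if the target stays down for 5 minutes
        labels:
          severity: critical
        annotations:
          summary: "ETL scrape target has been down for 5 minutes"
```
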
## Security Notes

- Change the default Grafana admin password (`admin`/`admin`) before deploying to production
- Restrict network access to the monitoring ports (3000, 3100, 9090, 9100)
- Require authentication for any access from outside the host
- Regularly update the monitoring stack images
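
The first two points can be addressed in `docker-compose.yml`. The fragment below is a sketch: `GF_SECURITY_ADMIN_PASSWORD` is a standard Grafana setting, while the `GRAFANA_ADMIN_PASSWORD` variable name is hypothetical and the real service definition may differ.

```yaml
services:
  grafana:
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}  # hypothetical variable, read from the shell or an .env file
    ports:
      - "127.0.0.1:3000:3000"   # bind to the loopback interface so Grafana is reachable from the host only
```

The environment variable sets the admin password when Grafana first initializes its database. If the `grafana-data` volume already exists, change the password through the Grafana UI instead.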