feat: add Node Registry, GreenFood, Monitoring, and Utils
225
monitoring/README.md
Normal file
@@ -0,0 +1,225 @@
# DAARION Platform Monitoring

**Stack**: Prometheus + Grafana
**Server**: `144.76.224.179`

---

## 🚀 Quick start

### 1. Deploy to the server

```bash
# From the local machine
cd /Users/apple/github-projects/microdao-daarion
rsync -avz monitoring/ root@144.76.224.179:/opt/microdao-daarion/monitoring/

# On the server
ssh root@144.76.224.179
cd /opt/microdao-daarion/monitoring
docker-compose -f docker-compose.monitoring.yml up -d
```

### 2. Access the interfaces

- **Prometheus**: http://144.76.224.179:9090
- **Grafana**: http://144.76.224.179:3000
  - Username: `admin`
  - Password: `daarion2025`

---

## 📊 What is monitored?

### Core Services
- **dagi-router** (9102) - Central router
- **telegram-gateway** (8000) - Telegram bots
- **dagi-gateway** (9300) - HTTP Gateway
- **dagi-rbac** (9200) - RBAC Service

### AI/ML Services
- **dagi-crewai** (9010) - CrewAI workflows
- **dagi-vision-encoder** (8001) - Vision AI
- **dagi-parser** (9400) - OCR/PDF parsing
- **dagi-stt** (9000) - Speech-to-Text
- **dagi-tts** (9101) - Text-to-Speech

### Infrastructure
- **nats** (8222) - Message broker
- **dagi-qdrant** (6333) - Vector DB
- **dagi-postgres** (5432) - Main DB

---

## 🎯 Key metrics

### 1. Request Rate
```promql
rate(http_requests_total[5m])
```
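To make the `rate()` query more concrete, here is a minimal Python sketch of the idea behind it: the per-second increase of a counter over a window, with counter resets added back in. It is a simplification, since Prometheus also extrapolates to the edges of the range window. The function name `simple_rate` and the sample values are illustrative assumptions, not part of this repository.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter across the given samples.

    Simplified model of PromQL rate(): it handles counter resets but does
    not extrapolate to the window boundaries the way Prometheus does.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (for example the service
        # restarted), so the whole current value counts as new increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return increase / elapsed


# Hypothetical http_requests_total samples, scraped every 15 seconds,
# with a counter reset between the second and third scrape.
samples = [(0.0, 100.0), (15.0, 130.0), (30.0, 10.0), (45.0, 40.0)]
print(simple_rate(samples))
```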

### 2. Error Rate
```promql
rate(http_requests_total{status=~"5.."}[5m])
```

### 3. Latency (p95)
```promql
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
```
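The p95 value is an estimate: `histogram_quantile` finds the bucket that contains the requested rank and interpolates linearly inside it. The sketch below illustrates that calculation in plain Python under simplifying assumptions (non-negative bucket bounds, cumulative counts, a final `+Inf` bucket). The function name and bucket values are hypothetical.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound `le`, cumulative count)


def histogram_quantile_sketch(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last one being +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The quantile falls into the +Inf bucket: the best available
        # answer is the highest finite bucket bound.
        return buckets[-2][0]
    lower, prev_count = (0.0, 0.0) if i == 0 else buckets[i - 1]
    if count == prev_count:
        return lower
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


# Hypothetical request-duration buckets in seconds
buckets = [(0.1, 50.0), (0.5, 80.0), (1.0, 95.0), (math.inf, 100.0)]
print(histogram_quantile_sketch(0.9, buckets))
```

Because of the interpolation, the accuracy of the result depends on how well the bucket bounds match the real latency distribution.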

### 4. LLM Performance
```promql
# LLM requests per second
rate(llm_requests_total[5m])
# p95 LLM request latency
histogram_quantile(0.95, rate(llm_request_duration_seconds_bucket[5m]))
```

### 5. Telegram Activity
```promql
# Telegram messages per second
rate(telegram_messages_total[5m])
# Number of active chats
telegram_active_chats
```

---

## 🚨 Alerts

### Critical
- **ServiceDown**: a service has not responded for more than 2 min
- **TelegramGatewayDown**: Telegram bots are not working
- **PostgreSQLDown**: the database is unavailable
- **NATSDown**: the message broker is unavailable
- **DiskSpaceCritical**: less than 10% of disk space free

### Warning
- **HighErrorRate**: more than 0.05 5xx errors per second (the rule compares an absolute rate, not a percentage)
- **RouterHighLatency**: P95 > 10s
- **LLMHighLatency**: P95 > 30s
- **DiskSpaceWarning**: less than 20% of disk space free
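
The "more than 2 min" wording comes from the `for:` clause of an alert rule: the condition has to hold on every evaluation for that long before the alert fires. The following Python sketch simulates that behaviour for a rule evaluated every 30 seconds with `for: 2m`, as in `daarion_alerts.yml`. The function `alert_states` is a hypothetical illustration, not Prometheus code.

```python
from typing import Iterable, List


def alert_states(condition_by_eval: Iterable[bool], interval_s: int, for_s: int) -> List[str]:
    """State of an alert after each rule evaluation.

    inactive: the expression returned nothing
    pending:  the expression is true but has not yet held for `for_s` seconds
    firing:   the expression has been true for at least `for_s` seconds
    """
    states = []
    active_for = None  # seconds since the condition first became true
    for holds in condition_by_eval:
        if not holds:
            # Any evaluation where the condition is false resets the timer.
            active_for = None
            states.append("inactive")
            continue
        active_for = 0 if active_for is None else active_for + interval_s
        states.append("firing" if active_for >= for_s else "pending")
    return states


# `up == 0` is true on five consecutive evaluations, 30 seconds apart
print(alert_states([True] * 5, interval_s=30, for_s=120))
# A single healthy scrape in between resets the pending period
print(alert_states([True, True, False, True], interval_s=30, for_s=120))
```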

---

## 📈 Adding metrics to a service

### Python (FastAPI)

```python
import time

from fastapi import FastAPI, Request, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = FastAPI()

# Metrics
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint', 'status'])
REQUEST_LATENCY = Histogram('http_request_duration_seconds', 'HTTP request latency', ['method', 'endpoint'])

@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    # Time the request, then record its count and latency
    start_time = time.time()
    response = await call_next(request)
    duration = time.time() - start_time

    REQUEST_COUNT.labels(
        method=request.method,
        endpoint=request.url.path,
        status=response.status_code
    ).inc()

    REQUEST_LATENCY.labels(
        method=request.method,
        endpoint=request.url.path
    ).observe(duration)

    return response

@app.get("/metrics")
async def metrics():
    # Expose all metrics in the Prometheus text exposition format
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
```

### Add the service to Prometheus

Edit `monitoring/prometheus/prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'my-new-service'
    static_configs:
      - targets: ['my-service:9999']
    metrics_path: '/metrics'
    scrape_interval: 15s
```

---

## 🛠️ Troubleshooting

### Prometheus is not scraping metrics

```bash
# Check the status of the targets
curl http://localhost:9090/api/v1/targets

# Check the logs
docker logs dagi-prometheus

# Check the endpoint manually (the service hostname resolves only inside the Docker network)
curl http://dagi-router:9102/metrics
```
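The JSON returned by `/api/v1/targets` can be long, so it helps to filter it down to the targets that are failing. Below is a minimal sketch that does this for an already parsed response. The field names (`activeTargets`, `labels`, `scrapeUrl`, `health`, `lastError`) follow the Prometheus HTTP API, while the function name and the sample payload are hypothetical.

```python
from typing import Any, Dict, List


def unhealthy_targets(payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return job, scrape URL and last error for every target that is not up."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        {
            "job": target.get("labels", {}).get("job", "unknown"),
            "url": target.get("scrapeUrl", ""),
            "error": target.get("lastError", ""),
        }
        for target in targets
        if target.get("health") != "up"
    ]


# Hypothetical, trimmed response body; real responses carry more fields.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "telegram-gateway"},
                "scrapeUrl": "http://telegram-gateway:8000/metrics",
                "health": "up",
                "lastError": "",
            },
            {
                "labels": {"job": "dagi-router"},
                "scrapeUrl": "http://dagi-router:9102/metrics",
                "health": "down",
                "lastError": "connection refused",
            },
        ]
    },
}

for target in unhealthy_targets(sample):
    print(f"{target['job']}: {target['url']} ({target['error']})")
```

For live data, the output of the `curl` command above could be parsed with `json.load(sys.stdin)` and passed to the same function.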

### Grafana shows no data

```bash
# Reset the admin password (if you cannot log in to check the datasource)
docker exec dagi-grafana grafana-cli admin reset-admin-password daarion2025

# Restart Grafana
docker restart dagi-grafana
```

### Reload the Prometheus config without a restart

```bash
# Works because Prometheus is started with --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload
```

---

## 📚 Useful queries

### Top 10 slowest endpoints
```promql
topk(10, histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])))
```

### Error rate by service
```promql
sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
```

### LLM requests per second
```promql
sum(rate(llm_requests_total[1m])) by (agent_id)
```

### Active Telegram chats
```promql
sum(telegram_active_chats) by (agent_id)
```

---

## 🎯 Next steps

1. ✅ Prometheus + Grafana installed
2. ⏳ Add metrics to DAGI Router
3. ⏳ Add metrics to Telegram Gateway
4. ⏳ Create dashboards in Grafana
5. ⏳ Configure Alertmanager (Slack/Telegram notifications)
6. ⏳ Add Loki for centralized logs
7. ⏳ Add Jaeger for distributed tracing

---

*Updated: 2025-11-18*
64
monitoring/docker-compose.monitoring.yml
Normal file
@@ -0,0 +1,64 @@
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: dagi-prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/alerts:/etc/prometheus/alerts:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
      - '--web.enable-lifecycle'
    networks:
      - dagi-network
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 10s
      retries: 3

  grafana:
    image: grafana/grafana:latest
    container_name: dagi-grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/dashboards:/etc/grafana/provisioning/dashboards:ro
      - ./grafana/datasources:/etc/grafana/provisioning/datasources:ro
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=daarion2025
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=http://localhost:3000
      - GF_ANALYTICS_REPORTING_ENABLED=false
      - GF_ANALYTICS_CHECK_FOR_UPDATES=false
    networks:
      - dagi-network
    restart: unless-stopped
    depends_on:
      - prometheus
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3

networks:
  dagi-network:
    external: true

volumes:
  prometheus-data:
    driver: local
  grafana-data:
    driver: local
462
monitoring/grafana/dashboards/daarion_services_overview.json
Normal file
@@ -0,0 +1,462 @@
|
||||
{
|
||||
"annotations": {
|
||||
"list": [
|
||||
{
|
||||
"builtIn": 1,
|
||||
"datasource": {
|
||||
"type": "grafana",
|
||||
"uid": "-- Grafana --"
|
||||
},
|
||||
"enable": true,
|
||||
"hide": true,
|
||||
"iconColor": "rgba(0, 211, 255, 1)",
|
||||
"name": "Annotations & Alerts",
|
||||
"type": "dashboard"
|
||||
}
|
||||
]
|
||||
},
|
||||
"editable": true,
|
||||
"fiscalYearStartMonth": 0,
|
||||
"graphTooltip": 0,
|
||||
"id": null,
|
||||
"links": [],
|
||||
"liveNow": false,
|
||||
"panels": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "palette-classic"
|
||||
},
|
||||
"custom": {
|
||||
"axisCenteredZero": false,
|
||||
"axisColorMode": "text",
|
||||
"axisLabel": "",
|
||||
"axisPlacement": "auto",
|
||||
"barAlignment": 0,
|
||||
"drawStyle": "line",
|
||||
"fillOpacity": 10,
|
||||
"gradientMode": "none",
|
||||
"hideFrom": {
|
||||
"tooltip": false,
|
||||
"viz": false,
|
||||
"legend": false
|
||||
},
|
||||
"lineInterpolation": "linear",
|
||||
"lineWidth": 1,
|
||||
"pointSize": 5,
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
},
|
||||
"showPoints": "never",
|
||||
"spanNulls": false,
|
||||
"stacking": {
|
||||
"group": "A",
|
||||
"mode": "none"
|
||||
},
|
||||
"thresholdsStyle": {
|
||||
"mode": "off"
|
||||
}
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 80
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "reqps"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"id": 1,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [],
|
||||
"displayMode": "list",
|
||||
"placement": "bottom",
|
||||
"showLegend": true
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "single",
|
||||
"sort": "none"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "rate(http_requests_total[5m])",
|
||||
"legendFormat": "{{job}} - {{method}} {{endpoint}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "HTTP Requests/sec",
|
||||
"type": "timeseries"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 0.05
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "percentunit"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 0
|
||||
},
|
||||
"id": 2,
|
||||
"options": {
|
||||
"orientation": "auto",
|
||||
"reduceOptions": {
|
||||
"values": false,
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": ""
|
||||
},
|
||||
"showThresholdLabels": false,
|
||||
"showThresholdMarkers": true
|
||||
},
|
||||
"pluginVersion": "9.5.3",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "sum by (job) (rate(http_requests_total{status=~\"5..\"}[5m])) / sum by (job) (rate(http_requests_total[5m]))",
|
||||
"legendFormat": "{{job}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Error Rate (%)",
|
||||
"type": "gauge"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "palette-classic"
|
||||
},
|
||||
"custom": {
|
||||
"axisCenteredZero": false,
|
||||
"axisColorMode": "text",
|
||||
"axisLabel": "",
|
||||
"axisPlacement": "auto",
|
||||
"barAlignment": 0,
|
||||
"drawStyle": "line",
|
||||
"fillOpacity": 10,
|
||||
"gradientMode": "none",
|
||||
"hideFrom": {
|
||||
"tooltip": false,
|
||||
"viz": false,
|
||||
"legend": false
|
||||
},
|
||||
"lineInterpolation": "linear",
|
||||
"lineWidth": 1,
|
||||
"pointSize": 5,
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
},
|
||||
"showPoints": "never",
|
||||
"spanNulls": false,
|
||||
"stacking": {
|
||||
"group": "A",
|
||||
"mode": "none"
|
||||
},
|
||||
"thresholdsStyle": {
|
||||
"mode": "off"
|
||||
}
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 80
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "s"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 8
|
||||
},
|
||||
"id": 3,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [
|
||||
"mean",
|
||||
"max"
|
||||
],
|
||||
"displayMode": "table",
|
||||
"placement": "right",
|
||||
"showLegend": true
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi",
|
||||
"sort": "none"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",
|
||||
"legendFormat": "p95 - {{job}}",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m]))",
|
||||
"legendFormat": "p50 - {{job}}",
|
||||
"refId": "B"
|
||||
}
|
||||
],
|
||||
"title": "Request Duration (p50, p95)",
|
||||
"type": "timeseries"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "short"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 8
|
||||
},
|
||||
"id": 4,
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "auto",
|
||||
"orientation": "auto",
|
||||
"reduceOptions": {
|
||||
"values": false,
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": ""
|
||||
},
|
||||
"textMode": "auto"
|
||||
},
|
||||
"pluginVersion": "9.5.3",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "count(up{job=~\"dagi-.*|telegram-gateway\"} == 1)",
|
||||
"legendFormat": "Active Services",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Active Services",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "palette-classic"
|
||||
},
|
||||
"custom": {
|
||||
"axisCenteredZero": false,
|
||||
"axisColorMode": "text",
|
||||
"axisLabel": "",
|
||||
"axisPlacement": "auto",
|
||||
"barAlignment": 0,
|
||||
"drawStyle": "line",
|
||||
"fillOpacity": 10,
|
||||
"gradientMode": "none",
|
||||
"hideFrom": {
|
||||
"tooltip": false,
|
||||
"viz": false,
|
||||
"legend": false
|
||||
},
|
||||
"lineInterpolation": "linear",
|
||||
"lineWidth": 1,
|
||||
"pointSize": 5,
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
},
|
||||
"showPoints": "never",
|
||||
"spanNulls": false,
|
||||
"stacking": {
|
||||
"group": "A",
|
||||
"mode": "none"
|
||||
},
|
||||
"thresholdsStyle": {
|
||||
"mode": "off"
|
||||
}
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "reqps"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 16
|
||||
},
|
||||
"id": 5,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [
|
||||
"mean",
|
||||
"last"
|
||||
],
|
||||
"displayMode": "table",
|
||||
"placement": "right",
|
||||
"showLegend": true
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi",
|
||||
"sort": "desc"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "rate(http_requests_total{job=\"dagi-router\"}[5m])",
|
||||
"legendFormat": "Router - {{method}} {{endpoint}} [{{status}}]",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "rate(http_requests_total{job=\"telegram-gateway\"}[5m])",
|
||||
"legendFormat": "Gateway - {{method}} {{endpoint}} [{{status}}]",
|
||||
"refId": "B"
|
||||
}
|
||||
],
|
||||
"title": "Requests by Service & Endpoint",
|
||||
"type": "timeseries"
|
||||
}
|
||||
],
|
||||
"refresh": "5s",
|
||||
"schemaVersion": 38,
|
||||
"style": "dark",
|
||||
"tags": [
|
||||
"daarion",
|
||||
"microdao"
|
||||
],
|
||||
"templating": {
|
||||
"list": []
|
||||
},
|
||||
"time": {
|
||||
"from": "now-15m",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {},
|
||||
"timezone": "",
|
||||
"title": "DAARION Services Overview",
|
||||
"uid": "daarion-services",
|
||||
"version": 0,
|
||||
"weekStart": ""
|
||||
}
|
||||
|
||||
14
monitoring/grafana/dashboards/dashboard.yml
Normal file
@@ -0,0 +1,14 @@
apiVersion: 1

providers:
  - name: 'DAARION Dashboards'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /etc/grafana/provisioning/dashboards
      foldersFromFilesStructure: true
557
monitoring/grafana/dashboards/telegram_bots.json
Normal file
@@ -0,0 +1,557 @@
|
||||
{
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": true,
|
||||
"fiscalYearStartMonth": 0,
|
||||
"graphTooltip": 0,
|
||||
"id": null,
|
||||
"links": [],
|
||||
"liveNow": false,
|
||||
"panels": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [
|
||||
{
|
||||
"options": {
|
||||
"0": {
|
||||
"color": "red",
|
||||
"index": 0,
|
||||
"text": "Down"
|
||||
},
|
||||
"1": {
|
||||
"color": "green",
|
||||
"index": 1,
|
||||
"text": "Up"
|
||||
}
|
||||
},
|
||||
"type": "value"
|
||||
}
|
||||
],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "red",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "green",
|
||||
"value": 1
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 6,
|
||||
"w": 8,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"id": 1,
|
||||
"options": {
|
||||
"colorMode": "background",
|
||||
"graphMode": "none",
|
||||
"justifyMode": "center",
|
||||
"orientation": "auto",
|
||||
"reduceOptions": {
|
||||
"values": false,
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": ""
|
||||
},
|
||||
"textMode": "value_and_name"
|
||||
},
|
||||
"pluginVersion": "9.5.3",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "up{job=\"telegram-gateway\"}",
|
||||
"legendFormat": "Gateway",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "up{job=\"dagi-stt\"}",
|
||||
"legendFormat": "STT",
|
||||
"refId": "B"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "up{job=\"dagi-tts\"}",
|
||||
"legendFormat": "TTS",
|
||||
"refId": "C"
|
||||
}
|
||||
],
|
||||
"title": "Service Status",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "palette-classic"
|
||||
},
|
||||
"custom": {
|
||||
"axisCenteredZero": false,
|
||||
"axisColorMode": "text",
|
||||
"axisLabel": "",
|
||||
"axisPlacement": "auto",
|
||||
"barAlignment": 0,
|
||||
"drawStyle": "line",
|
||||
"fillOpacity": 10,
|
||||
"gradientMode": "none",
|
||||
"hideFrom": {
|
||||
"tooltip": false,
|
||||
"viz": false,
|
||||
"legend": false
|
||||
},
|
||||
"lineInterpolation": "linear",
|
||||
"lineWidth": 1,
|
||||
"pointSize": 5,
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
},
|
||||
"showPoints": "never",
|
||||
"spanNulls": false,
|
||||
"stacking": {
|
||||
"group": "A",
|
||||
"mode": "none"
|
||||
},
|
||||
"thresholdsStyle": {
|
||||
"mode": "off"
|
||||
}
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "reqps"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 6,
|
||||
"w": 16,
|
||||
"x": 8,
|
||||
"y": 0
|
||||
},
|
||||
"id": 2,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [
|
||||
"mean",
|
||||
"last"
|
||||
],
|
||||
"displayMode": "table",
|
||||
"placement": "right",
|
||||
"showLegend": true
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi",
|
||||
"sort": "none"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "rate(http_requests_total{job=\"telegram-gateway\",endpoint=\"/telegram/webhook\"}[5m])",
|
||||
"legendFormat": "Incoming Messages",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Telegram Messages Rate",
|
||||
"type": "timeseries"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "palette-classic"
|
||||
},
|
||||
"custom": {
|
||||
"axisCenteredZero": false,
|
||||
"axisColorMode": "text",
|
||||
"axisLabel": "",
|
||||
"axisPlacement": "auto",
|
||||
"barAlignment": 0,
|
||||
"drawStyle": "line",
|
||||
"fillOpacity": 10,
|
||||
"gradientMode": "none",
|
||||
"hideFrom": {
|
||||
"tooltip": false,
|
||||
"viz": false,
|
||||
"legend": false
|
||||
},
|
||||
"lineInterpolation": "linear",
|
||||
"lineWidth": 1,
|
||||
"pointSize": 5,
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
},
|
||||
"showPoints": "never",
|
||||
"spanNulls": false,
|
||||
"stacking": {
|
||||
"group": "A",
|
||||
"mode": "none"
|
||||
},
|
||||
"thresholdsStyle": {
|
||||
"mode": "off"
|
||||
}
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "s"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 6
|
||||
},
|
||||
"id": 3,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [
|
||||
"mean",
|
||||
"max"
|
||||
],
|
||||
"displayMode": "table",
|
||||
"placement": "right",
|
||||
"showLegend": true
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi",
|
||||
"sort": "desc"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job=\"dagi-router\"}[5m]))",
|
||||
"legendFormat": "Router p95",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job=\"telegram-gateway\"}[5m]))",
|
||||
"legendFormat": "Gateway p95",
|
||||
"refId": "B"
|
||||
}
|
||||
],
|
||||
"title": "Response Time (p95)",
|
||||
"type": "timeseries"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "palette-classic"
|
||||
},
|
||||
"custom": {
|
||||
"hideFrom": {
|
||||
"tooltip": false,
|
||||
"viz": false,
|
||||
"legend": false
|
||||
}
|
||||
},
|
||||
"mappings": []
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 6
|
||||
},
|
||||
"id": 4,
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "right",
|
||||
"showLegend": true,
|
||||
"values": [
|
||||
"value"
|
||||
]
|
||||
},
|
||||
"pieType": "pie",
|
||||
"tooltip": {
|
||||
"mode": "single",
|
||||
"sort": "none"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "sum by (status) (increase(http_requests_total{job=\"telegram-gateway\"}[1h]))",
"legendFormat": "{{status}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "HTTP Status Codes (1h)",
|
||||
"type": "piechart"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "short"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 8,
|
||||
"x": 0,
|
||||
"y": 14
|
||||
},
|
||||
"id": 5,
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "auto",
|
||||
"orientation": "auto",
|
||||
"reduceOptions": {
|
||||
"values": false,
|
||||
"calcs": [
|
||||
"sum"
|
||||
],
|
||||
"fields": ""
|
||||
},
|
||||
"textMode": "auto"
|
||||
},
|
||||
"pluginVersion": "9.5.3",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "sum(increase(http_requests_total{job=\"dagi-stt\"}[1h]))",
|
||||
"legendFormat": "STT Requests (1h)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Voice Messages (1h)",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "short"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 8,
|
||||
"x": 8,
|
||||
"y": 14
|
||||
},
|
||||
"id": 6,
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "auto",
|
||||
"orientation": "auto",
|
||||
"reduceOptions": {
|
||||
"values": false,
|
||||
"calcs": [
|
||||
"sum"
|
||||
],
|
||||
"fields": ""
|
||||
},
|
||||
"textMode": "auto"
|
||||
},
|
||||
"pluginVersion": "9.5.3",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "sum(increase(http_requests_total{job=\"dagi-tts\"}[1h]))",
|
||||
"legendFormat": "TTS Requests (1h)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Voice Responses (1h)",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "short"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 8,
|
||||
"x": 16,
|
||||
"y": 14
|
||||
},
|
||||
"id": 7,
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "auto",
|
||||
"orientation": "auto",
|
||||
"reduceOptions": {
|
||||
"values": false,
|
||||
"calcs": [
|
||||
"sum"
|
||||
],
|
||||
"fields": ""
|
||||
},
|
||||
"textMode": "auto"
|
||||
},
|
||||
"pluginVersion": "9.5.3",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"expr": "sum(increase(http_requests_total{job=\"dagi-parser\"}[1h]))",
|
||||
"legendFormat": "Parser Requests (1h)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Documents Processed (1h)",
|
||||
"type": "stat"
|
||||
}
|
||||
],
|
||||
"refresh": "5s",
|
||||
"schemaVersion": 38,
|
||||
"style": "dark",
|
||||
"tags": [
|
||||
"telegram",
|
||||
"bots"
|
||||
],
|
||||
"templating": {
|
||||
"list": []
|
||||
},
|
||||
"time": {
|
||||
"from": "now-1h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {},
|
||||
"timezone": "",
|
||||
"title": "Telegram Bots Monitoring",
|
||||
"uid": "telegram-bots",
|
||||
"version": 0,
|
||||
"weekStart": ""
|
||||
}
|
||||
|
||||
13
monitoring/grafana/datasources/prometheus.yml
Normal file
@@ -0,0 +1,13 @@
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus  # matches the datasource uid referenced by the dashboards
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
    jsonData:
      timeInterval: "15s"
      queryTimeout: "60s"
129
monitoring/prometheus/alerts/daarion_alerts.yml
Normal file
@@ -0,0 +1,129 @@
|
||||
groups:
|
||||
- name: DAARION Platform
|
||||
interval: 30s
|
||||
rules:
|
||||
# Service Health Alerts
|
||||
- alert: ServiceDown
|
||||
expr: up == 0
|
||||
for: 2m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "Service {{ $labels.job }} is down"
|
||||
description: "{{ $labels.job }} has been down for more than 2 minutes"
|
||||
|
||||
- alert: HighErrorRate
|
||||
expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "High error rate on {{ $labels.job }}"
|
||||
description: "Error rate is {{ $value }} errors/sec"
|
||||
      # Router Alerts
      - alert: RouterHighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="dagi-router"}[5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "DAGI Router high latency"
          description: "95th percentile latency is {{ $value }}s"

      - alert: RouterHighLoad
        expr: rate(http_requests_total{job="dagi-router"}[1m]) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "DAGI Router high load"
          description: "Request rate is {{ $value }} req/sec"

      # Telegram Gateway Alerts
      - alert: TelegramGatewayDown
        expr: up{job="telegram-gateway"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Telegram Gateway is down"
          description: "Telegram bots will not respond"

      - alert: TelegramMessageBacklog
        expr: telegram_message_queue_size > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Telegram message backlog"
          description: "{{ $value }} messages in queue"

      # LLM Performance
      - alert: LLMHighLatency
        expr: histogram_quantile(0.95, rate(llm_request_duration_seconds_bucket[5m])) > 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "LLM high latency"
          description: "95th percentile LLM latency is {{ $value }}s"

      - alert: LLMErrorRate
        expr: rate(llm_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High LLM error rate"
          description: "LLM error rate is {{ $value }} errors/sec"

      # Database Alerts
      - alert: PostgreSQLDown
        expr: up{job="postgres"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL is down"
          description: "Database is unavailable"

      # NATS Alerts
      - alert: NATSDown
        expr: up{job="nats"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "NATS is down"
          description: "Message broker is unavailable"

      # Vector DB Alerts
      - alert: QdrantHighMemory
        expr: qdrant_memory_used_bytes / qdrant_memory_total_bytes > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Qdrant high memory usage"
          description: "Memory usage is {{ $value | humanizePercentage }}"

      # Disk Space Alerts
      - alert: DiskSpaceWarning
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space"
          description: "Only {{ $value | humanizePercentage }} disk space left"

      - alert: DiskSpaceCritical
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Critical disk space"
          description: "Only {{ $value | humanizePercentage }} disk space left"
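Every rule in `daarion_alerts.yml` combines the group's `interval: 30s` with a per-rule `for:` duration. The sketch below is a simplified model of how those two settings interact, not Prometheus code: the function name `alert_states` and its parameters are hypothetical, and it ignores details such as staleness handling. It assumes an alert stays pending from the first matching evaluation and fires once the expression has matched continuously for at least the `for:` duration.

```python
from datetime import timedelta


def alert_states(samples, interval, hold):
    """Return the alert state after each rule evaluation.

    samples  -- one boolean per evaluation: did the expression match?
    interval -- time between evaluations (the group's `interval`)
    hold     -- how long the expression must keep matching (the rule's `for`)
    """
    states = []
    active_since = None  # time of the first evaluation in the current matching streak
    now = timedelta(0)
    for matched in samples:
        if not matched:
            # Any non-matching evaluation resets the streak.
            active_since = None
            states.append("inactive")
        else:
            if active_since is None:
                active_since = now
            states.append("firing" if now - active_since >= hold else "pending")
        now += interval
    return states


# ServiceDown: `interval: 30s`, `for: 2m`, target down for six evaluations in a row.
states = alert_states([True] * 6, timedelta(seconds=30), timedelta(minutes=2))
print(states)
```

Under this model `ServiceDown` spends four evaluations pending and fires on the fifth, and a single healthy scrape in between restarts the wait.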
124
monitoring/prometheus/prometheus.yml
Normal file
@@ -0,0 +1,124 @@
# Prometheus Configuration for DAARION Platform

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'daarion-prod'
    environment: 'production'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: []
          # - alertmanager:9093

# Load rules once and periodically evaluate them
rule_files:
  - "/etc/prometheus/alerts/*.yml"

# Scrape configurations
scrape_configs:
  # DAGI Router
  - job_name: 'dagi-router'
    static_configs:
      - targets: ['dagi-router:9102']
    metrics_path: '/metrics'
    scrape_interval: 10s

  # Telegram Gateway
  - job_name: 'telegram-gateway'
    static_configs:
      - targets: ['telegram-gateway:8000']
    metrics_path: '/metrics'
    scrape_interval: 10s

  # DAGI Gateway
  - job_name: 'dagi-gateway'
    static_configs:
      - targets: ['dagi-gateway:9300']
    metrics_path: '/metrics'
    scrape_interval: 10s

  # RBAC Service
  - job_name: 'dagi-rbac'
    static_configs:
      - targets: ['dagi-rbac:9200']
    metrics_path: '/metrics'
    scrape_interval: 15s

  # CrewAI Service
  - job_name: 'dagi-crewai'
    static_configs:
      - targets: ['dagi-crewai:9010']
    metrics_path: '/metrics'
    scrape_interval: 15s

  # Parser Service
  - job_name: 'dagi-parser'
    static_configs:
      - targets: ['dagi-parser:9400']
    metrics_path: '/metrics'
    scrape_interval: 20s

  # Vision Encoder
  - job_name: 'dagi-vision-encoder'
    static_configs:
      - targets: ['dagi-vision-encoder:8001']
    metrics_path: '/metrics'
    scrape_interval: 20s

  # DevTools
  - job_name: 'dagi-devtools'
    static_configs:
      - targets: ['dagi-devtools:8008']
    metrics_path: '/metrics'
    scrape_interval: 15s

  # STT Service
  - job_name: 'dagi-stt'
    static_configs:
      - targets: ['dagi-stt:9000']
    metrics_path: '/metrics'
    scrape_interval: 20s

  # TTS Service
  - job_name: 'dagi-tts'
    static_configs:
      - targets: ['dagi-tts:9101']
    metrics_path: '/metrics'
    scrape_interval: 20s

  # Qdrant Vector DB
  - job_name: 'dagi-qdrant'
    static_configs:
      - targets: ['dagi-qdrant:6333']
    metrics_path: '/metrics'
    scrape_interval: 30s

  # NATS (/varz returns JSON, so scraping it directly fails; expose it via prometheus-nats-exporter)
  - job_name: 'nats'
    static_configs:
      - targets: ['nats:8222']
    metrics_path: '/varz'
    scrape_interval: 15s

  # PostgreSQL (needs a postgres_exporter target; port 5432 speaks the PostgreSQL protocol, not HTTP)
  - job_name: 'postgres'
    static_configs:
      - targets: ['dagi-postgres:5432']
    metrics_path: '/metrics'
    scrape_interval: 30s

  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Host metrics (if node_exporter is installed)
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['host.docker.internal:9100']
    scrape_interval: 30s
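Several alerts in `daarion_alerts.yml` select a series with `job="..."`, so each of those names has to match a `job_name` in `prometheus.yml`; a renamed or missing job would leave the alert permanently silent. The following is a minimal sketch of such a cross-check, using only the standard library. The helper names (`scraped_jobs`, `referenced_jobs`, `missing_jobs`) are hypothetical, the two strings are short excerpts of the files above rather than the full contents, and the regular expressions only cover the simple `job="name"` matcher form used in this commit.

```python
import re

# Excerpts copied from the two files above; a real check would read them from disk.
PROMETHEUS_YML = """
scrape_configs:
  - job_name: 'dagi-router'
  - job_name: 'telegram-gateway'
  - job_name: 'nats'
  - job_name: 'postgres'
"""

ALERT_RULES_YML = """
- alert: RouterHighLatency
  expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="dagi-router"}[5m])) > 10
- alert: TelegramGatewayDown
  expr: up{job="telegram-gateway"} == 0
- alert: PostgreSQLDown
  expr: up{job="postgres"} == 0
- alert: NATSDown
  expr: up{job="nats"} == 0
"""


def scraped_jobs(config_text):
    """Collect every job_name value, with or without surrounding quotes."""
    return set(re.findall(r"job_name:\s*['\"]?([\w.-]+)", config_text))


def referenced_jobs(rules_text):
    """Collect every job named in an exact-match selector such as up{job="nats"}."""
    return set(re.findall(r'job="([^"]+)"', rules_text))


def missing_jobs(config_text, rules_text):
    """Jobs that an alert expression refers to but no scrape config defines."""
    return sorted(referenced_jobs(rules_text) - scraped_jobs(config_text))


missing = missing_jobs(PROMETHEUS_YML, ALERT_RULES_YML)
print(missing or "every job referenced by an alert is scraped")
```

A check like this only confirms that the names line up; it says nothing about whether the target actually serves metrics on the configured port and path.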