Observability and Backups
Observability Stack
- Prometheus config: monitoring/prometheus/prometheus.yml.
  - Scrapes: prometheus self, agent-e2e-prober, gateway, router, qdrant, grafana.
- Alert rules: monitoring/prometheus/rules/node1.rules.yml.
- Grafana provisioning and dashboards:
  - datasources: monitoring/grafana/provisioning/datasources/prometheus.yml
  - dashboards: monitoring/grafana/dashboards/*.json
  - alerting: monitoring/grafana/provisioning/alerting/alerts.yml
- Loki/OTel/Tempo/Jaeger: no active compose evidence in this repo’s current manifests.
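One way to confirm that the scrape list above matches a running Prometheus is to compare it with the response of the Prometheus HTTP API endpoint `/api/v1/targets`. The sketch below works on an already-decoded JSON payload. The job names are an assumption taken from the labels in the list above (the real `job_name` values in prometheus.yml may be spelled differently), and the sample payload is hand-written for illustration, not captured from this stack.

```python
# Assumed job names, taken from the scrape list above; the real job_name
# values in prometheus.yml may be spelled differently.
EXPECTED_JOBS = {
    "prometheus", "agent-e2e-prober", "gateway", "router", "qdrant", "grafana",
}


def job_health(targets_payload):
    """Map each job to True only if all of its active targets report "up"."""
    health = {}
    for target in targets_payload["data"]["activeTargets"]:
        job = target["labels"]["job"]
        is_up = target["health"] == "up"
        health[job] = health.get(job, True) and is_up
    return health


def scrape_gaps(targets_payload, expected=EXPECTED_JOBS):
    """Return (missing, unhealthy) expected job names as sorted lists."""
    health = job_health(targets_payload)
    missing = sorted(set(expected) - set(health))
    unhealthy = sorted(
        job for job, ok in health.items() if job in expected and not ok
    )
    return missing, unhealthy


# Hand-written sample shaped like a /api/v1/targets response (not real data).
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus"}, "health": "up"},
            {"labels": {"job": "router"}, "health": "up"},
            {"labels": {"job": "gateway"}, "health": "down"},
            {"labels": {"job": "qdrant"}, "health": "up"},
            {"labels": {"job": "grafana"}, "health": "up"},
        ]
    },
}
```

With the sample payload, `scrape_gaps(SAMPLE)` reports `agent-e2e-prober` as missing and `gateway` as unhealthy.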
Service-Level Telemetry
- Router exposes /metrics (services/router/main.py).
- Gateway exposes a metrics endpoint (compose monitors /metrics).
- SenpAI consumer has Prometheus metrics in code (senpai_nats_connected, reconnect counters).
- Prober exports metrics on port 9108.
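The endpoints above serve the Prometheus text exposition format, so a quick spot check can be done without any client library. The following is a minimal sketch of a parser for that format, assuming simple `name value` or `name{labels} value` sample lines. `senpai_nats_connected` comes from the list above; the counter name `senpai_nats_reconnects_total`, its label, the sample text, and the reading of `1` as "connected" are all assumptions made for the example.

```python
def parse_samples(text):
    """Parse Prometheus text exposition lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        if "}" in line:
            # Labelled series: everything up to the last "}" is the series key.
            series, _, rest = line.rpartition("}")
            series += "}"
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue  # malformed line without a value
        # The first field is the value; an optional timestamp is ignored.
        samples[series] = float(fields[0])
    return samples


def nats_connected(samples):
    """Assumes the gauge is 1 while the consumer is connected to NATS."""
    return samples.get("senpai_nats_connected") == 1.0


# Invented sample output; the reconnect counter name is hypothetical.
SAMPLE_TEXT = """\
# HELP senpai_nats_connected Whether the consumer is connected to NATS
# TYPE senpai_nats_connected gauge
senpai_nats_connected 1
# TYPE senpai_nats_reconnects_total counter
senpai_nats_reconnects_total{reason="timeout"} 3
"""
```

In practice the text would come from an HTTP GET against the service's metrics endpoint; the parser is kept separate so it can be tested offline.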
Backup and DR
Data backups
- Scheduled Postgres backup container: docker-compose.backups.yml (SCHEDULE: @every 6h, keep days/weeks/months).
- Full backup script: scripts/backup/backup_all.sh (Postgres dump + Qdrant snapshots + Neo4j dump + metadata file).
- Restore validation script: scripts/restore/restore_test.sh.
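Given the `@every 6h` schedule above, a simple freshness check can flag a stalled backup container. The sketch below is illustrative only: the 30-minute grace period is an assumption, and the helper that reads file modification times assumes dumps are written as regular files directly inside the backup directory, which this document does not confirm.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

BACKUP_INTERVAL = timedelta(hours=6)  # from SCHEDULE: @every 6h
GRACE = timedelta(minutes=30)         # assumed slack for dump duration


def newest_backup_time(directory):
    """Return the newest file mtime in `directory` as an aware datetime, or None."""
    mtimes = [p.stat().st_mtime for p in Path(directory).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return datetime.fromtimestamp(max(mtimes), tz=timezone.utc)


def is_backup_fresh(newest_backup_at, now, interval=BACKUP_INTERVAL, grace=GRACE):
    """True if the newest backup is no older than one interval plus grace."""
    return now - newest_backup_at <= interval + grace
```

A caller would pass the result of `newest_backup_time(...)` together with `datetime.now(timezone.utc)`, treating a `None` result as "no backups found".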
Documentation backups
- scripts/docs/docs_backup.sh creates timestamped archives and applies retention rotation.
- scripts/docs/install_local_cron.sh installs a local managed cron block for docs maintenance.
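The idea of timestamped archives with retention rotation can be sketched as follows. This is not the logic of docs_backup.sh, whose contents are not shown here. The archive naming scheme `docs-backup-YYYYMMDD-HHMMSS.tar.gz` and the `keep` count are hypothetical, chosen only to make the example concrete.

```python
import re

# Hypothetical naming scheme: docs-backup-YYYYMMDD-HHMMSS.tar.gz
ARCHIVE_RE = re.compile(r"^docs-backup-(\d{8}-\d{6})\.tar\.gz$")


def archives_to_delete(filenames, keep=3):
    """Return archive names older than the `keep` newest, oldest first."""
    stamped = []
    for name in filenames:
        match = ARCHIVE_RE.match(name)
        if match:  # ignore files that are not docs archives
            stamped.append((match.group(1), name))
    stamped.sort()  # fixed-width timestamps sort chronologically as strings
    cutoff = max(len(stamped) - keep, 0)
    return [name for _, name in stamped[:cutoff]]
```

Returning the deletion list instead of deleting in place makes a dry run straightforward: the caller can log the names before removing anything.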
DR Readiness Notes
- Backup script metadata and restore script provide reproducible path checks.
- Compose-based backup path uses host bind /opt/backups/postgres:/backups (host-level storage requirement).
- Runbooks report a prior backup-image version mismatch issue; compose currently pins the backup image to :16.
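Because the notes above mention an earlier backup-image version mismatch, a small pre-flight check that compares the pinned image tag with the server's major version may be useful. The sketch below assumes the image tag encodes the Postgres major version, which is common for Postgres backup images but not confirmed here. The image name `example/pg-backup` is hypothetical; only the `:16` tag comes from this document.

```python
import re


def major_version(text):
    """Extract the leading major version number from a tag or version string."""
    match = re.match(r"\s*(\d+)", text)
    if match is None:
        raise ValueError(f"no leading version number in {text!r}")
    return int(match.group(1))


def image_tag(image_ref):
    """Return the tag part of an image reference such as 'name:16'."""
    _, sep, tag = image_ref.rpartition(":")
    # A "/" after the last ":" means the colon belonged to a registry port.
    if not sep or "/" in tag:
        raise ValueError(f"image reference has no tag: {image_ref!r}")
    return tag


def backup_image_matches_server(image_ref, server_version):
    """True if the image tag and the server report the same major version."""
    return major_version(image_tag(image_ref)) == major_version(server_version)
```

The `server_version` argument is expected to be a string like the one returned by `SHOW server_version` in Postgres, for example `16.3`.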
Source pointers
- monitoring/prometheus/prometheus.yml
- monitoring/prometheus/rules/node1.rules.yml
- monitoring/grafana/provisioning/datasources/prometheus.yml
- monitoring/grafana/provisioning/alerting/alerts.yml
- monitoring/grafana/dashboards/nats_memory.json
- docker-compose.backups.yml
- scripts/backup/backup_all.sh
- scripts/restore/restore_test.sh
- scripts/docs/docs_backup.sh
- scripts/docs/install_local_cron.sh
- docs/NODA1-MEMORY-RUNBOOK.md
- docs/NODA1-TECHBORGS-PATCHES.md