# Observability and Backups

## Observability Stack

- Prometheus config: `monitoring/prometheus/prometheus.yml`.
- Scrapes: prometheus self, `agent-e2e-prober`, `gateway`, `router`, `qdrant`, `grafana`.
- Alert rules: `monitoring/prometheus/rules/node1.rules.yml`.
- Grafana provisioning and dashboards:
  - datasources: `monitoring/grafana/provisioning/datasources/prometheus.yml`
  - dashboards: `monitoring/grafana/dashboards/*.json`
  - alerting: `monitoring/grafana/provisioning/alerting/alerts.yml`
- Loki/OTel/Tempo/Jaeger: no active compose evidence in this repo’s current manifests.

## Service-Level Telemetry

- Router exposes `/metrics` (`services/router/main.py`).
- Gateway exposes metrics endpoint (compose monitors `/metrics`).
- SenpAI consumer has Prometheus metrics in code (`senpai_nats_connected`, reconnect counters).
- Prober exports metrics on `9108`.

## Backup and DR

### Data backups

- Scheduled Postgres backup container: `docker-compose.backups.yml` (`SCHEDULE: @every 6h`, keep days/weeks/months).
- Full backup script: `scripts/backup/backup_all.sh` (Postgres dump + Qdrant snapshots + Neo4j dump + metadata file).
- Restore validation script: `scripts/restore/restore_test.sh`.

### Documentation backups

- `scripts/docs/docs_backup.sh` creates timestamped archives and retention rotation.
- `scripts/docs/install_local_cron.sh` installs local managed cron block for docs maintenance.

## DR Readiness Notes

- Backup script metadata and restore script provide reproducible path checks.
- Compose-based backup path uses host bind `/opt/backups/postgres:/backups` (host-level storage requirement).
- Runbooks report prior backup-image version mismatch issue; currently compose pins backup image `:16`.
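The host-level storage requirement noted above can be spot-checked with a small freshness probe. The following is a minimal sketch, not part of the repo: it assumes the scheduled container writes ordinary files somewhere under `/opt/backups/postgres`, and it treats the newest file's modification time as the time of the last backup. The helper names (`newest_backup_age`, `is_fresh`) and the 30-minute slack on top of the 6h schedule are illustrative assumptions.

```python
import time
from pathlib import Path
from typing import Optional

# Assumption: 6h schedule (from the compose file) plus 30 min of slack (hypothetical).
MAX_AGE_SECONDS = 6 * 3600 + 30 * 60


def newest_backup_age(backup_dir: Path, now: Optional[float] = None) -> Optional[float]:
    """Return the age in seconds of the most recently modified file under
    backup_dir, or None if the directory is missing or contains no files."""
    if not backup_dir.is_dir():
        return None
    current = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in backup_dir.rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return current - max(mtimes)


def is_fresh(backup_dir: Path, max_age: float = MAX_AGE_SECONDS,
             now: Optional[float] = None) -> bool:
    """True if at least one file exists and the newest one is within max_age."""
    age = newest_backup_age(backup_dir, now)
    return age is not None and age <= max_age


if __name__ == "__main__":
    # Host side of the bind mount documented above.
    host_dir = Path("/opt/backups/postgres")
    age = newest_backup_age(host_dir)
    if age is None:
        print(f"{host_dir}: no backup files found")
    else:
        status = "fresh" if age <= MAX_AGE_SECONDS else "stale"
        print(f"{host_dir}: newest backup is {age / 3600:.1f}h old ({status})")
```

A check like this only confirms that files keep arriving; it says nothing about whether a dump can be restored, which is what `scripts/restore/restore_test.sh` covers.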
## Source pointers

- `monitoring/prometheus/prometheus.yml`
- `monitoring/prometheus/rules/node1.rules.yml`
- `monitoring/grafana/provisioning/datasources/prometheus.yml`
- `monitoring/grafana/provisioning/alerting/alerts.yml`
- `monitoring/grafana/dashboards/nats_memory.json`
- `docker-compose.backups.yml`
- `scripts/backup/backup_all.sh`
- `scripts/restore/restore_test.sh`
- `scripts/docs/docs_backup.sh`
- `scripts/docs/install_local_cron.sh`
- `docs/NODA1-MEMORY-RUNBOOK.md`
- `docs/NODA1-TECHBORGS-PATCHES.md`
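Because these pointers can drift as files move, a quick existence check against a checkout may help keep this page honest. This is a hedged sketch rather than an existing repo tool: the function name `missing_pointers` is hypothetical, and it assumes every path above is relative to the repository root.

```python
from pathlib import Path
from typing import Iterable, List

# Paths copied from the "Source pointers" list; assumed relative to the repo root.
SOURCE_POINTERS = [
    "monitoring/prometheus/prometheus.yml",
    "monitoring/prometheus/rules/node1.rules.yml",
    "monitoring/grafana/provisioning/datasources/prometheus.yml",
    "monitoring/grafana/provisioning/alerting/alerts.yml",
    "monitoring/grafana/dashboards/nats_memory.json",
    "docker-compose.backups.yml",
    "scripts/backup/backup_all.sh",
    "scripts/restore/restore_test.sh",
    "scripts/docs/docs_backup.sh",
    "scripts/docs/install_local_cron.sh",
    "docs/NODA1-MEMORY-RUNBOOK.md",
    "docs/NODA1-TECHBORGS-PATCHES.md",
]


def missing_pointers(repo_root: Path,
                     pointers: Iterable[str] = SOURCE_POINTERS) -> List[str]:
    """Return the pointers that do not resolve to a regular file under repo_root."""
    return [p for p in pointers if not (repo_root / p).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for path in missing_pointers(Path.cwd()):
        print(f"missing: {path}")
```

Run from the repository root, it prints one line per pointer that no longer resolves and nothing when all of them do.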