Observability and Backups

Observability Stack

  • Prometheus config: monitoring/prometheus/prometheus.yml.
    • Scrape targets: Prometheus itself, agent-e2e-prober, gateway, router, qdrant, grafana.
  • Alert rules: monitoring/prometheus/rules/node1.rules.yml.
  • Grafana provisioning and dashboards:
    • datasources: monitoring/grafana/provisioning/datasources/prometheus.yml
    • dashboards: monitoring/grafana/dashboards/*.json
    • alerting: monitoring/grafana/provisioning/alerting/alerts.yml
  • Loki/OTel/Tempo/Jaeger: no evidence of an active compose service in this repo's current manifests.
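
One way to confirm that the scrape targets listed above are actually being collected is to inspect the response of the Prometheus HTTP API endpoint /api/v1/targets. The sketch below is only an illustration: the job names in EXPECTED_JOBS are assumed from the list above and may not match the job_name values in monitoring/prometheus/prometheus.yml, and the helper name is hypothetical.

```python
# Assumed job names, copied from the scrape list in this document; the real
# job_name values in prometheus.yml may differ.
EXPECTED_JOBS = {
    "prometheus", "agent-e2e-prober", "gateway", "router", "qdrant", "grafana",
}


def missing_and_down_jobs(targets_payload, expected=EXPECTED_JOBS):
    """Return (missing, down) sets of job names.

    `targets_payload` is the decoded JSON body of GET /api/v1/targets.
    """
    active = targets_payload.get("data", {}).get("activeTargets", [])
    seen, down = set(), set()
    for target in active:
        job = target.get("labels", {}).get("job")
        if job is None:
            continue
        seen.add(job)
        # Prometheus reports target health as "up", "down", or "unknown".
        if target.get("health") != "up":
            down.add(job)
    return set(expected) - seen, down
```

The payload can be fetched with urllib.request from the Prometheus host and decoded with json.loads; the host and port depend on the deployment and are not stated in this inventory.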

Service-Level Telemetry

  • Router exposes /metrics (services/router/main.py).
  • Gateway exposes a metrics endpoint (compose monitors /metrics).
  • SenpAI consumer has Prometheus metrics in code (senpai_nats_connected, reconnect counters).
  • Prober exports metrics on port 9108.
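
The endpoints above serve the Prometheus text exposition format, so a quick manual check of a metric such as senpai_nats_connected needs only a small parser. The following is a minimal sketch using the standard library; it assumes that senpai_nats_connected is a gauge where 1 means connected, which this inventory does not confirm, and the function names are hypothetical.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition into {series: float}.

    Comment lines (# HELP, # TYPE) are skipped and optional timestamps
    after the sample value are ignored.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Keep the label set as part of the series key.
            end = line.rfind("}")
            series = line[: end + 1]
            rest = line[end + 1:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            samples[series] = float(rest[0])
    return samples


def nats_connected(text, metric="senpai_nats_connected"):
    # Assumption: the gauge is 1 when the consumer is connected to NATS.
    return parse_metrics(text).get(metric) == 1.0
```

The input text would be the body returned by a GET on the service's /metrics endpoint.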

Backup and DR

Data backups

  • Scheduled Postgres backup container: docker-compose.backups.yml (SCHEDULE: @every 6h, with retention set in days, weeks, and months).
  • Full backup script: scripts/backup/backup_all.sh (Postgres dump + Qdrant snapshots + Neo4j dump + metadata file).
  • Restore validation script: scripts/restore/restore_test.sh.
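
The pairing of a metadata file with a restore validation script can be illustrated with a checksum manifest. This is a sketch under assumptions, not the format used by backup_all.sh or restore_test.sh: the manifest name, its JSON layout, and both function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Stream a file through SHA-256 so large dumps do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(backup_dir, manifest_name="manifest.json"):
    """Record size and checksum of every file under backup_dir."""
    backup_dir = Path(backup_dir)
    entries = {}
    for path in sorted(backup_dir.rglob("*")):
        if path.is_file() and path.name != manifest_name:
            rel = path.relative_to(backup_dir).as_posix()
            entries[rel] = {"bytes": path.stat().st_size,
                            "sha256": sha256_of(path)}
    (backup_dir / manifest_name).write_text(json.dumps(entries, indent=2))
    return entries


def verify_manifest(backup_dir, manifest_name="manifest.json"):
    """Return relative paths that are missing or whose checksum changed."""
    backup_dir = Path(backup_dir)
    entries = json.loads((backup_dir / manifest_name).read_text())
    return [rel for rel, meta in entries.items()
            if not (backup_dir / rel).is_file()
            or sha256_of(backup_dir / rel) != meta["sha256"]]
```

A restore test could call verify_manifest before attempting to load any dump, so a corrupted or incomplete backup set fails early.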

Documentation backups

  • scripts/docs/docs_backup.sh creates timestamped archives and applies retention rotation.
  • scripts/docs/install_local_cron.sh installs a locally managed cron block for docs maintenance.
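
The idea of timestamped archives with retention rotation can be sketched as follows. This is an illustration in Python, not a port of docs_backup.sh: the archive prefix, the timestamp format, and the keep count are all assumptions.

```python
import tarfile
import time
from pathlib import Path


def create_archive(source_dir, dest_dir, prefix="docs", now=None):
    """Write <prefix>-YYYYmmdd-HHMMSS.tar.gz (UTC) and return its path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    archive = dest_dir / f"{prefix}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source_dir, arcname=Path(source_dir).name)
    return archive


def rotate(dest_dir, keep=7, prefix="docs"):
    """Delete all but the newest `keep` archives; return the deleted paths."""
    # The timestamp format sorts lexicographically in chronological order.
    archives = sorted(Path(dest_dir).glob(f"{prefix}-*.tar.gz"))
    stale = archives[:-keep] if keep > 0 else archives
    for path in stale:
        path.unlink()
    return stale
```

Keeping the destination directory outside the source tree avoids archiving earlier archives.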

DR Readiness Notes

  • The backup script's metadata file and the restore script together provide reproducible path checks.
  • The compose-based backup path uses the host bind mount /opt/backups/postgres:/backups, so backup storage must be provisioned on the host.
  • Runbooks report a prior backup-image version mismatch; the compose file currently pins the backup image to tag :16.
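
Because the scheduled backups land on a host path, a simple freshness check on that directory can flag a stalled schedule. The sketch below assumes the 6h interval noted above plus an arbitrary one-hour slack; the function names and the slack value are hypothetical.

```python
import time
from pathlib import Path


def newest_backup_age_seconds(backup_dir, now=None):
    """Age of the most recently modified file under backup_dir, or None."""
    files = [p for p in Path(backup_dir).rglob("*") if p.is_file()]
    if not files:
        return None
    newest = max(p.stat().st_mtime for p in files)
    return (time.time() if now is None else now) - newest


def is_fresh(backup_dir, interval_hours=6, slack_hours=1, now=None):
    """True if the newest backup is no older than interval plus slack."""
    age = newest_backup_age_seconds(backup_dir, now=now)
    return age is not None and age <= (interval_hours + slack_hours) * 3600
```

For example, is_fresh("/opt/backups/postgres") could be run from cron on the host, with a non-fresh result feeding an alert.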

Source pointers

  • monitoring/prometheus/prometheus.yml
  • monitoring/prometheus/rules/node1.rules.yml
  • monitoring/grafana/provisioning/datasources/prometheus.yml
  • monitoring/grafana/provisioning/alerting/alerts.yml
  • monitoring/grafana/dashboards/nats_memory.json
  • docker-compose.backups.yml
  • scripts/backup/backup_all.sh
  • scripts/restore/restore_test.sh
  • scripts/docs/docs_backup.sh
  • scripts/docs/install_local_cron.sh
  • docs/NODA1-MEMORY-RUNBOOK.md
  • docs/NODA1-TECHBORGS-PATCHES.md