docs: add node1 runbooks, consolidation artifacts, and maintenance scripts
docs/architecture_inventory/06_OBSERVABILITY_AND_BACKUPS.md
@@ -0,0 +1,46 @@
# Observability and Backups

## Observability Stack

- Prometheus config: `monitoring/prometheus/prometheus.yml`.
- Scrape targets: Prometheus itself, `agent-e2e-prober`, `gateway`, `router`, `qdrant`, `grafana`.
- Alert rules: `monitoring/prometheus/rules/node1.rules.yml`.
- Grafana provisioning and dashboards:
  - datasources: `monitoring/grafana/provisioning/datasources/prometheus.yml`
  - dashboards: `monitoring/grafana/dashboards/*.json`
  - alerting: `monitoring/grafana/provisioning/alerting/alerts.yml`
- Loki/OTel/Tempo/Jaeger: no evidence of active Compose services in this repo’s current manifests.
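
As a rough way to confirm that the scrape targets listed above are actually healthy, the sketch below filters a payload shaped like the response of Prometheus's `GET /api/v1/targets` endpoint. The job names are assumed to match the scrape list one-to-one (the real `job_name` values in `prometheus.yml` may differ), and the `sample` payload is hand-written for illustration, not captured output.

```python
from __future__ import annotations

from typing import Iterable

# Assumption: job labels mirror the scrape list in this document.
EXPECTED_JOBS = (
    "prometheus",
    "agent-e2e-prober",
    "gateway",
    "router",
    "qdrant",
    "grafana",
)


def unhealthy_jobs(targets_payload: dict, expected: Iterable[str] = EXPECTED_JOBS) -> list[str]:
    """Return expected jobs that have no active target with health == "up"."""
    active = targets_payload.get("data", {}).get("activeTargets", [])
    up = {t.get("labels", {}).get("job") for t in active if t.get("health") == "up"}
    return [job for job in expected if job not in up]


# Hand-written sample in the shape of /api/v1/targets (illustrative only).
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus"}, "health": "up"},
            {"labels": {"job": "gateway"}, "health": "up"},
            {"labels": {"job": "router"}, "health": "down"},
        ]
    },
}

print(unhealthy_jobs(sample))
```

A job counts as unhealthy both when its target is down and when it is missing from the payload entirely, which also catches scrape jobs that were dropped from the config.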

## Service-Level Telemetry

- Router exposes `/metrics` (`services/router/main.py`).
- Gateway exposes a metrics endpoint (Compose monitors `/metrics`).
- SenpAI consumer defines Prometheus metrics in code (`senpai_nats_connected`, reconnect counters).
- Prober exports metrics on port `9108`.
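
To illustrate how one of these metrics could be checked from a scraped `/metrics` body, here is a minimal sketch that pulls a single sample out of Prometheus text exposition output. Only `senpai_nats_connected` comes from this document; the counter name `senpai_nats_reconnects_total`, the sample text, and the reading of `1` as "connected" are assumptions. The parser is deliberately simplified and does not handle `}` inside quoted label values.

```python
from __future__ import annotations


def read_sample(exposition_text: str, metric_name: str) -> float | None:
    """Return the first sample value for metric_name, or None if absent."""
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name != metric_name:
            continue
        rest = line[len(name):]
        if rest.startswith("{"):
            # Drop the label set (simplification: assumes no "}" in label values).
            rest = rest[rest.index("}") + 1:]
        fields = rest.split()
        if fields:
            return float(fields[0])  # an optional timestamp field is ignored
    return None


# Illustrative exposition text; the second metric name is hypothetical.
sample_metrics = """\
# HELP senpai_nats_connected Whether the consumer is connected to NATS.
# TYPE senpai_nats_connected gauge
senpai_nats_connected 1
senpai_nats_reconnects_total{reason="timeout"} 3
"""

print(read_sample(sample_metrics, "senpai_nats_connected"))
```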

## Backup and DR

### Data backups

- Scheduled Postgres backup container: `docker-compose.backups.yml` (`SCHEDULE: @every 6h`, with keep-days/weeks/months retention).
- Full backup script: `scripts/backup/backup_all.sh` (Postgres dump + Qdrant snapshots + Neo4j dump + metadata file).
- Restore validation script: `scripts/restore/restore_test.sh`.
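
Given the 6-hour schedule, a simple freshness check can flag a backup job that has silently stopped. The sketch below is not part of the repo's scripts: the 30-minute grace period is an assumption, and the directory passed to `newest_backup_time` is whatever path holds the dumps on a given host.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from pathlib import Path

BACKUP_INTERVAL = timedelta(hours=6)  # from `SCHEDULE: @every 6h`
GRACE = timedelta(minutes=30)         # assumption: slack for slow dumps


def newest_backup_time(directory: Path) -> datetime | None:
    """Return the modification time (UTC) of the newest file, or None if empty."""
    mtimes = [p.stat().st_mtime for p in directory.iterdir() if p.is_file()]
    if not mtimes:
        return None
    return datetime.fromtimestamp(max(mtimes), tz=timezone.utc)


def is_stale(newest: datetime | None, now: datetime,
             interval: timedelta = BACKUP_INTERVAL,
             grace: timedelta = GRACE) -> bool:
    """A missing backup, or one older than interval + grace, counts as stale."""
    if newest is None:
        return True
    return now - newest > interval + grace
```

For example, `is_stale(newest_backup_time(Path("/opt/backups/postgres")), datetime.now(timezone.utc))` would check the host bind path mentioned in the DR notes, assuming the dumps are written directly into that directory.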

### Documentation backups

- `scripts/docs/docs_backup.sh` creates timestamped archives and rotates them by retention.
- `scripts/docs/install_local_cron.sh` installs a local managed cron block for docs maintenance.
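
The timestamped-archive-plus-rotation pattern can be sketched as follows. This is an illustration of the general technique, not a port of `docs_backup.sh`: the `docs-<timestamp>.tar.gz` naming, the count-based `keep` parameter, and both function names are assumptions.

```python
from __future__ import annotations

import tarfile
import time
from pathlib import Path


def create_archive(source_dir: Path, dest_dir: Path, stamp: str | None = None) -> Path:
    """Write source_dir into dest_dir/docs-<stamp>.tar.gz and return the path."""
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"docs-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source_dir, arcname=source_dir.name)
    return archive


def rotate(dest_dir: Path, keep: int) -> list[Path]:
    """Delete all but the newest `keep` archives; return the deleted paths."""
    # Zero-padded timestamps sort chronologically as plain strings.
    archives = sorted(dest_dir.glob("docs-*.tar.gz"))
    stale = archives[:-keep] if keep > 0 else archives
    for path in stale:
        path.unlink()
    return stale
```

Sorting by file name rather than modification time keeps rotation stable even if archives are copied or restored, which resets their mtimes.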

## DR Readiness Notes

- The backup script's metadata file and the restore script provide reproducible path checks.
- The Compose-based backup path uses the host bind mount `/opt/backups/postgres:/backups`, so it requires host-level storage.
- Runbooks report a prior backup-image version mismatch; Compose currently pins the backup image to `:16`.

## Source pointers

- `monitoring/prometheus/prometheus.yml`
- `monitoring/prometheus/rules/node1.rules.yml`
- `monitoring/grafana/provisioning/datasources/prometheus.yml`
- `monitoring/grafana/provisioning/alerting/alerts.yml`
- `monitoring/grafana/dashboards/nats_memory.json`
- `docker-compose.backups.yml`
- `scripts/backup/backup_all.sh`
- `scripts/restore/restore_test.sh`
- `scripts/docs/docs_backup.sh`
- `scripts/docs/install_local_cron.sh`
- `docs/NODA1-MEMORY-RUNBOOK.md`
- `docs/NODA1-TECHBORGS-PATCHES.md`