# Runbook: Postgres Audit Backend
## Overview

The audit backend stores structured, non-payload ToolGovernance events for FinOps reporting, privacy analysis, and incident triage.
| Backend | Config | Use case |
|---|---|---|
| `auto` | `AUDIT_BACKEND=auto` + `DATABASE_URL=...` | Recommended for prod/staging: tries Postgres, falls back to JSONL on failure |
| `postgres` | `AUDIT_BACKEND=postgres` | Hard-require Postgres; fails when the DB is down |
| `jsonl` | `AUDIT_BACKEND=jsonl` | JSONL files only (default / dev) |
| `null` | `AUDIT_BACKEND=null` | Discard all events (useful for testing) |
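The selection logic can be pictured as a small function over the environment. This is a sketch for orientation only: `AUDIT_BACKEND` and `DATABASE_URL` are from the table above, but the function itself is an assumption about the router internals, not its actual API.

```python
# Sketch: choose the audit backend from environment variables.
# The selection function is hypothetical; only the variable names
# and the fallback semantics come from the runbook.
def select_backend(env: dict) -> str:
    mode = env.get("AUDIT_BACKEND", "jsonl")  # JSONL is the default
    if mode == "auto":
        # auto needs a DSN to attempt Postgres; without one, JSONL only
        return "auto(postgres->jsonl)" if env.get("DATABASE_URL") else "jsonl"
    return mode
```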
## 1. Initial Setup (NODE1 / Gateway)

### 1.1 Create the `tool_audit_events` table (idempotent)

```bash
DATABASE_URL="postgresql://user:password@host:5432/daarion" \
  python3 ops/scripts/migrate_audit_postgres.py
```

Dry-run (print DDL only):

```bash
python3 ops/scripts/migrate_audit_postgres.py --dry-run
```
### 1.2 Configure environment

In `services/router/.env` (or your Docker env):

```bash
AUDIT_BACKEND=auto
DATABASE_URL=postgresql://audit_user:secret@pg-host:5432/daarion
AUDIT_JSONL_DIR=/var/log/daarion/audit   # fallback dir
```

Restart the router after changes.
### 1.3 Verify

```bash
# Check router logs for:
#   AuditStore: auto (postgres→jsonl fallback) dsn=postgresql://...
docker logs router 2>&1 | grep AuditStore

# Or call the dashboard:
curl "http://localhost:8080/v1/finops/dashboard?window_hours=24" \
  -H "X-Agent-Id: sofiia"
```
## 2. `AUDIT_BACKEND=auto` Fallback Behaviour

When `AUDIT_BACKEND=auto`:

- Normal operation: all writes/reads go to Postgres.
- Postgres failure: `AutoAuditStore` catches the error, logs a WARNING, and switches to JSONL for the next ~5 minutes.
- Recovery: after ~5 minutes the next write attempt retries Postgres. If it succeeds, the store switches back silently.

This means tool calls are never blocked by a DB outage; events continue to land in JSONL.
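The behaviour above can be sketched as a small wrapper with a cooldown timer. The `AutoAuditStore` name comes from the runbook; everything else (constructor, method names, event shape) is an illustrative assumption, not the router's real implementation.

```python
# Sketch of the auto-fallback behaviour: try Postgres, and after a
# failure stay on JSONL for ~5 minutes before retrying Postgres.
import json
import time
from pathlib import Path

FALLBACK_WINDOW_S = 300  # ~5 minutes before Postgres is retried

class AutoAuditStore:
    def __init__(self, pg_write, jsonl_dir):
        self._pg_write = pg_write          # callable that inserts into Postgres
        self._jsonl_dir = Path(jsonl_dir)  # AUDIT_JSONL_DIR fallback
        self._pg_down_since = None

    def write(self, event: dict) -> str:
        """Write one audit event; returns the backend actually used."""
        in_cooldown = (
            self._pg_down_since is not None
            and time.monotonic() - self._pg_down_since < FALLBACK_WINDOW_S
        )
        if not in_cooldown:
            try:
                self._pg_write(event)
                self._pg_down_since = None  # recovered: switch back silently
                return "postgres"
            except Exception:
                # Real service logs a WARNING here, then falls through.
                self._pg_down_since = time.monotonic()
        # Fallback: append one JSON line per event, one file per day.
        day = time.strftime("%Y-%m-%d")
        path = self._jsonl_dir / f"tool_audit_{day}.jsonl"
        with path.open("a") as f:
            f.write(json.dumps(event) + "\n")
        return "jsonl"
```

Because the write path never raises, a DB outage degrades to JSONL instead of blocking tool calls.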
## 3. Schema

```sql
CREATE TABLE IF NOT EXISTS tool_audit_events (
    id          BIGSERIAL   PRIMARY KEY,
    ts          TIMESTAMPTZ NOT NULL,
    req_id      TEXT        NOT NULL,
    workspace_id TEXT       NOT NULL,
    user_id     TEXT        NOT NULL,
    agent_id    TEXT        NOT NULL,
    tool        TEXT        NOT NULL,
    action      TEXT        NOT NULL,
    status      TEXT        NOT NULL,
    duration_ms INT         NOT NULL DEFAULT 0,
    in_size     INT         NOT NULL DEFAULT 0,
    out_size    INT         NOT NULL DEFAULT 0,
    input_hash  TEXT        NOT NULL DEFAULT '',
    graph_run_id TEXT,
    graph_node  TEXT,
    job_id      TEXT
);
```

Indexes: `ts`, `(workspace_id, ts)`, `(tool, ts)`, `(agent_id, ts)`.
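For incident triage, the table and its `(tool, ts)` index support per-tool aggregation. A hypothetical query (the `status` value `'ok'` is an assumption; the runbook does not enumerate status values):

```sql
-- Hypothetical triage query: failure count and p95 latency per tool
-- over the last 24 hours (served by the (tool, ts) index).
SELECT
    tool,
    count(*)                                AS calls,
    count(*) FILTER (WHERE status <> 'ok')  AS failures,
    percentile_cont(0.95) WITHIN GROUP (ORDER BY duration_ms) AS p95_ms
FROM tool_audit_events
WHERE ts > now() - interval '24 hours'
GROUP BY tool
ORDER BY failures DESC, p95_ms DESC;
```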
## 4. Scheduled Operational Jobs

Jobs are run via `ops/scripts/schedule_jobs.py` (called by cron — see `ops/cron/jobs.cron`):

| Job | Schedule | What it does |
|---|---|---|
| `audit_cleanup` | Daily 03:30 | Deletes/gzips JSONL files older than 30 days |
| `daily_cost_digest` | Daily 09:00 | Cost digest → `ops/reports/cost/YYYY-MM-DD.{json,md}` |
| `daily_privacy_digest` | Daily 09:10 | Privacy digest → `ops/reports/privacy/YYYY-MM-DD.{json,md}` |
| `weekly_drift_full` | Mon 02:00 | Full drift report → `ops/reports/drift/week-YYYY-WW.json` |
### Run manually

```bash
# Cost digest
AUDIT_BACKEND=auto DATABASE_URL=... \
  python3 ops/scripts/schedule_jobs.py daily_cost_digest

# Privacy digest
python3 ops/scripts/schedule_jobs.py daily_privacy_digest

# Weekly drift
python3 ops/scripts/schedule_jobs.py weekly_drift_full
```
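The digest jobs aggregate audit events into per-tool totals. As a rough sketch of what a cost digest might compute from JSONL files (the real `schedule_jobs.py` may aggregate different fields; the function below is illustrative only):

```python
# Minimal sketch of a cost-style digest over JSONL audit files.
# Field names (tool, in_size, out_size, duration_ms) mirror the
# tool_audit_events columns; the aggregation itself is an assumption.
import json
from collections import defaultdict
from pathlib import Path

def cost_digest(audit_dir: str) -> dict:
    """Aggregate call counts, bytes in/out and total duration per tool."""
    totals = defaultdict(lambda: {"calls": 0, "in_size": 0, "out_size": 0, "duration_ms": 0})
    for path in sorted(Path(audit_dir).glob("tool_audit_*.jsonl")):
        for line in path.read_text().splitlines():
            if not line.strip():
                continue
            ev = json.loads(line)
            t = totals[ev["tool"]]
            t["calls"] += 1
            t["in_size"] += ev.get("in_size", 0)
            t["out_size"] += ev.get("out_size", 0)
            t["duration_ms"] += ev.get("duration_ms", 0)
    return dict(totals)
```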
## 5. Dashboard Endpoints

| Endpoint | RBAC | Description |
|---|---|---|
| `GET /v1/finops/dashboard?window_hours=24` | `tools.cost.read` | FinOps cost digest |
| `GET /v1/privacy/dashboard?window_hours=24` | `tools.data_gov.read` | Privacy/audit digest |

Headers:

- `X-Agent-Id: sofiia` (or any agent with the appropriate entitlements)
- `X-Workspace-Id: your-ws`
## 6. Maintenance & Troubleshooting

### Check active backend at runtime

```bash
curl -s http://localhost:8080/v1/finops/dashboard \
  -H "X-Agent-Id: sofiia" | python3 -m json.tool | grep source_backend
```

### Force Postgres migration (re-apply schema)

```bash
python3 ops/scripts/migrate_audit_postgres.py
```
### Postgres is down — expected behaviour

- Router logs: `WARNING: AutoAuditStore: Postgres write failed (...), switching to JSONL fallback`
- Events land in `AUDIT_JSONL_DIR/tool_audit_YYYY-MM-DD.jsonl`
- Recovery is automatic after ~5 minutes
- No tool call failures
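To gauge how much traffic landed in the fallback during an outage, the daily files can be counted directly. A quick sketch, assuming only the file-naming pattern above:

```python
# Count audit events per daily JSONL fallback file.
from pathlib import Path

def fallback_counts(audit_dir: str) -> dict:
    """Map each tool_audit_YYYY-MM-DD.jsonl file to its event count."""
    counts = {}
    for path in sorted(Path(audit_dir).glob("tool_audit_*.jsonl")):
        with path.open() as f:
            counts[path.name] = sum(1 for line in f if line.strip())
    return counts
```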
### JSONL fallback getting large

Run compaction:

```bash
python3 ops/scripts/audit_compact.py \
  --audit-dir ops/audit --window-days 7 --output ops/audit/compact
```

Then clean up the old originals:

```bash
python3 ops/scripts/audit_cleanup.py \
  --audit-dir ops/audit --retention-days 30
```
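The exact behaviour of `audit_compact.py` is not shown here; as a rough sketch, under the assumption that compaction gzips daily files older than the window into the output directory:

```python
# Rough sketch of JSONL compaction: gzip daily audit files older than
# window_days into an output dir, then remove the originals. The real
# audit_compact.py may behave differently; this only shows the idea.
import gzip
import shutil
from datetime import date, datetime, timedelta
from pathlib import Path

def compact(audit_dir, window_days, output, today=None):
    """Gzip tool_audit_YYYY-MM-DD.jsonl files older than window_days."""
    today = today or date.today()
    cutoff = today - timedelta(days=window_days)
    out = Path(output)
    out.mkdir(parents=True, exist_ok=True)
    compacted = []
    for path in sorted(Path(audit_dir).glob("tool_audit_*.jsonl")):
        day = datetime.strptime(path.stem, "tool_audit_%Y-%m-%d").date()
        if day < cutoff:
            with path.open("rb") as src, gzip.open(out / (path.name + ".gz"), "wb") as dst:
                shutil.copyfileobj(src, dst)
            path.unlink()  # original removed once compressed
            compacted.append(path.name)
    return compacted
```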
### Retention enforcement

Enforced by the daily `audit_cleanup` job (cron 03:30). Policy defined in `config/data_governance_policy.yml`:

```yaml
retention:
  audit_jsonl_days: 30
  audit_postgres_days: 90
```

Postgres retention (if needed) must be managed separately, e.g. with a `DELETE FROM tool_audit_events WHERE ts < NOW() - INTERVAL '90 days'` job or `pg_partman`.
## 7. Security Notes

- No PII or payload is stored in `tool_audit_events` — only sizes, hashes, and metadata.
- `DATABASE_URL` must use a restricted role with `INSERT`/`SELECT` on `tool_audit_events` only.
- JSONL fallback files inherit filesystem permissions; ensure the directory is `chmod 700`.