# Runbook: Postgres Audit Backend

## Overview

The audit backend stores structured, non-payload `ToolGovernance` events for FinOps, privacy analysis, and incident triage.

| Backend | Config | Use case |
|---------|--------|----------|
| `auto` | `AUDIT_BACKEND=auto` + `DATABASE_URL=...` | **Recommended for prod/staging**: tries Postgres, falls back to JSONL on failure |
| `postgres` | `AUDIT_BACKEND=postgres` | Hard-require Postgres; fails when the DB is down |
| `jsonl` | `AUDIT_BACKEND=jsonl` | JSONL files only (default / dev) |
| `null` | `AUDIT_BACKEND=null` | Discard all events (useful for testing) |

---

## 1. Initial Setup (NODE1 / Gateway)

### 1.1 Create `tool_audit_events` table (idempotent)

```bash
DATABASE_URL="postgresql://user:password@host:5432/daarion" \
  python3 ops/scripts/migrate_audit_postgres.py
```

Dry-run (print DDL only):

```bash
python3 ops/scripts/migrate_audit_postgres.py --dry-run
```

### 1.2 Configure environment

In `services/router/.env` (or your Docker env):

```env
AUDIT_BACKEND=auto
DATABASE_URL=postgresql://audit_user:secret@pg-host:5432/daarion
# fallback dir
AUDIT_JSONL_DIR=/var/log/daarion/audit
```

Restart the router after changes.

### 1.3 Verify

```bash
# Check router logs for:
#   AuditStore: auto (postgres→jsonl fallback) dsn=postgresql://...
docker logs router 2>&1 | grep AuditStore

# Or call the dashboard (URL quoted so the shell does not glob the "?"):
curl "http://localhost:8080/v1/finops/dashboard?window_hours=24" \
  -H "X-Agent-Id: sofiia"
```

---

## 2. `AUDIT_BACKEND=auto` Fallback Behaviour

When `AUDIT_BACKEND=auto`:

1. **Normal operation**: all writes and reads go to Postgres.
2. **Postgres failure**: `AutoAuditStore` catches the error, logs a WARNING, and switches to JSONL for the next ~5 minutes.
3. **Recovery**: after 5 minutes, the next write attempt retries Postgres. If it succeeds, the store switches back silently.

This means **tool calls are never blocked** by a DB outage; events continue to land in JSONL.

---

## 3. Schema

```sql
CREATE TABLE IF NOT EXISTS tool_audit_events (
    id            BIGSERIAL PRIMARY KEY,
    ts            TIMESTAMPTZ NOT NULL,
    req_id        TEXT NOT NULL,
    workspace_id  TEXT NOT NULL,
    user_id       TEXT NOT NULL,
    agent_id      TEXT NOT NULL,
    tool          TEXT NOT NULL,
    action        TEXT NOT NULL,
    status        TEXT NOT NULL,
    duration_ms   INT NOT NULL DEFAULT 0,
    in_size       INT NOT NULL DEFAULT 0,
    out_size      INT NOT NULL DEFAULT 0,
    input_hash    TEXT NOT NULL DEFAULT '',
    graph_run_id  TEXT,
    graph_node    TEXT,
    job_id        TEXT
);
```

Indexes: `ts`, `(workspace_id, ts)`, `(tool, ts)`, `(agent_id, ts)`.

---

## 4. Scheduled Operational Jobs

Jobs are run via `ops/scripts/schedule_jobs.py` (called by cron; see `ops/cron/jobs.cron`):

| Job | Schedule | What it does |
|-----|----------|--------------|
| `audit_cleanup` | Daily 03:30 | Deletes/gzips JSONL files older than 30 days |
| `daily_cost_digest` | Daily 09:00 | Cost digest → `ops/reports/cost/YYYY-MM-DD.{json,md}` |
| `daily_privacy_digest` | Daily 09:10 | Privacy digest → `ops/reports/privacy/YYYY-MM-DD.{json,md}` |
| `weekly_drift_full` | Mon 02:00 | Full drift → `ops/reports/drift/week-YYYY-WW.json` |

### Run manually

```bash
# Cost digest
AUDIT_BACKEND=auto DATABASE_URL=... \
  python3 ops/scripts/schedule_jobs.py daily_cost_digest

# Privacy digest
python3 ops/scripts/schedule_jobs.py daily_privacy_digest

# Weekly drift
python3 ops/scripts/schedule_jobs.py weekly_drift_full
```

---

## 5. Dashboard Endpoints

| Endpoint | RBAC | Description |
|----------|------|-------------|
| `GET /v1/finops/dashboard?window_hours=24` | `tools.cost.read` | FinOps cost digest |
| `GET /v1/privacy/dashboard?window_hours=24` | `tools.data_gov.read` | Privacy/audit digest |

Headers:

- `X-Agent-Id: sofiia` (or any agent with appropriate entitlements)
- `X-Workspace-Id: your-ws`

---

## 6. Maintenance & Troubleshooting

### Check active backend at runtime

```bash
curl -s http://localhost:8080/v1/finops/dashboard \
  -H "X-Agent-Id: sofiia" | python3 -m json.tool | grep source_backend
```

### Force Postgres migration (re-apply schema)

```bash
python3 ops/scripts/migrate_audit_postgres.py
```

### Postgres is down: expected behaviour

- Router logs: `WARNING: AutoAuditStore: Postgres write failed (...), switching to JSONL fallback`
- Events land in `AUDIT_JSONL_DIR/tool_audit_YYYY-MM-DD.jsonl`
- Recovery is automatic after 5 minutes
- No tool call failures

### JSONL fallback getting large

Run compaction:

```bash
python3 ops/scripts/audit_compact.py \
  --audit-dir ops/audit --window-days 7 --output ops/audit/compact
```

Then clean up old originals:

```bash
python3 ops/scripts/audit_cleanup.py \
  --audit-dir ops/audit --retention-days 30
```

### Retention enforcement

JSONL retention is enforced by the daily `audit_cleanup` job (cron 03:30). The policy is defined in `config/data_governance_policy.yml`:

```yaml
retention:
  audit_jsonl_days: 30
  audit_postgres_days: 90
```

Postgres retention (if needed) must be managed separately, either with a scheduled `DELETE FROM tool_audit_events WHERE ts < NOW() - INTERVAL '90 days'` job or with pg_partman.

---

## 7. Security Notes

- No PII or payload is stored in `tool_audit_events`; only sizes, hashes, and metadata.
- `DATABASE_URL` must use a restricted user with only `INSERT` and `SELECT` on `tool_audit_events` (plus `USAGE` on the sequence behind the `BIGSERIAL` `id` column, which inserts require).
- JSONL fallback files inherit filesystem permissions; ensure the directory has mode `700` (`chmod 700`).
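
---

## 8. Appendix: Fallback Sketch

The fallback behaviour described in section 2 can be illustrated with a minimal Python sketch. This is not the actual `AutoAuditStore` code: the class name `FallbackAuditWriter`, its constructor parameters, and the returned backend labels are hypothetical, chosen only to show the "try Postgres, fall back to JSONL, retry after a cooldown" pattern. The 300-second default mirrors the ~5 minutes stated above.

```python
import logging
import time
from typing import Callable, Dict, Optional

logger = logging.getLogger("audit")

Event = Dict[str, object]
Writer = Callable[[Event], None]


class FallbackAuditWriter:
    """Hypothetical stand-in for AutoAuditStore: primary first, fallback on error."""

    def __init__(
        self,
        primary: Writer,
        fallback: Writer,
        cooldown_s: float = 300.0,  # assumption: matches the ~5 minute window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._primary = primary
        self._fallback = fallback
        self._cooldown_s = cooldown_s
        self._clock = clock
        # None means the primary is considered healthy; otherwise the
        # earliest clock value at which the primary is tried again.
        self._retry_at: Optional[float] = None

    def write(self, event: Event) -> str:
        """Store one event and return the label of the backend that took it."""
        now = self._clock()
        if self._retry_at is None or now >= self._retry_at:
            try:
                self._primary(event)
                self._retry_at = None  # healthy, or silently recovered
                return "postgres"
            except Exception as exc:
                # A DB error must never propagate to the tool call.
                logger.warning(
                    "primary write failed (%s), switching to fallback", exc
                )
                self._retry_at = now + self._cooldown_s
        self._fallback(event)
        return "jsonl"
```

Passing the clock in as a parameter keeps the cooldown testable without sleeping: a test can supply a fake clock, make the primary writer raise, and check that writes go to the fallback until the cooldown has elapsed.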