microdao-daarion/docs/release/sofiia-console-v1-readiness.md
Apple 67225a39fa docs(platform): add policy configs, runbooks, ops scripts and platform documentation
Config policies (16 files): alert_routing, architecture_pressure, backlog,
cost_weights, data_governance, incident_escalation, incident_intelligence,
network_allowlist, nodes_registry, observability_sources, rbac_tools_matrix,
release_gate, risk_attribution, risk_policy, slo_policy, tool_limits, tools_rollout

Ops (22 files): Caddyfile, calendar compose, grafana voice dashboard,
deployments/incidents logs, runbooks for alerts/audit/backlog/incidents/sofiia/voice,
cron jobs, scripts (alert_triage, audit_cleanup, migrate_*, governance, schedule),
task_registry, voice alerts/ha/latency/policy

Docs (30+ files): HUMANIZED_STEPAN v2.7-v3 changelogs and runbooks,
NODA1/NODA2 status and setup, audit index and traces, backlog, incident,
supervisor, tools, voice, opencode, release, risk, aistalk, spacebot

Made-with: Cursor
2026-03-03 07:14:53 -08:00

# Sofiia Console v1.0 Release Readiness Summary
One-page go/no-go artifact for the `sofiia-console` release decision.
## 1) Scope & Version
- Service: `sofiia-console`
- Target version / tag: `v1.0` (to be assigned at release cut)
- Git SHAs:
  - sofiia-console: `e75fd33`
  - router: `<set at release window>`
  - gateway: `<set at release window>`
- Deployment target:
  - NODA1: production runtime/data plane
  - NODA2: control plane / sofiia-console
- Date prepared: `<set at release window>`
- Prepared by: `<operator>`
## 2) Production Guarantees
### Reliability
- Idempotent `POST /api/chats/{chat_id}/send` with a selectable backend (`inmemory|redis`).
- Multi-node routing covered by E2E tests (NODA1/NODA2 via the `infer` monkeypatch path).
- Cursor pagination hardened with a `(ts,id)` tie-breaker, so ordering stays stable when timestamps collide.
- Release process formalized via preflight + release runbook + smoke scripts.
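
A minimal sketch of keyset pagination with a `(ts,id)` tie-breaker follows. It assumes a hypothetical SQLite table `messages(id, chat_id, ts)`, and the real schema and cursor encoding in `sofiia-console` may differ. Ordering on `ts` alone can skip or repeat rows when several messages share a timestamp. Adding `id` as a secondary key gives every row a unique position, so "strictly after the last `(ts, id)` seen" is well defined.

```python
import sqlite3
from typing import Optional, Tuple

Cursor = Tuple[int, str]  # (ts, id) of the last row on the previous page


def page_messages(conn: sqlite3.Connection, chat_id: str, limit: int,
                  cursor: Optional[Cursor] = None):
    """Return (rows, next_cursor) for one page, newest first.

    Rows are ordered by (ts DESC, id DESC); `id` breaks ties between
    messages that share a timestamp. Table `messages` is hypothetical.
    """
    if cursor is None:
        rows = conn.execute(
            "SELECT ts, id FROM messages WHERE chat_id = ? "
            "ORDER BY ts DESC, id DESC LIMIT ?",
            (chat_id, limit),
        ).fetchall()
    else:
        ts, msg_id = cursor
        # Strictly after the cursor in (ts DESC, id DESC) order.
        rows = conn.execute(
            "SELECT ts, id FROM messages WHERE chat_id = ? "
            "AND (ts < ? OR (ts = ? AND id < ?)) "
            "ORDER BY ts DESC, id DESC LIMIT ?",
            (chat_id, ts, ts, msg_id, limit),
        ).fetchall()
    # A short page means there is nothing left to fetch.
    next_cursor = (rows[-1][0], rows[-1][1]) if len(rows) == limit else None
    return rows, next_cursor
```

When you walk the pages with a small `limit` over rows that share a `ts`, every row should appear exactly once. That is the property the tie-breaker protects.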
### Security
- Rate limiting on send path:
  - per-chat scope
  - per-operator scope
- Strict `/api/audit` protection:
  - key required
  - no localhost bypass
- Structured audit trail:
  - write events for operator actions
  - cursor-based read endpoint
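
The two send-path scopes can be sketched as two independent limiters, and a request is accepted only if both admit it. This is a fixed-window, in-memory illustration. The window algorithm, the limits, and the key prefixes (`chat:`, `op:`) are assumptions, not the service's actual configuration.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` calls per key in each `window_s`-second window.

    State is a per-process dict, like an `inmemory` backend.
    """

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._state: Dict[str, Tuple[int, int]] = {}  # key -> (window, count)

    def allow(self, key: str) -> bool:
        window = int(self.clock() // self.window_s)
        seen_window, count = self._state.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new window starts with a fresh budget
        if count >= self.limit:
            return False
        self._state[key] = (window, count + 1)
        return True


def allow_send(chat_limiter: FixedWindowLimiter,
               operator_limiter: FixedWindowLimiter,
               chat_id: str, operator_id: str) -> bool:
    # Both scopes must admit the request. `and` short-circuits, so a
    # per-chat rejection does not spend per-operator budget; a per-operator
    # rejection still spends the per-chat slot in this sketch.
    return (chat_limiter.allow(f"chat:{chat_id}")
            and operator_limiter.allow(f"op:{operator_id}"))
```

Because the state lives in a Python dict, each process keeps its own counts. This is the same reason multi-instance deployments need the `redis` backend.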
- Secrets rotation runbook documented and operational.
### Operational Controls
- `/metrics` exposed (including rate-limit and idempotency counters).
- Structured JSON logs for send/replay/pagination/error flows.
- Audit retention policy in place (default 90 days).
- Pruning script available (`ops/prune_audit_db.py`: dry-run + batch delete + optional vacuum).
- Release evidence auto-generator available (`ops/generate_release_evidence.sh`).
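
The pruning flow (dry-run, batch delete, optional vacuum) can be sketched as follows. This is not the contents of `ops/prune_audit_db.py`. The table `audit_events` and its epoch-seconds `ts` column are hypothetical.

```python
import sqlite3
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def prune_audit(conn: sqlite3.Connection, retention_days: int = 90,
                batch_size: int = 500, dry_run: bool = True,
                vacuum: bool = False, now: Optional[float] = None) -> int:
    """Delete audit rows older than the retention window.

    Returns the number of rows that would be deleted (dry run) or were deleted.
    Assumes a hypothetical table audit_events(id INTEGER PRIMARY KEY, ts REAL).
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    if dry_run:
        (candidates,) = conn.execute(
            "SELECT COUNT(*) FROM audit_events WHERE ts < ?", (cutoff,)
        ).fetchone()
        return candidates

    deleted = 0
    while True:
        cur = conn.execute(
            "DELETE FROM audit_events WHERE id IN "
            "(SELECT id FROM audit_events WHERE ts < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # one short write transaction per batch
        if cur.rowcount == 0:
            break
        deleted += cur.rowcount
    if vacuum:
        conn.execute("VACUUM")  # must run outside a transaction
    return deleted
```

Committing after every batch keeps each write transaction short, so the live audit writer is never locked out for long. `VACUUM` is optional because it rebuilds the whole database file to return freed pages to the filesystem, which can be slow on a large audit DB.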
## 3) Known Limitations / Residual Risks
- The chat index is still backed by a local DB; full multi-instance HA for the global chat index requires Phase 6 (Redis `ChatIndexStore`).
- The rate-limit backend defaults to `inmemory`; multi-instance consistency requires `SOFIIA_RATE_LIMIT_BACKEND=redis`.
- Audit storage is SQLite (single-node storage, non-clustered by default).
- Automatic alerting/paging is not yet enabled; metric observation is primarily manual/runbook-driven.
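
The `inmemory|redis` choice for send idempotency has the same single-process caveat as the rate limiter. Below is a minimal sketch that assumes a hypothetical store interface and a hypothetical `replayed` flag in the response. Keys are scoped per chat. The first request runs `deliver()`, and a retry with the same key returns the cached response.

```python
from typing import Callable, Dict, Optional, Protocol


class IdempotencyStore(Protocol):
    def get(self, key: str) -> Optional[dict]: ...
    def put(self, key: str, response: dict) -> None: ...


class InMemoryStore:
    """Process-local store: another instance never sees these entries."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def put(self, key: str, response: dict) -> None:
        self._data[key] = response


def send_once(store: IdempotencyStore, chat_id: str, idem_key: str,
              deliver: Callable[[], dict]) -> dict:
    scoped = f"{chat_id}:{idem_key}"  # the same key in another chat is a new send
    cached = store.get(scoped)
    if cached is not None:
        return {**cached, "replayed": True}
    # Note: get-then-put is not atomic; two concurrent first attempts can
    # both reach deliver(). A shared backend would need an atomic claim.
    response = deliver()
    store.put(scoped, response)
    return {**response, "replayed": False}
```

Two `InMemoryStore` objects behave like two service instances. Each has its own dict, so a retry routed to the other instance would call `deliver()` again. A shared Redis store avoids this, and an atomic `SET key value NX EX <ttl>` claim would also help close the get-then-put race that this sketch leaves open.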
## 4) Required Release-Day Checks
### Preflight
- `STRICT=1 bash ops/preflight_sofiia_console.sh`
### Deploy order
- NODA2 precheck
- NODA1 rollout
- NODA2 finalize
### Smoke
- `GET /api/health` -> `200`
- `/metrics` reachable
- `bash ops/redis_idempotency_smoke.sh` -> `PASS` (when the redis backend is enabled)
- `/api/audit` auth:
  - without key -> `401`
  - with key -> `200`
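
The health, metrics, and audit-auth smoke checks can be scripted with the standard library. This is a sketch. The header used to pass the audit key (`X-Audit-Key` here) is a placeholder, so take the real header name from the service configuration or runbook. `get_status` is injectable, which lets you exercise the check logic without a live server.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Optional

Headers = Optional[Dict[str, str]]


def http_status(url: str, headers: Headers = None, timeout: float = 5.0) -> int:
    """Return the HTTP status code, or 0 if the server could not be reached."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:  # 4xx/5xx still carry a status code
        return exc.code
    except urllib.error.URLError:  # connection refused, DNS failure, etc.
        return 0


def smoke(base_url: str, audit_key: str, key_header: str = "X-Audit-Key",
          get_status: Callable[[str, Headers], int] = http_status) -> List[str]:
    """Run the release-day smoke checks; return failures (empty list = PASS)."""
    checks = [
        ("/api/health", "/api/health", None, 200),
        ("/metrics", "/metrics", None, 200),
        ("/api/audit without key", "/api/audit", None, 401),
        ("/api/audit with key", "/api/audit", {key_header: audit_key}, 200),
    ]
    failures = []
    for label, path, headers, expected in checks:
        got = get_status(base_url.rstrip("/") + path, headers)
        if got != expected:
            failures.append(f"{label}: expected {expected}, got {got}")
    return failures
```

An empty list means every check matched. A 200 on `/api/audit` without a key is the kind of bypass regression the strict audit policy is meant to rule out.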
### Post-release
- Verify rate-limit metrics increment under controlled load.
- Run a quick audit write/read check.
- Run retention dry-run:
  - `python3 ops/prune_audit_db.py --dry-run`
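
To confirm that rate-limit counters actually increment, scrape `/metrics` before and after a controlled burst and compare the totals. The helper below sums all samples of one metric across label sets in the Prometheus text exposition format. It is a simplified parser. The metric name `sofiia_rate_limited_total` is hypothetical, so substitute the name the service actually exposes.

```python
def counter_total(metrics_text: str, name: str) -> float:
    """Sum every sample of `name` across label sets (Prometheus text format)."""
    total = 0.0
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments
        sample_name = line.split("{", 1)[0].split()[0]
        if sample_name != name:
            continue
        # The value is the first token after the label set (or after the name);
        # an optional timestamp may follow it.
        rest = line.rsplit("}", 1)[1] if "{" in line else line[len(sample_name):]
        total += float(rest.split()[0])
    return total


def counter_increased(before: str, after: str, name: str) -> bool:
    return counter_total(after, name) > counter_total(before, name)
```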
## 5) Explicit Go / No-Go Criteria
**GO if all conditions hold:**
- Preflight is `PASS` (or has only non-critical `WARN`s accepted by the operator).
- Smoke checks pass.
- No unexpected 5xx spike during the first 5-10 minutes.
- Rate-limit counters and idempotency behavior are within expected range.

**NO-GO if any condition holds:**
- Strict audit auth fails (401/200 behavior broken).
- Redis idempotency A/B smoke fails.
- Audit write/read fails.
- Unexpected 500s on send path.
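
The GO/NO-GO rules above can be encoded as a small checklist function, so the decision and its reasons are recorded together. The `ReleaseSignals` fields are hypothetical names for the observations the operator collects. Deciding what counts as an unexpected 5xx spike remains a human call, and the function only consumes the resulting count.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ReleaseSignals:
    preflight: str                        # "PASS", "WARN" or "FAIL"
    warn_accepted: bool                   # operator accepted non-critical WARNs
    smoke_passed: bool
    audit_auth_ok: bool                   # 401 without key, 200 with key
    redis_idempotency_ok: Optional[bool]  # None when the redis backend is off
    audit_write_read_ok: bool
    unexpected_5xx: int                   # send-path 5xx in the first 5-10 min
    counters_in_range: bool               # rate-limit / idempotency counters


def decide(s: ReleaseSignals) -> Tuple[str, List[str]]:
    """Return ("GO" | "NO-GO", blockers) following the criteria above."""
    blockers: List[str] = []
    # Anything other than PASS, or a WARN the operator accepted, blocks.
    if s.preflight != "PASS" and not (s.preflight == "WARN" and s.warn_accepted):
        blockers.append("preflight")
    if not s.smoke_passed:
        blockers.append("smoke")
    if not s.audit_auth_ok:
        blockers.append("audit auth")
    if s.redis_idempotency_ok is False:
        blockers.append("redis idempotency smoke")
    if not s.audit_write_read_ok:
        blockers.append("audit write/read")
    if s.unexpected_5xx > 0:
        blockers.append("send-path 5xx")
    if not s.counters_in_range:
        blockers.append("counters out of range")
    return ("GO" if not blockers else "NO-GO"), blockers
```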
## 6) Rollback Readiness Statement
- Rollback method:
  - revert to previous known-good SHA/tag
  - restart affected services via docker compose/systemd as per runbook
- Estimated rollback time: `<set by operator, typically 5-15 min>`
- Mandatory post-rollback smoke:
  - `/api/health`
  - idempotency smoke
  - audit auth/read checks