# Sofiia Console v1.0 Release Readiness Summary

One-page go/no-go artifact for the release decision on `sofiia-console`.

## 1) Scope & Version

- Service: `sofiia-console`
- Target version / tag: `v1.0` (to be assigned at release cut)
- Git SHAs:
  - sofiia-console: `e75fd33`
  - router: `<set at release window>`
  - gateway: `<set at release window>`
- Deployment target:
  - NODA1: production runtime/data plane
  - NODA2: control plane / sofiia-console
- Date prepared: `<set at release window>`
- Prepared by: `<operator>`

## 2) Production Guarantees

### Reliability

- Idempotent `POST /api/chats/{chat_id}/send` with selectable backend (`inmemory|redis`).
- Multi-node routing covered by E2E tests (NODA1/NODA2 via `infer` monkeypatch path).
- Cursor pagination hardened with tie-breakers (`(ts,id)` / stable ordering semantics).
- Release process formalized via preflight + release runbook + smoke scripts.
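
The idempotency guarantee on the send path can be illustrated with a minimal in-memory sketch. Everything named here is hypothetical: the `InMemoryIdempotencyStore` class, the `get_or_run` method, and the assumption that the client supplies an idempotency key per request are for illustration only and are not taken from the service's code.

```python
import threading
from typing import Callable, Dict, Tuple


class InMemoryIdempotencyStore:
    """Hypothetical store that remembers the first result per (chat_id, key)."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, str], dict] = {}
        self._lock = threading.Lock()

    def get_or_run(
        self, chat_id: str, key: str, action: Callable[[], dict]
    ) -> Tuple[dict, bool]:
        """Return (result, replayed); `action` runs at most once per (chat_id, key)."""
        with self._lock:
            cached = self._results.get((chat_id, key))
            if cached is not None:
                # Same key seen before: replay the stored result, do not re-send.
                return cached, True
            # First time this key is seen: perform the send and remember its result.
            # Holding the lock during `action` keeps the sketch simple but serializes sends.
            result = action()
            self._results[(chat_id, key)] = result
            return result, False
```

A process-local dictionary like this is not shared between instances, which is one reason a shared backend such as `redis` would be chosen when more than one instance serves the send path.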

### Security

- Rate limiting on send path:
  - per-chat scope
  - per-operator scope
- Strict `/api/audit` protection:
  - key required
  - no localhost bypass
- Structured audit trail:
  - write events for operator actions
  - cursor-based read endpoint
- Secrets rotation runbook documented and operational.
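
The two rate-limit scopes can be pictured as two independent limiters that must both have budget before a send is accepted. The sketch below assumes a sliding-window algorithm; the class `SlidingWindowLimiter`, the helper `allow_send`, and the limits used are hypothetical, and the service may use a different algorithm.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Hypothetical sliding-window limiter keyed by a scope string."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def _prune(self, scope: str) -> Deque[float]:
        # Drop timestamps that have fallen out of the window.
        now = self._clock()
        hits = self._hits[scope]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        return hits

    def has_budget(self, scope: str) -> bool:
        return len(self._prune(scope)) < self.limit

    def record(self, scope: str) -> None:
        self._hits[scope].append(self._clock())


def allow_send(chat_limiter: SlidingWindowLimiter,
               operator_limiter: SlidingWindowLimiter,
               chat_id: str, operator_id: str) -> bool:
    """Accept a send only if both the per-chat and per-operator scopes have budget."""
    chat_scope = f"chat:{chat_id}"
    operator_scope = f"operator:{operator_id}"
    # Check both scopes before recording, so a rejected send consumes no budget.
    if not (chat_limiter.has_budget(chat_scope)
            and operator_limiter.has_budget(operator_scope)):
        return False
    chat_limiter.record(chat_scope)
    operator_limiter.record(operator_scope)
    return True
```

Checking both scopes before recording a hit is a deliberate choice in this sketch: a request rejected by the operator limit does not use up the chat's allowance.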

### Operational Controls

- `/metrics` exposed (including rate-limit and idempotency counters).
- Structured JSON logs for send/replay/pagination/error flows.
- Audit retention policy in place (default 90 days).
- Pruning script available (`ops/prune_audit_db.py`: dry-run + batch delete + optional vacuum).
- Release evidence auto-generator available (`ops/generate_release_evidence.sh`).
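
The retention and pruning behavior described above (90-day default, dry-run, batch delete, optional vacuum) could look roughly like the following sketch. It is not the content of `ops/prune_audit_db.py`: the table name `audit_events`, its `id` and `ts` columns (Unix seconds), and the function `prune_audit` are assumptions made for illustration.

```python
import sqlite3

RETENTION_DAYS = 90  # default retention stated in this summary


def prune_audit(conn: sqlite3.Connection, now_ts: float,
                retention_days: int = RETENTION_DAYS, batch_size: int = 500,
                dry_run: bool = True, vacuum: bool = False) -> int:
    """Return the number of expired rows (dry-run) or the number actually deleted."""
    cutoff = now_ts - retention_days * 86400
    if dry_run:
        # Dry-run only counts what would be removed; nothing is modified.
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM audit_events WHERE ts < ?", (cutoff,)
        ).fetchone()
        return count

    deleted = 0
    while True:
        # Delete in small batches so each transaction stays short.
        cur = conn.execute(
            "DELETE FROM audit_events WHERE id IN "
            "(SELECT id FROM audit_events WHERE ts < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        deleted += cur.rowcount

    if vacuum:
        # VACUUM must run outside a transaction, hence after the final commit.
        conn.execute("VACUUM")
    return deleted
```

Running the dry-run first and comparing its count with the later delete count gives a simple sanity check before any data is removed.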

## 3) Known Limitations / Residual Risks

- Chat index is still local DB-backed; full multi-instance HA for global chat index needs Phase 6 (Redis ChatIndexStore).
- Rate-limit defaults to `inmemory`; multi-instance consistency needs `SOFIIA_RATE_LIMIT_BACKEND=redis`.
- Audit storage is SQLite (single-node storage, non-clustered by default).
- Automatic alerting/paging is not yet enabled; metric observation is primarily manual/runbook-driven.
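
As a small illustration of the rate-limit backend default, the sketch below resolves `SOFIIA_RATE_LIMIT_BACKEND` with `inmemory` as the fallback. The function name `rate_limit_backend`, the whitespace and case normalization, and the decision to reject unknown values are assumptions; the service's real parsing may differ.

```python
import os
from typing import Mapping, Optional

VALID_BACKENDS = ("inmemory", "redis")


def rate_limit_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the rate-limit backend, defaulting to `inmemory` when unset."""
    env = os.environ if env is None else env
    # Normalizing whitespace and case is an assumption of this sketch.
    value = env.get("SOFIIA_RATE_LIMIT_BACKEND", "inmemory").strip().lower()
    if value not in VALID_BACKENDS:
        # Failing fast on unknown values is also an assumption, not documented behavior.
        raise ValueError(f"unsupported rate-limit backend: {value!r}")
    return value
```

A preflight step could call such a helper and warn when a multi-instance deployment still resolves to `inmemory`.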

## 4) Required Release-Day Checks

### Preflight

- `STRICT=1 bash ops/preflight_sofiia_console.sh`

### Deploy order

- NODA2 precheck
- NODA1 rollout
- NODA2 finalize

### Smoke

- `GET /api/health` -> `200`
- `/metrics` reachable
- `bash ops/redis_idempotency_smoke.sh` -> `PASS` (when redis backend is enabled)
- `/api/audit` auth:
  - without key -> `401`
  - with key -> `200`
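
The HTTP smoke checks above can be expressed as a small table of expectations. In this sketch the `fetch` callable is injected so it runs without a network; the function `run_smoke`, the `X-API-Key` header name, and treating "reachable" as a `200` response are assumptions, not details confirmed by the runbook.

```python
from typing import Callable, Dict, List, Tuple

# fetch(path, headers) -> HTTP status code. Injected so the sketch needs no network.
Fetch = Callable[[str, Dict[str, str]], int]


def run_smoke(fetch: Fetch, audit_key: str) -> List[Tuple[str, bool]]:
    """Return (check name, passed) pairs for the HTTP smoke checks."""
    # The header name below is hypothetical; use whatever the deployment expects.
    auth_headers = {"X-API-Key": audit_key}
    return [
        ("health returns 200", fetch("/api/health", {}) == 200),
        ("metrics reachable", fetch("/metrics", {}) == 200),
        ("audit without key returns 401", fetch("/api/audit", {}) == 401),
        ("audit with key returns 200", fetch("/api/audit", auth_headers) == 200),
    ]
```

In practice `fetch` would wrap a real HTTP client pointed at the deployed console, and any failed pair would be recorded in the release evidence.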

### Post-release

- Verify rate-limit metrics increment under controlled load.
- Run a quick audit write/read check.
- Run retention dry-run:
  - `python3 ops/prune_audit_db.py --dry-run`

## 5) Explicit Go / No-Go Criteria

**GO if all conditions hold:**

- Preflight is `PASS` (or has only non-critical `WARN` results accepted by the operator).
- Smoke checks pass.
- No unexpected 5xx spike during the first 5–10 minutes.
- Rate-limit counters and idempotency behavior are within expected range.

**NO-GO if any condition holds:**

- Strict audit auth fails (401/200 behavior broken).
- Redis idempotency A/B smoke fails.
- Audit write/read fails.
- Unexpected 500s on send path.
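
The criteria above can be encoded as a simple decision function, which may help when recording the outcome alongside release evidence. The `ReleaseEvidence` fields and the `decide` function are hypothetical names, and treating any unmet GO condition as a blocker is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReleaseEvidence:
    preflight: str                     # "PASS" or "WARN"; anything else does not pass
    warns_accepted: bool = False       # operator accepted the non-critical WARNs
    smoke_passed: bool = False
    unexpected_5xx_spike: bool = False
    counters_in_range: bool = False
    audit_auth_ok: bool = True
    redis_idempotency_ok: bool = True  # leave True when the redis backend is not enabled
    audit_write_read_ok: bool = True
    send_path_500s: bool = False


def decide(e: ReleaseEvidence) -> Tuple[str, List[str]]:
    """Return ("GO" | "NO-GO", blockers) from the collected evidence."""
    blockers: List[str] = []
    # NO-GO conditions: any single one blocks the release.
    if not e.audit_auth_ok:
        blockers.append("strict audit auth failed")
    if not e.redis_idempotency_ok:
        blockers.append("redis idempotency smoke failed")
    if not e.audit_write_read_ok:
        blockers.append("audit write/read failed")
    if e.send_path_500s:
        blockers.append("unexpected 500s on send path")
    # GO conditions: all must hold (assumption: an unmet one is also a blocker).
    if not (e.preflight == "PASS" or (e.preflight == "WARN" and e.warns_accepted)):
        blockers.append("preflight not passed")
    if not e.smoke_passed:
        blockers.append("smoke checks did not pass")
    if e.unexpected_5xx_spike:
        blockers.append("unexpected 5xx spike")
    if not e.counters_in_range:
        blockers.append("rate-limit/idempotency counters out of range")
    return ("GO" if not blockers else "NO-GO", blockers)
```

Returning the list of blockers rather than a bare verdict keeps the reason for a NO-GO visible to whoever signs off.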

## 6) Rollback Readiness Statement

- Rollback method:
  - revert to previous known-good SHA/tag
  - restart affected services via docker compose/systemd as per runbook
- Estimated rollback time: `<set by operator, typically 5-15 min>`
- Mandatory post-rollback smoke:
  - `/api/health`
  - idempotency smoke
  - audit auth/read checks