docs(platform): add policy configs, runbooks, ops scripts and platform documentation

Config policies (16 files): alert_routing, architecture_pressure, backlog, cost_weights, data_governance, incident_escalation, incident_intelligence, network_allowlist, nodes_registry, observability_sources, rbac_tools_matrix, release_gate, risk_attribution, risk_policy, slo_policy, tool_limits, tools_rollout

Ops (22 files): Caddyfile, calendar compose, grafana voice dashboard, deployments/incidents logs, runbooks for alerts/audit/backlog/incidents/sofiia/voice, cron jobs, scripts (alert_triage, audit_cleanup, migrate_*, governance, schedule), task_registry, voice alerts/ha/latency/policy

Docs (30+ files): HUMANIZED_STEPAN v2.7-v3 changelogs and runbooks, NODA1/NODA2 status and setup, audit index and traces, backlog, incident, supervisor, tools, voice, opencode, release, risk, aistalk, spacebot

Made-with: Cursor
2026-03-03 07:14:53 -08:00
parent 129e4ea1fc
commit 67225a39fa
102 changed files with 20060 additions and 0 deletions
--- a/docs/audit/sofiia_intelligence_system_trace.md
+++ b/docs/audit/sofiia_intelligence_system_trace.md
@@ -0,0 +1,441 @@
+# Sofiia CTO Agent — Intelligence System Trace (C)
+
+> Generated: 2026-02-26 | Reconstruction of Sofiia's "intelligence system"
+
+---
+
+## Overall Reasoning Flow
+
+```
+User Input (Telegram / Console / Voice)
+        │
+        ▼
+  [BFF: sofiia-console]
+  Auth + Rate limit + Session
+        │
+        ├─── Voice turn? ──► STT (memory-service) → sanitize_for_voice() → voice_fast_uk
+        │
+        └─── Text turn? ──► [Router /v1/agents/sofiia/infer]
+                                     │
+                        ┌────────────┴────────────┐
+                        │                         │
+                   LLM selection            Tool call?
+                   (profile-based)          (tool_manager)
+                        │                         │
+                  [LLM response]          [Tool execution]
+                        │                         │
+                   <think> strip          RBAC check
+                        │                         │
+                   Memory save            Evidence
+                        │                         │
+                        └────────┬────────────────┘
+                                 │
+                         [Dialog Map update]
+                         (SQLite tree / future Postgres graph)
+                                 │
+                         [Response to User]
+                                 │
+                         [TTS if voice mode]
+```
+
+---
+
+## 1. Intent → Plan → Execute (Canonical CTO Flow)
+
+### 1.1 Documented
+- **Docs:** `AGENTS.md` §Example Commands, `docs/ADR_ARCHITECTURE_VNEXT.md` §3.1 CrewAI Workers
+- **Concept:** "Chat/Intent → Plan (Artifacts) → Execute as Job → Evidence → Dialog Map"
+- **vNext Design:** the whole concept is described in this conversation session
+
+### 1.2 Implemented
+- **Intent → Plan:** ✅ LLM inference via the Router (`/v1/agents/sofiia/infer`)
+- **Plan → Execute (Ops):** ✅ `/api/ops/run` dispatches pre-defined actions
+- **Execute → Evidence:** ⚠️ partial: ops returns a result but does not persist it as an artifact
+- **Evidence → Dialog Map:** ❌ ops artifacts are not linked into dialog_nodes
+
+### 1.3 Gaps
+- No general-purpose **Job System** (only pre-defined ops actions)
+- No `repo_changesets` / `ops_runs` stored as artifacts in the DB
+- The Dialog Map is not updated automatically by ops actions
+
+---
+
+## 2. Architecture Modules
+
+### 2.1 BFF (sofiia-console)
+
+**Documented in:**
+- `docs/runbook/sofiia-control-plane.md`
+- `docs/sofiia_ui_vnext_audit.md`
+- `docs/fabric_contract.md`
+
+**Implemented in:**
+- `services/sofiia-console/app/main.py` — FastAPI v0.3.0
+- `services/sofiia-console/app/config.py` — node registry, ENV loading
+- `docker-compose.node2-sofiia.yml` — deployment config
+
+**What the BFF does:**
+```
+1. API Gateway для UI (chat/voice/projects/ops/nodes)
+2. Session management (SQLite sofiia.db)
+3. Multi-provider LLM proxy (ollama/router/glm/grok)
+4. Voice pipeline (STT→LLM→TTS, Phase 2 streaming)
+5. Ops dispatcher (risk/pressure/backlog/notion/release)
+6. Multi-node health monitor (polling + WebSocket fan-out)
+7. Memory save (SQLite first, then Memory Service best-effort)
+```
+
+**Gaps:**
+- No unified job tracking (each ops action is one-shot, with no persistence)
+- No `repo_changesets` flow
+- `ops.html`, `chat.html`, `nodes.html` are fallback HTML, not separate files
+
+---
+
+### 2.2 LLM Routing
+
+**Documented in:**
+- `services/router/router-config.yml`
+- `docs/architecture_inventory/01_SERVICE_CATALOG.md`
+- `docs/OPENAPI_CONTRACTS.md`
+
+**Implemented in:**
+- `services/router/main.py` — `/v1/agents/{agent_id}/infer`
+- `services/router/router-config.yml` — `sofiia:` entry
+
+**Sofiia configuration (router-config.yml):**
+```yaml
+sofiia:
+  primary: cloud_grok        # Grok API (Telegram mode)
+  fallback: cloud_deepseek   # DeepSeek API
+  # Console mode can override this via ollama
+```
+
+**Voice profiles:**
+```yaml
+voice_fast_uk:
+  prefer_models: [gemma3:latest, qwen3.5:35b-a3b, qwen3:14b]
+  deadline_ms: 9000
+  max_tokens: 256
+  
+voice_quality_uk:
+  prefer_models: [qwen3.5:35b-a3b, qwen3:14b]
+  deadline_ms: 12000
+  max_tokens: 256
+```
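
As an illustration of how a profile's `prefer_models` list could be resolved, here is a minimal sketch. It assumes (this is not confirmed by the router source) that the first preferred model that is currently available wins; `VoiceProfile` and `pick_model` are hypothetical names, and only the profile values come from the config above.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class VoiceProfile:
    """Hypothetical in-memory mirror of one voice profile entry."""
    prefer_models: Tuple[str, ...]
    deadline_ms: int
    max_tokens: int


# Values copied from the voice profiles shown above.
VOICE_PROFILES = {
    "voice_fast_uk": VoiceProfile(
        ("gemma3:latest", "qwen3.5:35b-a3b", "qwen3:14b"), 9000, 256
    ),
    "voice_quality_uk": VoiceProfile(
        ("qwen3.5:35b-a3b", "qwen3:14b"), 12000, 256
    ),
}


def pick_model(profile_name: str, available: Set[str]) -> Optional[str]:
    """Return the first preferred model that is available, or None.

    Assumption: preference order is the list order in the config.
    """
    profile = VOICE_PROFILES[profile_name]
    for model in profile.prefer_models:
        if model in available:
            return model
    return None
```

A caller would pass the set of models reported by the node and treat `None` as "no suitable voice model on this node".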
+
+**Gaps:**
+- No profile for `repo_changeset` (long-form, structured output)
+- No profile for `plan_generation` (CTO structured plans)
+
+---
+
+### 2.3 Tool System
+
+**Documented in:**
+- `AGENTS.md` §Tool List
+- `docs/architecture_inventory/02_TOOL_CATALOG.md`
+- `config/rbac_tools_matrix.yml`
+
+**Implemented in:**
+- `services/router/tool_manager.py` — TOOL_DEFINITIONS + execution
+- `services/router/agent_tools_config.py` — per-agent allowlists
+
+**RBAC role `agent_cto`** (39 permissions):
+```
+docs: read       ops: read/exec_safe
+repo: read       jobs: smoke/drift/backup/deploy
+kb: read         risk: read/write
+pr_review: use   pressure: read/write
+contract: use    backlog: read/write/admin
+config_lint: use deps: read/gate
+threatmodel: use cost: read/gate
+observability    drift: read/gate
+incidents: write alerts: ingest/read/ack/claim
+```
+
+**Sofiia's specialized tools (agent_tools_config.py):**
+```python
+AGENT_SPECIALIZED_TOOLS["sofiia"] = [
+    "comfy_generate_image",
+    "comfy_generate_video",
+    "risk_engine_tool",
+    "architecture_pressure_tool",
+    "backlog_tool",
+    "job_orchestrator_tool",
+    "dependency_scanner_tool",
+    "incident_intelligence_tool",
+    "cost_analyzer_tool",
+    "pieces_tool",
+    "notion_tool",
+]
+```
+
+**FULL_STANDARD_STACK** (16 tools available to all agents):
+```
+memory_search, graph_query, web_search, web_extract, crawl4ai_scrape,
+remember_fact, image_generate, tts_speak, presentation_create/status/download,
+file_tool, repo_tool, pr_reviewer_tool, contract_tool, oncall_tool,
+observability_tool, config_linter_tool, threatmodel_tool, job_orchestrator_tool,
+kb_tool, drift_analyzer_tool, pieces_tool
+```
+
+**Gaps:**
+- No `repo_changeset_tool` (create/patch/plan/pr)
+- No `ops_job_tool` (start/status/cancel with job tracking)
+- `job_orchestrator_tool` exists but is not connected to Dialog Map artifact creation
+
+---
+
+### 2.4 Memory System
+
+**Documented in:**
+- `docs/ADR_ARCHITECTURE_VNEXT.md` §2.5 Memory Service
+- `docs/MEMORY_API_POLICY.md`
+- `docs/AGENT-MEMORY-STANDARD.md`
+
+**Implemented in:**
+- `services/memory-service/app/main.py` — threads/events/memories/facts/agents
+- `services/memory-service/app/vector_store.py` — Qdrant
+- `docker-compose.memory-node2.yml` — Postgres + Qdrant + Neo4j
+
+**3 memory levels (per the ADR):**
+
+| Level | Qdrant | Neo4j | Postgres |
+|--------|--------|-------|----------|
+| Personal | `user_{id}_*` | `:User` nodes | `user_facts`, `user_sessions` |
+| Team/DAO | `team_{id}_*` | `:Team`, `:Project` | `team_facts`, `team_quotas` |
+| Public | `public_*` | `:Public` | `indexed_content` |
+
+**Actual collections (NODA2):**
+- `sofiia_messages` — 1183+ points
+- `sofiia_summaries`
+- Memory Service Postgres (port 5433, db `daarion_memory`)
+
+**Console-level memory (SQLite `sofiia.db`):**
+```sql
+projects, documents, sessions, messages
+```
+
+**Gaps:**
+- Team/DAO namespace: described in the ADR, but only Personal is implemented
+- E2EE for confidential data: ADR only, not implemented
+- The BFF and the Memory Service are aware of each other, but sync is incomplete
+
+---
+
+### 2.5 Planning System (Supervisor)
+
+**Documented in:**
+- `docs/supervisor/langgraph_supervisor.md`
+- `docs/supervisor/postmortem_draft_graph.md`
+
+**Implemented in:**
+- `services/sofiia-supervisor/app/main.py`
+- `services/sofiia-supervisor/app/graphs/`
+
+**Available LangGraph graphs:**
+```
+alert_triage     → alert classification/escalation
+incident_triage  → incident triage (SLO, labels, owners)
+postmortem_draft → auto-generation of the postmortem document
+release_check    → pre-release gate checks
+```
+
+**Architecture (general):**
+```
+Event/Trigger → LangGraph Node → State update → Next Node
+      ↓              ↓
+  NATS event    Tool calls (via gateway_client)
+                Memory writes
+                Structured output (JSON)
+```
+
+**Gaps:**
+- No `cto_intent_graph` (intent → plan → execute)
+- No `repo_changeset_graph` (diff → plan → PR)
+- No `dialog_map_builder_graph` (events → nodes/edges)
+- The Supervisor is isolated from the BFF (not integrated into `/api/ops/run`)
+
+---
+
+## 3. Policies (Security, Permissions, Approval)
+
+### 3.1 Documented
+- `docs/PRIVACY_GATE.md` — Privacy Gate middleware
+- `docs/ADR_ARCHITECTURE_VNEXT.md` §4 Privacy Gate
+- `docs/AGENT_RUNTIME_POLICY.md`
+- `config/rbac_tools_matrix.yml`
+- `config/data_governance_policy.yml`
+- `config/risk_policy.yml`
+
+### 3.2 Implemented
+- RBAC tool allowlist: ✅ `agent_tools_config.py`
+- API key auth: ✅ `auth.py`
+- Rate limiting: ✅ per-endpoint
+- Upload sanitization: ✅ mime + filename + size
+- Voice guardrails: ✅ `sanitize_for_voice()`
+- Config linter (secrets detection): ✅ `tool_manager.py`
+
+### 3.3 Not Implemented
+- **Privacy Gate middleware** (checking `mode=confidential` in the Router): 📄 described, not implemented
+- **2-step Plan → Apply flow**: 📄 described as "dangerous actions", not implemented
+- **E2EE client-side encryption**: 📄 ADR only, not implemented
+- **Confidential doc indexing block**: 📄 ADR only, not implemented
+
+---
+
+## 4. Event Model
+
+### 4.1 Documented
+- `docs/ADR_ARCHITECTURE_VNEXT.md` §5 NATS Standards
+- `docs/NATS_SUBJECTS.md`
+- `docs/NATS_SUBJECT_MAP.md`
+
+### 4.2 NATS Subjects (ADR canonical)
+```
+message.created.{channel_id}      # chat messages
+attachment.created.{type}          # uploaded files
+agent.run.requested.{agent_id}    # agent activation
+agent.run.completed.{agent_id}
+quota.consumed.{user_id}
+audit.{service}.{action}          # append-only audit
+ops.health.{service}
+ops.alert.{severity}
+```
+
+### 4.3 Fabric Subjects (implemented in node-worker)
+```
+node.{id}.llm.request             # LLM offload
+node.{id}.tts.request             # TTS offload
+node.{id}.stt.request             # STT offload
+node.{id}.voice.llm.request       # Voice LLM (dedicated)
+node.{id}.voice.tts.request       # Voice TTS (dedicated)
+node.{id}.voice.stt.request       # Voice STT (dedicated)
+node.{id}.ocr.request             # OCR offload
+node.{id}.crawl.request           # Crawl offload
+node.{id}.image.request           # Image generation
+```
+
+### 4.4 Gaps
+- `attachment.created`: partially implemented (upload saves the file but does not publish to NATS)
+- `task_create`, `doc_upsert`, `meeting_create`: not implemented (needed for Dialog Map auto-edge)
+- `agent.run.requested` → a legacy flat subject may still be present on some paths (known drift)
+- The Dialog Map is not subscribed to NATS events
+
+---
+
+## 5. Memory Architecture (Detailed)
+
+```
+┌──────────────────────────────────────────────────────────┐
+│                    Sofiia Memory Layers                   │
+├──────────────────────────────────────────────────────────┤
+│ Layer 0: Working Context (per-turn)                       │
+│   - history[-12:] in BFF request                         │
+│   - sanitize_for_voice() for voice turns                 │
+├──────────────────────────────────────────────────────────┤
+│ Layer 1: Session Memory (sofiia-console SQLite)           │
+│   Tables: projects, documents, sessions, messages         │
+│   TTL: indefinite (volume-backed)                        │
+│   Fork: parent_msg_id for branching                      │
+├──────────────────────────────────────────────────────────┤
+│ Layer 2: Long-term Memory (Memory Service)                │
+│   Qdrant: sofiia_messages (1183+ vectors)                │
+│            sofiia_summaries                              │
+│   Postgres: daarion_memory DB (facts, threads, events)   │
+│   Neo4j: agent memory graph (infrastructure ready)       │
+├──────────────────────────────────────────────────────────┤
+│ Layer 3: Factual Memory (Key-Value)                       │
+│   /facts/upsert, /facts/{key}                            │
+│   Rolling summaries via /threads/{id}/summarize          │
+└──────────────────────────────────────────────────────────┘
+```
+
+**Namespaces (implemented):**
+- `sofiia_messages` — agent-specific collection
+- General: `{agent_id}_{type}` pattern
+
+**Sync between Layer 1 and Layer 2:**
+- `_do_save_memory()` in `main.py`: SQLite first, then the Memory Service (best-effort)
+- No reverse sync (Memory Service → SQLite)
+- No conflicts (both stores are append-only)
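
The "local first, remote best-effort" ordering described above can be sketched as follows. This is an illustrative reconstruction, not the body of `_do_save_memory()`: the `save_memory` and `make_db` helpers, the table columns, and the `push_remote` callable are all assumptions made for the example.

```python
import sqlite3
from typing import Callable, Dict


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the console's SQLite store."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY, session_id TEXT, role TEXT, content TEXT)"
    )
    return db


def save_memory(
    db: sqlite3.Connection,
    session_id: str,
    role: str,
    content: str,
    push_remote: Callable[[Dict[str, str]], None],
) -> bool:
    """Persist locally, then try the remote store.

    Returns True if the remote write also succeeded. A remote failure
    never undoes the local write, which is what "best-effort" means here.
    """
    with db:  # commits the local insert, or rolls back if it raises
        db.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )
    try:
        push_remote({"session_id": session_id, "role": role, "content": content})
        return True
    except Exception:
        # Swallow remote errors: the local row is already committed.
        return False
```

Because both sides are append-only, a failed remote write leaves a gap in Layer 2 rather than a conflict; closing that gap would need the reverse or replay sync that the list above notes is missing.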
+
+---
+
+## 6. Dialog Map Intelligence
+
+### Current Implementation (Phase 1)
+```
+SQLite messages table (parent_msg_id = branching)
+       ↓
+GET /api/sessions/{sid}/map
+       ↓
+Python: build_tree(messages) → nodes/edges
+       ↓
+UI: <details><summary> tree
+```
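
A minimal sketch of the `build_tree(messages) → nodes/edges` step, assuming each message row is a dict with `id`, `parent_msg_id`, and `content`. The output shape (`nodes`, `edges`, `from`, `to`, `label`) is hypothetical and may differ from what `/api/sessions/{sid}/map` actually returns.

```python
from typing import Any, Dict, List


def build_tree(messages: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Turn a flat message list into nodes and parent -> child edges.

    A message whose parent_msg_id is None, or points at a message that is
    not in the list, becomes a root (no incoming edge).
    """
    known_ids = {m["id"] for m in messages}
    nodes = [{"id": m["id"], "label": m["content"][:40]} for m in messages]
    edges = [
        {"from": m["parent_msg_id"], "to": m["id"]}
        for m in messages
        if m.get("parent_msg_id") in known_ids
    ]
    return {"nodes": nodes, "edges": edges}
```

Two messages that share the same `parent_msg_id` produce two edges from one node, which is how a fork appears in the tree.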
+
+### Target Implementation (vNext Phase 2)
+```
+NATS events (task_create, doc_upsert, meeting_create)
+       ↓
+Dialog Map Builder (new service or Supervisor graph)
+       ↓
+Postgres: dialog_nodes + dialog_edges
+       ↓
+GET /projects/{id}/dialog-map
+       ↓
+UI: D3/Cytoscape canvas + live WS updates
+```
+
+**Node types (vNext):**
+- `message`: chat message
+- `task`: task
+- `doc`: document/wiki
+- `meeting`: meeting
+- `agent_run`: agent invocation
+- `decision`: ADR/decision
+- `goal`: goal/OKR
+
+**Edge types (vNext):**
+- `references`: A references B
+- `resolves`: A resolves B
+- `derives_task`: message → task
+- `updates_doc`: action → doc version
+- `schedules`: message → meeting
+- `summarizes`: rollup node
+
+---
+
+## 7. Preflight-First Policy
+
+**Documented in:**
+- `ops/fabric_preflight.sh`
+- `docs/fabric_contract.md`
+
+**Principle:** "Zero assumptions". Before any deploy/change:
+1. Run `ops/fabric_preflight.sh`
+2. Check the models (VOICE_REQUIRED_MODELS: fail / VOICE_PREFERRED_MODELS: warn)
+3. Check `ops/fabric_snapshot.py --save`
+4. Only then deploy
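
The fail/warn split in step 2 can be expressed as a small check. The sketch below is Python written for illustration only; it is not the content of `ops/fabric_preflight.sh`, and `PreflightResult` and `check_models` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class PreflightResult:
    ok: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def check_models(
    available: Iterable[str],
    required: Iterable[str],
    preferred: Iterable[str],
) -> PreflightResult:
    """A missing required model blocks the deploy; a missing preferred
    model only produces a warning."""
    have = set(available)
    missing_required = sorted(set(required) - have)
    missing_preferred = sorted(set(preferred) - have)
    return PreflightResult(
        ok=not missing_required,
        errors=[f"required model missing: {m}" for m in missing_required],
        warnings=[f"preferred model missing: {m}" for m in missing_preferred],
    )
```

A deploy wrapper would stop when `ok` is false and print `warnings` otherwise, so that a degraded but usable node still deploys.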
+
+**Implemented:**
+- `ops/fabric_preflight.sh`: model checks, voice health, canary
+- `ops/scripts/voice_canary.py`: runtime canary (every 5–10 min)
+- `ops/voice_latency_audit.sh`: 10-scenario latency audit
+
+---
+
+## Next Actions for UI Team (1–2 days)
+
+1. **Get familiar with the Supervisor API** (`/v1/graphs/{name}/runs`): it is a ready-made "job runner" for CTO workflows
+2. **Extend the Supervisor**: add `cto_intent_graph` based on `release_check_graph` (shared structure)
+3. **NATS attachment events**: on upload in `docs_router.py`, publish `attachment.created` (1 line of code)
+4. **Dialog Map NATS listener**: a simple consumer that upserts SQLite nodes on events
+5. **`docs_versions` table**: ALTER TABLE + endpoint, 1–2 hours of work
+6. **Privacy Gate stub**: add a check of the `mode` field in the BFF, even without encryption
+7. **Plan → Apply pattern**: for ops actions, show the "plan" before running
+8. **`agent_id` normalization**: replace `"l"` with `"sofiia"` in node2 router-config.yml
+9. **Memory sync**: add an endpoint to load Sofiia memory from the Memory Service into SQLite
+10. **CTO Panel**: mock `/api/repo/changesets` and `/api/ops/runs` endpoints for UI development