Config policies (16 files): alert_routing, architecture_pressure, backlog, cost_weights, data_governance, incident_escalation, incident_intelligence, network_allowlist, nodes_registry, observability_sources, rbac_tools_matrix, release_gate, risk_attribution, risk_policy, slo_policy, tool_limits, tools_rollout Ops (22 files): Caddyfile, calendar compose, grafana voice dashboard, deployments/incidents logs, runbooks for alerts/audit/backlog/incidents/sofiia/voice, cron jobs, scripts (alert_triage, audit_cleanup, migrate_*, governance, schedule), task_registry, voice alerts/ha/latency/policy Docs (30+ files): HUMANIZED_STEPAN v2.7-v3 changelogs and runbooks, NODA1/NODA2 status and setup, audit index and traces, backlog, incident, supervisor, tools, voice, opencode, release, risk, aistalk, spacebot Made-with: Cursor
195 lines
6.9 KiB
Markdown
195 lines
6.9 KiB
Markdown
# Sofiia UI vNext — Audit Report
|
|
|
|
> Generated: 2026-02-26 | Scope: file uploads, document DB, session memory, dialog map
|
|
|
|
---
|
|
|
|
## 1. Existing Infrastructure (What We Reuse)
|
|
|
|
### Document Processing — `gateway-bot/services/doc_service.py`
|
|
Fully working channel-agnostic document service:
|
|
- `parse_document()` → Swapper `/document` endpoint → markdown/text
|
|
- `ingest_document()` → Router `POST /v1/documents/ingest` → Qdrant chunks
|
|
- `ask_about_document()` → RAG query via Router
|
|
- `extract_summary_from_bytes()` — local extraction for XLSX/CSV/PDF
|
|
|
|
Supported formats (from gateway-bot/http_api.py):
|
|
`.pdf .doc .docx .rtf .odt .txt .md .csv .tsv .xls .xlsx .xlsm .ods`
|
|
|
|
**Plan:** sofiia-console proxies uploads to Router `/v1/documents/ingest` (same path as Telegram).
|
|
|
|
### Storage on NODA2 (`docker-compose.memory-node2.yml`)
|
|
| Storage | Container | Port | Notes |
|
|
|---|---|---|---|
|
|
| PostgreSQL 16 | `dagi-postgres-node2` | 5433 | DB: `daarion_memory`, tables: sofiia_messages etc. |
|
|
| Qdrant 1.12.4 | `dagi-qdrant-node2` | 6333 | Collections: memories, sofiia_messages, sofiia_summaries |
|
|
| Neo4j 5.15 | `dagi-neo4j-node2` | 7687 | Available for Phase 2 dialog graph |
|
|
|
|
### Memory Service Endpoints (Reusable)
|
|
- `POST /agents/{agent_id}/memory` — save chat turn → Postgres + Qdrant + Neo4j
|
|
- `GET /agents/{agent_id}/memory` — retrieve recent events
|
|
- `POST /threads` / `GET /threads/{id}` — conversation threads
|
|
- `POST /memories` — long-term memory with semantic search
|
|
- `POST /retrieve` — vector search across memories
|
|
- `POST /facts/upsert` / `GET /facts/{key}` — key-value store
|
|
|
|
### sofiia-console (What Already Exists)
|
|
- `_do_save_memory()` — auto-saves every chat turn to Memory Service
|
|
- `GET /api/memory/context` — retrieves context for session
|
|
- `POST /api/voice/stt` — file upload (multipart) → memory-service STT
|
|
- `session_id`, `project_id`, `user_id` — already in request model
|
|
|
|
---
|
|
|
|
## 2. What Is Missing (What We Build)
|
|
|
|
| Component | Status | Plan |
|
|
|---|---|---|
|
|
| sofiia-console `DATABASE_URL` | ❌ MISSING | Add to docker-compose + SQLite fallback |
|
|
| `POST /api/files/upload` | ❌ MISSING | Build in sofiia-console BFF |
|
|
| `projects` table | ❌ MISSING | SQLite (Phase 1), Postgres (Phase 2) |
|
|
| `documents` table | ❌ MISSING | SQLite + metadata |
|
|
| `sessions` table | ❌ MISSING | SQLite + `started_at`, `last_active` |
|
|
| `messages` table | ❌ MISSING | SQLite + `parent_msg_id` for branching |
|
|
| `GET /api/chat/history` | ❌ MISSING | Load messages from SQLite |
|
|
| Projects sidebar UI | ❌ MISSING | Left panel in index.html |
|
|
| Dialog Map (tree) | ❌ MISSING | Collapsible tree + branching |
|
|
| Upload UI button | ❌ MISSING | Paperclip icon in chat bar |
|
|
|
|
---
|
|
|
|
## 3. Architecture Decision: SQLite First
|
|
|
|
**Rationale:** sofiia-console currently has no DB. Adding a new Postgres connection
|
|
requires network config changes and service dependency. SQLite:
|
|
- Zero infra changes (just a volume mount)
|
|
- Works immediately in Docker
|
|
- Can migrate to Postgres later via `aiosqlite` → `asyncpg`
|
|
- Sufficient for 1 user (operator) console workload
|
|
|
|
**Phase 2:** `DATABASE_URL=postgresql://...` env override → same schema via asyncpg.
|
|
|
|
---
|
|
|
|
## 4. Storage Schema (Phase 1)
|
|
|
|
```sql
|
|
-- projects
|
|
CREATE TABLE projects (
|
|
project_id TEXT PRIMARY KEY,
|
|
name TEXT NOT NULL,
|
|
description TEXT DEFAULT '',
|
|
created_at TEXT NOT NULL, -- ISO8601
|
|
updated_at TEXT NOT NULL
|
|
);
|
|
|
|
-- documents
|
|
CREATE TABLE documents (
|
|
doc_id TEXT PRIMARY KEY,
|
|
project_id TEXT NOT NULL REFERENCES projects(project_id),
|
|
file_id TEXT NOT NULL,
|
|
sha256 TEXT NOT NULL,
|
|
mime TEXT NOT NULL,
|
|
size_bytes INTEGER NOT NULL,
|
|
filename TEXT NOT NULL,
|
|
title TEXT DEFAULT '',
|
|
tags TEXT DEFAULT '[]', -- JSON array
|
|
created_at TEXT NOT NULL,
|
|
extracted_text TEXT DEFAULT '' -- first 4KB preview
|
|
);
|
|
|
|
-- sessions
|
|
CREATE TABLE sessions (
|
|
session_id TEXT PRIMARY KEY,
|
|
project_id TEXT NOT NULL REFERENCES projects(project_id),
|
|
title TEXT DEFAULT '',
|
|
started_at TEXT NOT NULL,
|
|
last_active TEXT NOT NULL,
|
|
turn_count INTEGER DEFAULT 0
|
|
);
|
|
|
|
-- messages (with branching via parent_msg_id)
|
|
CREATE TABLE messages (
|
|
msg_id TEXT PRIMARY KEY,
|
|
session_id TEXT NOT NULL REFERENCES sessions(session_id),
|
|
role TEXT NOT NULL, -- "user" | "assistant"
|
|
content TEXT NOT NULL,
|
|
ts TEXT NOT NULL, -- ISO8601
|
|
parent_msg_id TEXT, -- NULL for first message; enables branching
|
|
branch_label TEXT DEFAULT '' -- "main" | "branch-1" | etc.
|
|
);
|
|
```
|
|
|
|
---
|
|
|
|
## 5. File Upload Architecture
|
|
|
|
```
|
|
Browser → POST /api/files/upload (multipart)
|
|
↓
|
|
BFF: validate mime + size
|
|
↓
|
|
Save to ./data/uploads/{sha256[:2]}/{sha256}_{filename}
|
|
↓
|
|
Extract text (pdf/docx/txt/md via python libs or Router OCR)
|
|
↓
|
|
Store metadata in documents table
|
|
↓
|
|
POST /v1/documents/ingest → Qdrant (async, best-effort)
|
|
↓
|
|
Return: {file_id, sha256, mime, size, preview_text, doc_id}
|
|
```
|
|
|
|
Size limits (env-configurable):
|
|
| Type | Env | Default |
|
|
|---|---|---|
|
|
| Images | `UPLOAD_MAX_IMAGE_MB` | 10 MB |
|
|
| Videos | `UPLOAD_MAX_VIDEO_MB` | 200 MB |
|
|
| Docs | `UPLOAD_MAX_DOC_MB` | 50 MB |
|
|
|
|
---
|
|
|
|
## 6. Session Persistence Strategy
|
|
|
|
**Current:** session_id generated on each `/api/chat/send` → not persisted between page loads.
|
|
|
|
**Phase 1 Fix:**
|
|
1. Browser stores `session_id` in `localStorage`
|
|
2. BFF `GET /api/sessions/{session_id}` checks if session exists → load last N messages
|
|
3. New `/api/chat/send` saves messages to SQLite `messages` table
|
|
4. `GET /api/chat/history?session_id=...&limit=50` returns ordered messages
|
|
|
|
---
|
|
|
|
## 7. Dialog Map (Phase 1: Tree View)
|
|
|
|
**Not a full graph canvas** — collapsible tree in UI:
|
|
- Each session = root node
|
|
- Each assistant turn = child node
|
|
- "Fork from message" creates a new branch (new `session_id` with `parent_msg_id`)
|
|
- UI renders as nested `<details>` tree, no canvas required
|
|
- `GET /api/sessions/{session_id}/map` returns `{nodes, edges}` JSON
|
|
|
|
**Phase 2:** Upgrade to D3.js force-directed graph or Cytoscape.js when Neo4j available.
|
|
|
|
---
|
|
|
|
## 8. Integration Hooks (Phase 2 Flags)
|
|
|
|
```python
|
|
USE_FABRIC_OCR = os.getenv("USE_FABRIC_OCR", "false").lower() == "true"
|
|
USE_EMBEDDINGS = os.getenv("USE_EMBEDDINGS", "false").lower() == "true"
|
|
```
|
|
|
|
- `USE_FABRIC_OCR=true` → images/PDFs go through Router `/v1/capability/ocr`
|
|
- `USE_EMBEDDINGS=true` → extracted text indexed in Qdrant via Memory Service
|
|
|
|
---
|
|
|
|
## 9. Constraints
|
|
|
|
- Access: localhost-only by default (Docker port binding `127.0.0.1:8002:8002`)
|
|
- Secrets: never stored in upload files or exposed in API responses
|
|
- Filename sanitization: `secure_filename()` + sha256 as storage key (no path traversal)
|
|
- Content-type: validated server-side via `python-magic` or file header bytes (not just extension)
|