# Sofiia UI vNext — Audit Report

> Generated: 2026-02-26 | Scope: file uploads, document DB, session memory, dialog map

---

## 1. Existing Infrastructure (What We Reuse)

### Document Processing — `gateway-bot/services/doc_service.py`

A fully working, channel-agnostic document service:

- `parse_document()` → Swapper `/document` endpoint → markdown/text
- `ingest_document()` → Router `POST /v1/documents/ingest` → Qdrant chunks
- `ask_about_document()` → RAG query via Router
- `extract_summary_from_bytes()` — local extraction for XLSX/CSV/PDF

Supported formats (from `gateway-bot/http_api.py`): `.pdf .doc .docx .rtf .odt .txt .md .csv .tsv .xls .xlsx .xlsm .ods`

**Plan:** sofiia-console proxies uploads to Router `/v1/documents/ingest` (the same path Telegram uses).

### Storage on NODA2 (`docker-compose.memory-node2.yml`)

| Storage | Container | Port | Notes |
|---|---|---|---|
| PostgreSQL 16 | `dagi-postgres-node2` | 5433 | DB: `daarion_memory`; tables: `sofiia_messages`, etc. |
| Qdrant 1.12.4 | `dagi-qdrant-node2` | 6333 | Collections: `memories`, `sofiia_messages`, `sofiia_summaries` |
| Neo4j 5.15 | `dagi-neo4j-node2` | 7687 | Available for the Phase 2 dialog graph |

### Memory Service Endpoints (Reusable)

- `POST /agents/{agent_id}/memory` — save chat turn → Postgres + Qdrant + Neo4j
- `GET /agents/{agent_id}/memory` — retrieve recent events
- `POST /threads` / `GET /threads/{id}` — conversation threads
- `POST /memories` — long-term memory with semantic search
- `POST /retrieve` — vector search across memories
- `POST /facts/upsert` / `GET /facts/{key}` — key-value store

### sofiia-console (What Already Exists)

- `_do_save_memory()` — auto-saves every chat turn to Memory Service
- `GET /api/memory/context` — retrieves context for a session
- `POST /api/voice/stt` — file upload (multipart) → memory-service STT
- `session_id`, `project_id`, `user_id` — already present in the request model

---

## 2. What Is Missing (What We Build)

| Component | Status | Plan |
|---|---|---|
| sofiia-console `DATABASE_URL` | ❌ MISSING | Add to docker-compose + SQLite fallback |
| `POST /api/files/upload` | ❌ MISSING | Build in sofiia-console BFF |
| `projects` table | ❌ MISSING | SQLite (Phase 1), Postgres (Phase 2) |
| `documents` table | ❌ MISSING | SQLite + metadata |
| `sessions` table | ❌ MISSING | SQLite + `started_at`, `last_active` |
| `messages` table | ❌ MISSING | SQLite + `parent_msg_id` for branching |
| `GET /api/chat/history` | ❌ MISSING | Load messages from SQLite |
| Projects sidebar UI | ❌ MISSING | Left panel in `index.html` |
| Dialog Map (tree) | ❌ MISSING | Collapsible tree + branching |
| Upload UI button | ❌ MISSING | Paperclip icon in chat bar |

---

## 3. Architecture Decision: SQLite First

**Rationale:** sofiia-console currently has no DB. Adding a new Postgres connection requires network configuration changes and a new service dependency. SQLite offers:

- Zero infra changes (just a volume mount)
- Works immediately in Docker
- Can migrate to Postgres later by swapping `aiosqlite` for `asyncpg`
- Sufficient for a single-user (operator) console workload

**Phase 2:** a `DATABASE_URL=postgresql://...` env override switches to Postgres, keeping the same schema via `asyncpg`.

---

## 4. Storage Schema (Phase 1)

```sql
-- SQLite only enforces REFERENCES after: PRAGMA foreign_keys = ON;

-- projects
CREATE TABLE projects (
    project_id  TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT DEFAULT '',
    created_at  TEXT NOT NULL,           -- ISO8601
    updated_at  TEXT NOT NULL
);

-- documents
CREATE TABLE documents (
    doc_id         TEXT PRIMARY KEY,
    project_id     TEXT NOT NULL REFERENCES projects(project_id),
    file_id        TEXT NOT NULL,
    sha256         TEXT NOT NULL,
    mime           TEXT NOT NULL,
    size_bytes     INTEGER NOT NULL,
    filename       TEXT NOT NULL,
    title          TEXT DEFAULT '',
    tags           TEXT DEFAULT '[]',    -- JSON array
    created_at     TEXT NOT NULL,
    extracted_text TEXT DEFAULT ''       -- first 4KB preview
);

-- sessions
CREATE TABLE sessions (
    session_id  TEXT PRIMARY KEY,
    project_id  TEXT NOT NULL REFERENCES projects(project_id),
    title       TEXT DEFAULT '',
    started_at  TEXT NOT NULL,
    last_active TEXT NOT NULL,
    turn_count  INTEGER DEFAULT 0
);

-- messages (with branching via parent_msg_id)
CREATE TABLE messages (
    msg_id        TEXT PRIMARY KEY,
    session_id    TEXT NOT NULL REFERENCES sessions(session_id),
    role          TEXT NOT NULL,         -- "user" | "assistant"
    content       TEXT NOT NULL,
    ts            TEXT NOT NULL,         -- ISO8601
    parent_msg_id TEXT,                  -- NULL for the first message of a root session; enables branching
    branch_label  TEXT DEFAULT ''        -- "main" | "branch-1" | etc.
);
```

---

## 5. File Upload Architecture

```
Browser → POST /api/files/upload (multipart)
    ↓
BFF: validate mime + size
    ↓
Save to ./data/uploads/{sha256[:2]}/{sha256}_{filename}
    ↓
Extract text (pdf/docx/txt/md via python libs or Router OCR)
    ↓
Store metadata in documents table
    ↓
POST /v1/documents/ingest → Qdrant (async, best-effort)
    ↓
Return: {file_id, sha256, mime, size, preview_text, doc_id}
```

Size limits (env-configurable):

| Type | Env var | Default |
|---|---|---|
| Images | `UPLOAD_MAX_IMAGE_MB` | 10 MB |
| Videos | `UPLOAD_MAX_VIDEO_MB` | 200 MB |
| Docs | `UPLOAD_MAX_DOC_MB` | 50 MB |

---

## 6. Session Persistence Strategy

**Current:** `session_id` is generated on each `/api/chat/send` call and is not persisted between page loads.

**Phase 1 Fix:**

1. The browser stores `session_id` in `localStorage`.
2. BFF `GET /api/sessions/{session_id}` checks whether the session exists and, if so, loads the last N messages.
3. The updated `/api/chat/send` also saves each message to the SQLite `messages` table.
4. `GET /api/chat/history?session_id=...&limit=50` returns the messages in chronological order.

---

## 7. Dialog Map (Phase 1: Tree View)

**Not a full graph canvas**: a collapsible tree in the UI.

- Each session = root node
- Each assistant turn = child node
- "Fork from message" creates a new branch (a new `session_id` with `parent_msg_id` set)
- The UI renders the map as a nested tree; no canvas required
- `GET /api/sessions/{session_id}/map` returns `{nodes, edges}` JSON

**Phase 2:** upgrade to a D3.js force-directed graph or Cytoscape.js once Neo4j is available.

---

## 8. Integration Hooks (Phase 2 Flags)

```python
import os

USE_FABRIC_OCR = os.getenv("USE_FABRIC_OCR", "false").lower() == "true"
USE_EMBEDDINGS = os.getenv("USE_EMBEDDINGS", "false").lower() == "true"
```

- `USE_FABRIC_OCR=true` → images/PDFs go through Router `/v1/capability/ocr`
- `USE_EMBEDDINGS=true` → extracted text is indexed in Qdrant via Memory Service

---

## 9. Constraints

- Access: localhost-only by default (Docker port binding `127.0.0.1:8002:8002`)
- Secrets: never stored in uploaded files or exposed in API responses
- Filename sanitization: `secure_filename()` + sha256 as storage key (no path traversal)
- Content-type: validated server-side via `python-magic` or file header bytes (not just the extension)
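The SQLite-first decision in section 3 depends on a single `DATABASE_URL` switch. Below is a minimal sketch of that selection. The helper name `resolve_db_backend` and the default SQLite path are hypothetical, not taken from the codebase. It accepts only the `postgresql://` and `postgres://` schemes, both of which `asyncpg` DSNs use.

```python
import os
from typing import Optional, Tuple
from urllib.parse import urlparse

# Hypothetical default; keep it on a Docker volume so the file survives restarts.
DEFAULT_SQLITE_PATH = "./data/sofiia_console.db"


def resolve_db_backend(url: Optional[str] = None) -> Tuple[str, str]:
    """Pick the storage backend from DATABASE_URL (unset or empty means SQLite)."""
    if url is None:
        url = os.getenv("DATABASE_URL", "")
    if not url:
        return "sqlite", DEFAULT_SQLITE_PATH      # Phase 1: local file, no new infra
    scheme = urlparse(url).scheme
    if scheme in ("postgresql", "postgres"):
        return "postgres", url                    # Phase 2: DSN to hand to asyncpg
    raise ValueError(f"unsupported DATABASE_URL scheme: {scheme!r}")
```

With this shape, callers only branch on the returned backend name. The Phase 2 switch then stays an env change.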
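Sections 4 and 6 together imply a small persistence layer with three jobs:

- save each turn to `messages`;
- bump the session's `last_active`;
- serve the last N messages in chronological order for `GET /api/chat/history`.

The sketch below makes several assumptions:

- It uses the standard-library `sqlite3` module synchronously for brevity, whereas the plan names `aiosqlite`.
- All helper names are hypothetical.
- It auto-creates a placeholder project row.
- It counts every saved message as one turn, which is a guess at what `turn_count` means.

```python
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import List, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS projects (
    project_id TEXT PRIMARY KEY, name TEXT NOT NULL, description TEXT DEFAULT '',
    created_at TEXT NOT NULL, updated_at TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS sessions (
    session_id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL REFERENCES projects(project_id),
    title TEXT DEFAULT '', started_at TEXT NOT NULL, last_active TEXT NOT NULL,
    turn_count INTEGER DEFAULT 0);
CREATE TABLE IF NOT EXISTS messages (
    msg_id TEXT PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES sessions(session_id),
    role TEXT NOT NULL, content TEXT NOT NULL, ts TEXT NOT NULL,
    parent_msg_id TEXT, branch_label TEXT DEFAULT '');
"""
FIELDS = ("msg_id", "role", "content", "ts", "parent_msg_id")


def now_iso() -> str:
    # Always include microseconds so lexical order of ts matches time order.
    return datetime.now(timezone.utc).isoformat(timespec="microseconds")


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # REFERENCES are ignored without this
    conn.executescript(SCHEMA)
    return conn


def ensure_session(conn: sqlite3.Connection, session_id: str, project_id: str) -> None:
    """Create the session (and a placeholder project) if they do not exist yet."""
    ts = now_iso()
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO projects (project_id, name, created_at, updated_at) "
            "VALUES (?, ?, ?, ?)", (project_id, project_id, ts, ts))
        conn.execute(
            "INSERT OR IGNORE INTO sessions (session_id, project_id, started_at, last_active) "
            "VALUES (?, ?, ?, ?)", (session_id, project_id, ts, ts))


def save_message(conn: sqlite3.Connection, session_id: str, role: str, content: str,
                 parent_msg_id: Optional[str] = None) -> str:
    msg_id = uuid.uuid4().hex
    ts = now_iso()
    with conn:  # message row and session bookkeeping commit (or roll back) together
        conn.execute(
            "INSERT INTO messages (msg_id, session_id, role, content, ts, parent_msg_id) "
            "VALUES (?, ?, ?, ?, ?, ?)", (msg_id, session_id, role, content, ts, parent_msg_id))
        conn.execute(
            "UPDATE sessions SET last_active = ?, turn_count = turn_count + 1 "
            "WHERE session_id = ?", (ts, session_id))
    return msg_id


def chat_history(conn: sqlite3.Connection, session_id: str, limit: int = 50) -> List[dict]:
    """Last `limit` messages of a session, oldest first."""
    rows = conn.execute(
        "SELECT msg_id, role, content, ts, parent_msg_id FROM messages "
        "WHERE session_id = ? ORDER BY ts DESC, rowid DESC LIMIT ?",
        (session_id, limit)).fetchall()
    return [dict(zip(FIELDS, row)) for row in reversed(rows)]
```

Ordering by `ts DESC, rowid DESC` and then reversing returns the *newest* N messages in reading order. The `rowid` tiebreak covers two messages stamped in the same microsecond.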
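The storage key and size limits from section 5 fit into one helper. The sketch below makes three assumptions:

- `sanitize_filename` is a hypothetical, simplified stand-in for `secure_filename()`.
- Limits are whole-number MB values read from the env vars in the size table.
- `kind` is one of the hypothetical labels `"image"`, `"video"` or `"doc"`.

```python
import hashlib
import os
import re
from pathlib import Path

# Defaults mirror the size-limit table in section 5; env vars override them.
_LIMIT_ENV = {
    "image": ("UPLOAD_MAX_IMAGE_MB", 10),
    "video": ("UPLOAD_MAX_VIDEO_MB", 200),
    "doc": ("UPLOAD_MAX_DOC_MB", 50),
}


def max_upload_bytes(kind: str) -> int:
    """Byte limit for an upload kind, read from its env var (whole MB)."""
    env_name, default_mb = _LIMIT_ENV[kind]
    return int(os.getenv(env_name, str(default_mb))) * 1024 * 1024


def sanitize_filename(name: str) -> str:
    """Hypothetical secure_filename() stand-in: drop directories and unsafe chars."""
    name = name.replace("\\", "/").rsplit("/", 1)[-1]      # keep only the last path part
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("._")
    return name or "upload"


def storage_path(root: Path, data: bytes, filename: str, kind: str) -> Path:
    """Enforce the size limit, then build {root}/{sha256[:2]}/{sha256}_{filename}."""
    if len(data) > max_upload_bytes(kind):
        raise ValueError(f"{kind} upload exceeds {max_upload_bytes(kind)} bytes")
    digest = hashlib.sha256(data).hexdigest()
    return root / digest[:2] / f"{digest}_{sanitize_filename(filename)}"
```

The sha256 prefix directory spreads files across up to 256 subdirectories. Because the hash leads the name, the sanitized filename is only cosmetic: two uploads with identical bytes map to the same key regardless of their names.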
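Section 7 describes the map as session roots with assistant-turn children, where forks open new sessions. Below is a minimal sketch of how `GET /api/sessions/{session_id}/map` could assemble `{nodes, edges}` from message rows. The following are assumptions, not a confirmed format:

- **Node ids:** the id prefixes (`session:`, `msg:`) are invented for illustration.
- **Edge keys:** the `from`/`to`/`kind` edge keys are invented for illustration.
- **Fork detection:** a session counts as a fork when its first message's `parent_msg_id` points into another session.

```python
from typing import Dict, List


def build_dialog_map(messages: List[dict]) -> Dict[str, list]:
    """messages: rows ordered by ts, each with msg_id, session_id, role, parent_msg_id."""
    by_id = {m["msg_id"]: m for m in messages}
    nodes: List[dict] = []
    edges: List[dict] = []
    seen_sessions = set()
    for m in messages:
        sid = m["session_id"]
        if sid not in seen_sessions:
            # First message of a session: add the session's root node.
            seen_sessions.add(sid)
            nodes.append({"id": f"session:{sid}", "type": "session"})
            parent = by_id.get(m.get("parent_msg_id"))
            if parent is not None and parent["session_id"] != sid:
                # Fork: attach the new root to the assistant turn it branched from,
                # or to the parent's session root if it forked from a user message.
                src = (f"msg:{parent['msg_id']}" if parent["role"] == "assistant"
                       else f"session:{parent['session_id']}")
                edges.append({"from": src, "to": f"session:{sid}", "kind": "fork"})
        if m["role"] == "assistant":
            nodes.append({"id": f"msg:{m['msg_id']}", "type": "assistant_turn"})
            edges.append({"from": f"session:{sid}", "to": f"msg:{m['msg_id']}", "kind": "turn"})
    return {"nodes": nodes, "edges": edges}
```

User messages produce no nodes, matching the "each assistant turn = child node" rule. The same `{nodes, edges}` payload could later feed D3.js or Cytoscape.js without changing the endpoint.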
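Section 9 requires content-type checks on header bytes rather than the extension alone. Below is a minimal sketch of the header-bytes fallback for the formats listed in section 1. The signature table is an illustrative subset, `sniff_type` and `validate_upload` are hypothetical names, and "no known signature and no NUL byte" is treated as plain text, which is a simplifying assumption.

```python
import os
from typing import Optional

ZIP = "application/zip"             # .docx/.xlsx/.xlsm/.odt/.ods are ZIP containers
OLE = "application/x-ole-storage"   # legacy .doc/.xls (OLE2 compound file)

# Leading-byte signatures (illustrative subset, not exhaustive).
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", ZIP),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", OLE),
    (b"{\\rtf", "application/rtf"),
]

# The detected type each allowed extension must match.
_EXPECTED = {
    ".pdf": "application/pdf", ".rtf": "application/rtf",
    ".doc": OLE, ".xls": OLE,
    ".docx": ZIP, ".xlsx": ZIP, ".xlsm": ZIP, ".odt": ZIP, ".ods": ZIP,
    ".txt": "text/plain", ".md": "text/plain", ".csv": "text/plain", ".tsv": "text/plain",
}


def sniff_type(head: bytes) -> Optional[str]:
    """Classify by header bytes; NUL-free data without a known signature counts as text."""
    for signature, kind in _SIGNATURES:
        if head.startswith(signature):
            return kind
    return "text/plain" if b"\x00" not in head else None


def validate_upload(filename: str, head: bytes) -> str:
    """Return the sniffed type, or raise ValueError if it contradicts the extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in _EXPECTED:
        raise ValueError(f"unsupported extension: {ext!r}")
    detected = sniff_type(head)
    if detected != _EXPECTED[ext]:
        raise ValueError(f"{filename!r} looks like {detected}, not {ext}")
    return detected
```

A ZIP signature only proves the container type. A stricter check would open the archive and look for an expected member, such as `word/document.xml` for `.docx` or `xl/workbook.xml` for `.xlsx`. With `python-magic` installed, a libmagic lookup could replace `sniff_type` while the extension cross-check stays the same.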