feat: add Vision Encoder service + Vision RAG implementation

- Vision Encoder Service (OpenCLIP ViT-L/14, GPU-accelerated)
  - FastAPI app with text/image embedding endpoints (768-dim)
  - Docker support with NVIDIA GPU runtime
  - Port 8001, health checks, model info API

- Qdrant Vector Database integration
  - Port 6333/6334 (HTTP/gRPC)
  - Image embeddings storage (768-dim, Cosine distance)
  - Auto collection creation

- Vision RAG implementation
  - VisionEncoderClient (Python client for API)
  - Image Search module (text-to-image, image-to-image)
  - Vision RAG routing in DAGI Router (mode: image_search)
  - VisionEncoderProvider integration

- Documentation (5000+ lines)
  - SYSTEM-INVENTORY.md - Complete system inventory
  - VISION-ENCODER-STATUS.md - Service status
  - VISION-RAG-IMPLEMENTATION.md - Implementation details
  - vision_encoder_deployment_task.md - Deployment checklist
  - services/vision-encoder/README.md - Deployment guide
  - Updated WARP.md, INFRASTRUCTURE.md, Jupyter Notebook

- Testing
  - test-vision-encoder.sh - Smoke tests (6 tests)
  - Unit tests for client, image search, routing

- Services: 17 total (added Vision Encoder + Qdrant)
- AI Models: 3 (qwen3:8b, OpenCLIP ViT-L/14, BAAI/bge-m3)
- GPU Services: 2 (Vision Encoder, Ollama)
- VRAM Usage: ~10 GB (concurrent)

Status: Production Ready 
Apple
2025-11-17 05:24:36 -08:00
parent b2b51f08fb
commit 4601c6fca8
55 changed files with 13205 additions and 3 deletions

@@ -0,0 +1,419 @@
# Task: Channel-agnostic document workflow (PDF + RAG)
## Goal
Make the document (PDF) parsing + RAG workflow **channel-agnostic**, so it can be reused by:
- Telegram bots (DAARWIZZ, Helion)
- Web applications
- Mobile apps
- Any other client via HTTP API
This task defines a shared `doc_service`, HTTP endpoints for non-Telegram clients, and integration of Telegram handlers with this shared layer.
> NOTE: If this task is re-run on a repo where it is already implemented, it should be treated as a validation/refinement task. Existing structures (services, endpoints) SHOULD NOT be removed, only improved if necessary.
---
## Context
### Existing components (expected state)
- Repo root: `microdao-daarion/`
- Gateway service: `gateway-bot/`
Key files:
- `gateway-bot/http_api.py`
  - Telegram handlers for DAARWIZZ (`/telegram/webhook`) and Helion (`/helion/telegram/webhook`).
  - Voice → STT flow (Whisper via `STT_SERVICE_URL`).
  - Discord handler.
  - Helper functions: `get_telegram_file_path`, `send_telegram_message`.
- `gateway-bot/memory_client.py`
  - `MemoryClient` with methods:
    - `get_context`, `save_chat_turn`, `create_dialog_summary`, `upsert_fact`.
- `gateway-bot/app.py`
  - FastAPI app, includes `http_api.router` as `gateway_router`.
  - CORS configuration.

Router + parser (already implemented in router project):
- DAGI Router supports:
  - `mode: "doc_parse"` with provider `parser` → OCRProvider → `parser-service` (DotsOCR).
  - `mode: "rag_query"` for RAG questions.
- `parser-service` is available at `http://parser-service:9400`.
The goal of this task is to:
1. Add a **channel-agnostic** document service to `gateway-bot`.
2. Add `/api/doc/*` HTTP endpoints for web/mobile.
3. Refactor Telegram handlers to use this service for PDF, `/ingest`, and RAG follow-ups.
4. Store document context in Memory Service via `fact_key = "doc_context:{session_id}"`.
---
## Changes to implement
### 1. Create service: `gateway-bot/services/doc_service.py`
Create a new directory and file:
- `gateway-bot/services/__init__.py`
- `gateway-bot/services/doc_service.py`
#### 1.1. Pydantic models
Define models:
- `QAItem` — single Q&A pair
- `ParsedResult` — result of document parsing
- `IngestResult` — result of ingestion into RAG
- `QAResult` — result of RAG query about a document
- `DocContext` — stored document context
Example fields (can be extended as needed):
- `QAItem`: `question: str`, `answer: str`
- `ParsedResult`:
  - `success: bool`
  - `doc_id: Optional[str]`
  - `qa_pairs: Optional[List[QAItem]]`
  - `markdown: Optional[str]`
  - `chunks_meta: Optional[Dict[str, Any]]` (e.g., `{"count": int, "chunks": [...]}`)
  - `raw: Optional[Dict[str, Any]]` (full payload from router)
  - `error: Optional[str]`
- `IngestResult`:
  - `success: bool`
  - `doc_id: Optional[str]`
  - `ingested_chunks: int`
  - `status: str`
  - `error: Optional[str]`
- `QAResult`:
  - `success: bool`
  - `answer: Optional[str]`
  - `doc_id: Optional[str]`
  - `sources: Optional[List[Dict[str, Any]]]`
  - `error: Optional[str]`
- `DocContext`:
  - `doc_id: str`
  - `dao_id: Optional[str]`
  - `user_id: Optional[str]`
  - `doc_url: Optional[str]`
  - `file_name: Optional[str]`
  - `saved_at: Optional[str]`
#### 1.2. DocumentService class
Implement `DocumentService` using `router_client.send_to_router` and `memory_client`:
Methods:
- `async def save_doc_context(session_id, doc_id, doc_url=None, file_name=None, dao_id=None) -> bool`
  - Uses `memory_client.upsert_fact` with:
    - `fact_key = f"doc_context:{session_id}"`
    - `fact_value_json` containing the keys `doc_id`, `doc_url`, `file_name`, `dao_id`, `saved_at`.
  - Extracts `user_id` from `session_id` (e.g., `telegram:123` → `user_id="123"`).
- `async def get_doc_context(session_id) -> Optional[DocContext]`
  - Uses `memory_client.get_fact(user_id, fact_key)`.
  - If `fact_value_json` exists, returns `DocContext(**fact_value_json)`.
- `async def parse_document(session_id, doc_url, file_name, dao_id, user_id, output_mode="qa_pairs", metadata=None) -> ParsedResult`
  - Builds router request:
    - `mode: "doc_parse"`
    - `agent: "parser"`
    - `metadata`: includes `source` (derived from `session_id`), `dao_id`, `user_id`, `session_id` and optional metadata.
    - `payload`: includes `doc_url`, `file_name`, `output_mode`, `dao_id`, `user_id`.
  - Calls `send_to_router`.
  - On success:
    - Extract `doc_id` from response.
    - Call `save_doc_context`.
    - Map `qa_pairs`, `markdown`, `chunks` into `ParsedResult`.
- `async def ingest_document(session_id, doc_id=None, doc_url=None, file_name=None, dao_id=None, user_id=None) -> IngestResult`
  - If `doc_id` is `None`, load from `get_doc_context`.
  - Build router request with `mode: "doc_parse"`, `payload.output_mode="chunks"`, `payload.ingest=True` and `doc_url` / `doc_id`.
  - Return `IngestResult` with `ingested_chunks` based on `chunks` length.
- `async def ask_about_document(session_id, question, doc_id=None, dao_id=None, user_id=None) -> QAResult`
  - If `doc_id` is `None`, load from `get_doc_context`.
  - Build router request with `mode: "rag_query"` and `payload` containing `question`, `dao_id`, `user_id`, `doc_id`.
  - Return `QAResult` with `answer` and optional `sources`.
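
The `doc_parse` request described above could be assembled roughly as follows. This is only a sketch: `build_parse_request` is a hypothetical helper name, the dict layout is inferred from the field list in this task, and the exact shape `send_to_router` expects may differ.

```python
from typing import Any, Dict, Optional


def build_parse_request(
    session_id: str,
    doc_url: str,
    file_name: str,
    dao_id: str,
    user_id: str,
    output_mode: str = "qa_pairs",
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a router request for mode "doc_parse" (hypothetical helper)."""
    # The source channel is the session_id prefix, e.g. "telegram" or "web".
    source = session_id.split(":", 1)[0]
    return {
        "mode": "doc_parse",
        "agent": "parser",
        # Caller-supplied metadata goes first so the core fields below
        # cannot be overridden by accident.
        "metadata": {
            **(metadata or {}),
            "source": source,
            "dao_id": dao_id,
            "user_id": user_id,
            "session_id": session_id,
        },
        "payload": {
            "doc_url": doc_url,
            "file_name": file_name,
            "output_mode": output_mode,
            "dao_id": dao_id,
            "user_id": user_id,
        },
    }
```

Keeping the request construction in a pure function like this makes it easy to unit-test without a running router; `parse_document` would then pass the result to `send_to_router`.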
Provide a small helper method:
- `_extract_source(session_id: str) -> str` → returns first segment before `:` (e.g. `"telegram"`, `"web"`).
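
A minimal sketch of the session helpers and the stored fact, assuming `session_id` always has the form `<source>:<id>`. The names `extract_user_id` and `doc_context_fact` are hypothetical, and the ISO 8601 UTC format for `saved_at` is an assumption, since this task does not fix a timestamp format.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def extract_source(session_id: str) -> str:
    """Return the channel prefix, e.g. "telegram" for "telegram:123"."""
    return session_id.split(":", 1)[0]


def extract_user_id(session_id: str) -> str:
    """Return everything after the first colon, or the whole id if there is none."""
    _, sep, rest = session_id.partition(":")
    return rest if sep else session_id


def doc_context_fact(
    session_id: str,
    doc_id: str,
    doc_url: Optional[str] = None,
    file_name: Optional[str] = None,
    dao_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the key and value that would be passed to memory_client.upsert_fact."""
    return {
        "fact_key": f"doc_context:{session_id}",
        "fact_value_json": {
            "doc_id": doc_id,
            "doc_url": doc_url,
            "file_name": file_name,
            "dao_id": dao_id,
            # Assumed format: ISO 8601 timestamp in UTC.
            "saved_at": datetime.now(timezone.utc).isoformat(),
        },
    }
```

Because the fact value uses the same field names as `DocContext`, the stored JSON can be passed back as `DocContext(**fact_value_json)` when the context is loaded.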
At the bottom of the file, export convenience functions:
- `doc_service = DocumentService()`
- Top-level async wrappers:
- `parse_document(...)`, `ingest_document(...)`, `ask_about_document(...)`, `save_doc_context(...)`, `get_doc_context(...)`.
> IMPORTANT: No Telegram-specific logic (emoji, message length, `/ingest` hints) in this file.
---
### 2. Extend MemoryClient: `gateway-bot/memory_client.py`
Add method:
```python
async def get_fact(self, user_id: str, fact_key: str, team_id: Optional[str] = None) -> Optional[Dict[str, Any]]:
    """Get a single fact by key."""
```
- Use the Memory Service HTTP API, e.g.:
  - `GET {base_url}/facts/{fact_key}` with `user_id` and optional `team_id` in query params.
- Return `response.json()` on 200, else `None`.
This method will be used by `doc_service.get_doc_context`.
Do **not** change existing public methods.
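
One way to prepare the `get_fact` call, shown here without any HTTP client so it stays independent of whichever library `MemoryClient` already uses. Both function names are hypothetical. Percent-encoding the key is a defensive assumption (keys such as `doc_context:telegram:123` contain colons), and it presumes the Memory Service decodes path segments in the usual way.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import quote


def build_get_fact_request(
    base_url: str,
    user_id: str,
    fact_key: str,
    team_id: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return the URL and query params for GET {base_url}/facts/{fact_key}."""
    # safe="" also encodes "/" so the key always stays a single path segment.
    url = f"{base_url.rstrip('/')}/facts/{quote(fact_key, safe='')}"
    params = {"user_id": user_id}
    if team_id is not None:
        params["team_id"] = team_id
    return url, params


def parse_get_fact_response(status_code: int, body: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the decoded JSON body on 200, otherwise None."""
    return body if status_code == 200 else None
```

Inside `get_fact`, the returned `url` and `params` would be handed to the client's existing GET call, and the status code plus decoded JSON passed to `parse_get_fact_response`.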
---
### 3. HTTP API for web/mobile: `gateway-bot/http_api_doc.py`
Create `gateway-bot/http_api_doc.py` with:
- `APIRouter()` named `router`.
- Import from `services.doc_service`:
  - `parse_document`, `ingest_document`, `ask_about_document`, `get_doc_context`, and models.

Endpoints:
1. `POST /api/doc/parse`
   Request (JSON body, Pydantic model `ParseDocumentRequest`):
   - `session_id: str`
   - `doc_url: str`
   - `file_name: str`
   - `dao_id: str`
   - `user_id: str`
   - `output_mode: str = "qa_pairs"`
   - `metadata: Optional[Dict[str, Any]]`
   Behaviour:
   - Call `parse_document(...)` from doc_service.
   - On failure → `HTTPException(status_code=400, detail=result.error)`.
   - On success → JSON with `doc_id`, `qa_pairs` (as list of dict), `markdown`, `chunks_meta`, `raw`.
2. `POST /api/doc/ingest`
   Request (`IngestDocumentRequest`):
   - `session_id: str`
   - `doc_id: Optional[str]`
   - `doc_url: Optional[str]`
   - `file_name: Optional[str]`
   - `dao_id: Optional[str]`
   - `user_id: Optional[str]`
   Behaviour:
   - If `doc_id` is missing, use `get_doc_context(session_id)`.
   - Call `ingest_document(...)`.
   - Return `doc_id`, `ingested_chunks`, `status`.
3. `POST /api/doc/ask`
   Request (`AskDocumentRequest`):
   - `session_id: str`
   - `question: str`
   - `doc_id: Optional[str]`
   - `dao_id: Optional[str]`
   - `user_id: Optional[str]`
   Behaviour:
   - If `doc_id` is missing, use `get_doc_context(session_id)`.
   - Call `ask_about_document(...)`.
   - Return `answer`, `doc_id`, and `sources` (if any).
4. `GET /api/doc/context/{session_id}`
   Behaviour:
   - Use `get_doc_context(session_id)`.
   - If missing → 404.
   - Else return `doc_id`, `dao_id`, `user_id`, `doc_url`, `file_name`, `saved_at`.

Optional: `POST /api/doc/parse/upload` stub for future file-upload handling (currently can return 501 with note to use `doc_url`).
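
From a web or mobile backend, a call to these endpoints might look like the sketch below. It uses only the standard library and builds the request without sending it. `GATEWAY_URL` and `build_doc_request` are hypothetical names, and the host and port are placeholders rather than values taken from this repo.

```python
import json
from urllib.request import Request

# Placeholder base URL; the real gateway host and port are deployment-specific.
GATEWAY_URL = "http://localhost:9300"


def build_doc_request(endpoint: str, payload: dict) -> Request:
    """Build a JSON POST request for /api/doc/<endpoint> (hypothetical helper)."""
    return Request(
        url=f"{GATEWAY_URL}/api/doc/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# doc_id is omitted, so the service is expected to fall back to the
# document context stored for this session_id.
ask_request = build_doc_request(
    "ask",
    {"session_id": "web:alice", "question": "What is the document about?"},
)
```

Passing `ask_request` to `urllib.request.urlopen` would send it. A failed parse is specified to return HTTP 400 with the error in `detail`, so clients should handle that status explicitly.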
---
### 4. Wire API into app: `gateway-bot/app.py`
Update `app.py`:
- Import both routers:
```python
from http_api import router as gateway_router
from http_api_doc import router as doc_router
```
- Include them:
```python
app.include_router(gateway_router, prefix="", tags=["gateway"])
app.include_router(doc_router, prefix="", tags=["docs"])
```
- Update root endpoint `/` to list new endpoints:
- `"POST /api/doc/parse"`
- `"POST /api/doc/ingest"`
- `"POST /api/doc/ask"`
- `"GET /api/doc/context/{session_id}"`
---
### 5. Refactor Telegram handlers: `gateway-bot/http_api.py`
Update `http_api.py` so Telegram uses `doc_service` for PDF/ingest/RAG, keeping existing chat/voice flows.
#### 5.1. Imports and constants
- Add imports:
```python
from services.doc_service import (
    parse_document,
    ingest_document,
    ask_about_document,
    get_doc_context,
)
```
- Define Telegram length limits:
```python
TELEGRAM_MAX_MESSAGE_LENGTH = 4096
TELEGRAM_SAFE_LENGTH = 3500
```
#### 5.2. DAARWIZZ `/telegram/webhook`
Inside `telegram_webhook`:
1. **/ingest command**
   - Check `text` from the message; if it starts with `/ingest`:
     - `session_id = f"telegram:{chat_id}"`.
     - If the message also contains a PDF document:
       - Use `get_telegram_file_path(file_id)` and the correct bot token to build `file_url`.
       - `await send_telegram_message(chat_id, "📥 Імпортую документ у RAG...")` (English: "Importing document into RAG...").
       - Call `ingest_document(session_id, doc_url=file_url, file_name=file_name, dao_id=dao_id, user_id=f"tg:{user_id}")`.
     - Else:
       - Call `ingest_document(session_id, dao_id=dao_id, user_id=f"tg:{user_id}")` and rely on stored context.
     - Send success/failure message.
2. **PDF detection**
   - Check `document = update.message.get("document")`.
   - Determine `is_pdf` via `mime_type` and/or `file_name.endswith(".pdf")`.
   - If PDF:
     - Log file info.
     - Get `file_path` via `get_telegram_file_path(file_id)` + correct token → `file_url`.
     - Send "📄 Обробляю PDF-документ..." (English: "Processing PDF document...").
     - `session_id = f"telegram:{chat_id}"`.
     - Call `parse_document(session_id, doc_url=file_url, file_name=file_name, dao_id=dao_id, user_id=f"tg:{user_id}", output_mode="qa_pairs", metadata={"username": username, "chat_id": chat_id})`.
     - On success, format:
       - Prefer Q&A (`result.qa_pairs`) → `format_qa_response(...)`.
       - Else markdown → `format_markdown_response(...)`.
       - Else chunks → `format_chunks_response(...)`.
     - Append hint: `"\n\n💡 _Використай /ingest для імпорту документа у RAG_"` (English: "Use /ingest to import the document into RAG").
     - Send response via `send_telegram_message`.
3. **RAG follow-up questions**
   - After computing `text` (from voice or direct text), before regular chat routing:
     - `session_id = f"telegram:{chat_id}"`.
     - Load `doc_context = await get_doc_context(session_id)`.
     - If `doc_context` is not `None`, `doc_context.doc_id` exists, and the text looks like a question (contains `?` or Ukrainian question words):
       - Call `ask_about_document(session_id, question=text, doc_id=doc_context.doc_id, dao_id=dao_id or doc_context.dao_id, user_id=f"tg:{user_id}")`.
       - If success, truncate answer to `TELEGRAM_SAFE_LENGTH` and send as Telegram message.
       - If RAG fails → fall back to normal chat routing.
4. **Keep voice + normal chat flows**
   - Existing STT flow and chat→router logic should remain as fallback for non-PDF / non-ingest / non-RAG messages.
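
The two detection checks used in the steps above (PDF detection and "looks like a question") might be sketched as follows. Both function names are hypothetical, `document` is assumed to be the Telegram `document` dict with `mime_type` and `file_name` keys, and the list of Ukrainian question words is an illustrative assumption rather than a list defined by this task.

```python
from typing import Any, Dict, Optional

# Illustrative, non-exhaustive set of Ukrainian question words (assumption).
QUESTION_WORDS = {
    "що", "як", "чому", "коли", "де", "хто",
    "скільки", "який", "яка", "яке", "які", "чи",
}


def is_pdf_document(document: Optional[Dict[str, Any]]) -> bool:
    """True if the Telegram document looks like a PDF by MIME type or extension."""
    if not document:
        return False
    mime_type = document.get("mime_type") or ""
    file_name = document.get("file_name") or ""
    return mime_type == "application/pdf" or file_name.lower().endswith(".pdf")


def looks_like_question(text: str) -> bool:
    """True if the text contains "?" or any of the question words above."""
    if "?" in text:
        return True
    # Strip surrounding punctuation so "Чому," still matches "чому".
    words = (word.strip(".,!:;\"'()") for word in text.lower().split())
    return any(word in QUESTION_WORDS for word in words)
```

Matching a question word anywhere in the text follows the wording of this task but can misfire, since some of these words also act as conjunctions. Checking only the first word would be a stricter alternative if false positives turn out to be a problem.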
#### 5.3. Helion `/helion/telegram/webhook`
Mirror the same behaviours for the Helion handler:
- `/ingest` command support.
- PDF detection and `parse_document` usage.
- RAG follow-up via `ask_about_document`.
- Use `HELION_TELEGRAM_BOT_TOKEN` for file download and message sending.
- Preserve existing chat→router behaviour when doc flow does not apply.
#### 5.4. Formatting helpers
Add helper functions at the bottom of `http_api.py` (Telegram-specific):
- `format_qa_response(qa_pairs: list, max_pairs: int = 5) -> str`
  - Adds header, enumerates Q&A pairs, truncates long answers, respects `TELEGRAM_SAFE_LENGTH`.
- `format_markdown_response(markdown: str) -> str`
  - Wraps markdown with header; truncates to `TELEGRAM_SAFE_LENGTH` and appends hint about `/ingest` if truncated.
- `format_chunks_response(chunks: list) -> str`
  - Shows summary about number of chunks and previews first ~3.
> IMPORTANT: These helpers handle Telegram-specific constraints and SHOULD NOT be moved into `doc_service`.
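
As an illustration of the first helper, a minimal `format_qa_response` could look like this. It assumes each Q&A pair is a dict with `question` and `answer` keys. The header text, the `truncate` helper, and the per-answer cap `MAX_ANSWER_LENGTH` are hypothetical choices, not values taken from this task.

```python
from typing import Dict, List

TELEGRAM_SAFE_LENGTH = 3500
MAX_ANSWER_LENGTH = 500  # hypothetical per-answer cap


def truncate(text: str, limit: int) -> str:
    """Cut text to at most `limit` characters, marking the cut with an ellipsis."""
    if len(text) <= limit:
        return text
    return text[: limit - 1].rstrip() + "…"


def format_qa_response(qa_pairs: List[Dict[str, str]], max_pairs: int = 5) -> str:
    """Render up to `max_pairs` Q&A pairs as a numbered Telegram message."""
    lines = ["Document Q&A:"]  # header text is a placeholder
    for index, pair in enumerate(qa_pairs[:max_pairs], start=1):
        lines.append(f"\n{index}. {pair['question']}")
        lines.append(truncate(pair["answer"], MAX_ANSWER_LENGTH))
    # Final guard: the whole message must stay under the safe length.
    return truncate("\n".join(lines), TELEGRAM_SAFE_LENGTH)
```

The same `truncate` helper could back `format_markdown_response` and the RAG answer truncation, so that all Telegram output shares one length policy.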
---
## Acceptance criteria
1. `gateway-bot/services/doc_service.py` exists and provides:
   - `parse_document`, `ingest_document`, `ask_about_document`, `save_doc_context`, `get_doc_context`.
   - Uses DAGI Router and Memory Service, with `session_id`-based context.
2. `gateway-bot/http_api_doc.py` exists and defines:
   - `POST /api/doc/parse`
   - `POST /api/doc/ingest`
   - `POST /api/doc/ask`
   - `GET /api/doc/context/{session_id}`
3. `gateway-bot/app.py`:
   - Includes both `http_api.router` and `http_api_doc.router`.
   - Root `/` lists new `/api/doc/*` endpoints.
4. `gateway-bot/memory_client.py`:
   - Includes `get_fact(...)` and existing methods still work.
   - `doc_service` uses `upsert_fact` + `get_fact` for `doc_context:{session_id}`.
5. `gateway-bot/http_api.py`:
   - Telegram handlers use `doc_service` for:
     - PDF parsing,
     - `/ingest` command,
     - RAG follow-up questions.
   - Handlers continue to support the existing voice→STT→chat flow and regular chat routing when the doc flow isn't triggered.
6. Web/mobile clients can call `/api/doc/*` to:
   - Parse documents via `doc_url`.
   - Ingest into RAG.
   - Ask questions about the last parsed document for a given `session_id`.
---
## How to run this task with Cursor
From repo root (`microdao-daarion`):
```bash
cursor task < docs/cursor/channel_agnostic_doc_flow_task.md
```
Cursor should then:
- Create/modify the files listed above.
- Ensure implementation matches the described architecture and acceptance criteria.