feat: add Vision Encoder service + Vision RAG implementation

- Vision Encoder Service (OpenCLIP ViT-L/14, GPU-accelerated)
  - FastAPI app with text/image embedding endpoints (768-dim)
  - Docker support with NVIDIA GPU runtime
  - Port 8001, health checks, model info API

- Qdrant Vector Database integration
  - Port 6333/6334 (HTTP/gRPC)
  - Image embeddings storage (768-dim, Cosine distance)
  - Auto collection creation

- Vision RAG implementation
  - VisionEncoderClient (Python client for API)
  - Image Search module (text-to-image, image-to-image)
  - Vision RAG routing in DAGI Router (mode: image_search)
  - VisionEncoderProvider integration

- Documentation (5000+ lines)
  - SYSTEM-INVENTORY.md - Complete system inventory
  - VISION-ENCODER-STATUS.md - Service status
  - VISION-RAG-IMPLEMENTATION.md - Implementation details
  - vision_encoder_deployment_task.md - Deployment checklist
  - services/vision-encoder/README.md - Deployment guide
  - Updated WARP.md, INFRASTRUCTURE.md, Jupyter Notebook

- Testing
  - test-vision-encoder.sh - Smoke tests (6 tests)
  - Unit tests for client, image search, routing

- Services: 17 total (added Vision Encoder + Qdrant)
- AI Models: 3 (qwen3:8b, OpenCLIP ViT-L/14, BAAI/bge-m3)
- GPU Services: 2 (Vision Encoder, Ollama)
- VRAM Usage: ~10 GB (concurrent)

Status: Production Ready 
Author: Apple
Date: 2025-11-17 05:24:36 -08:00
Parent: b2b51f08fb
Commit: 4601c6fca8
55 changed files with 13205 additions and 3 deletions


# Task: RAG ingestion worker (events → Milvus + Neo4j)
## Goal
Design and scaffold a **RAG ingestion worker** that:
- Consumes domain events (messages, docs, files, RWA updates) from the existing event stream.
- Transforms them into normalized chunks/documents.
- Indexes them into **Milvus** (vector store) and **Neo4j** (graph store).
- Works **idempotently** and supports `reindex(team_id)`.
This worker complements the `rag-gateway` service (see `docs/cursor/rag_gateway_task.md`) by keeping its underlying stores up-to-date.
> IMPORTANT: This task is about architecture, data flow and scaffolding. Concrete model choices and full schemas can be refined later.
---
## Context
- Project root: `microdao-daarion/`.
- Planned/implemented RAG layer: see `docs/cursor/rag_gateway_task.md`.
- Existing docs:
- `docs/cursor/42_nats_event_streams_and_event_catalog.md` — event stream & catalog.
- `docs/cursor/34_internal_services_architecture.md` — internal services & topology.
We assume there is (or will be):
- An event bus (likely NATS) with domain events such as:
- `message.created`
- `doc.upsert`
- `file.uploaded`
- `rwa.energy.update`, `rwa.food.update`, etc.
- A Milvus cluster instance.
- A Neo4j instance.
The ingestion worker must **not** be called directly by agents. It is a back-office service that feeds RAG stores for the `rag-gateway`.
---
## High-level design
### 1. Service placement & structure
Create a new service (or extend the `rag-gateway` repo structure) under, for example:
- `services/rag-ingest-worker/`
Suggested files:
- `main.py` — entrypoint (CLI or long-running process).
- `config.py` — environment/config loader (event bus URL, Milvus/Neo4j URLs, batch sizes, etc.).
- `events/consumer.py` — NATS (or other) consumer logic.
- `pipeline/normalization.py` — turn events into normalized documents/chunks.
- `pipeline/embedding.py` — embedding model client/wrapper.
- `pipeline/index_milvus.py` — Milvus upsert logic.
- `pipeline/index_neo4j.py` — Neo4j graph updates.
- `api.py` — optional HTTP API for:
- `POST /ingest/one` — ingest a single payload for debugging.
- `POST /ingest/reindex/{team_id}` — trigger a reindex job.
- `GET /health` — health check.
### 2. Event sources
The worker should subscribe to a **small set of core event types** (names to be aligned with the actual Event Catalog):
- `message.created` — messages in chats/channels (Telegram, internal UI, etc.).
- `doc.upsert` — wiki/docs/specs updates.
- `file.uploaded` — files (PDF, images) that have parsed text.
- `rwa.*` — events related to energy/food/water assets (optional, for later).
Implementation details:
- Use NATS (or another broker) subscription patterns from `docs/cursor/42_nats_event_streams_and_event_catalog.md`.
- Each event should carry at least:
- `event_type`
- `team_id` / `dao_id`
- `user_id`
- `channel_id` / `project_id` (if applicable)
- `payload` with text/content and metadata.
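A minimal envelope for such events, as a dataclass sketch (field names are assumptions until aligned with the actual Event Catalog):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class DomainEvent:
    """Envelope for events consumed from the bus (field names are assumptions)."""
    event_type: str                  # e.g. "message.created"
    team_id: str                     # tenant / DAO scope
    user_id: str
    payload: dict[str, Any]          # text/content and metadata
    channel_id: Optional[str] = None
    project_id: Optional[str] = None

def parse_event(raw: dict[str, Any]) -> DomainEvent:
    """Reject events missing the minimum required fields before they enter the pipeline."""
    for key in ("event_type", "team_id", "user_id", "payload"):
        if key not in raw:
            raise ValueError(f"event missing required field: {key}")
    return DomainEvent(
        event_type=raw["event_type"],
        team_id=raw["team_id"],
        user_id=raw["user_id"],
        payload=raw["payload"],
        channel_id=raw.get("channel_id"),
        project_id=raw.get("project_id"),
    )
```

Validating at the boundary keeps malformed events out of the normalization pipeline and makes dead-lettering decisions explicit.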
---
## Normalized document/chunk model
Define a common internal model for what is sent to Milvus/Neo4j, e.g. `IngestChunk`:
Fields (minimum):
- `chunk_id` — deterministic ID (e.g. hash of (team_id, source_type, source_id, chunk_index)).
- `team_id` / `dao_id`.
- `project_id` (optional).
- `channel_id` (optional).
- `agent_id` (who generated it, if any).
- `source_type` — `"message" | "doc" | "file" | "wiki" | "rwa" | ...`.
- `source_id` — e.g. message ID, doc ID, file ID.
- `text` — the chunk content.
- `tags` — list of tags (topic, domain, etc.).
- `visibility` — `"public" | "confidential"`.
- `created_at` — timestamp.
Responsibilities:
- `pipeline/normalization.py`:
- For each event type, map event payload → one or more `IngestChunk` objects.
- Handle splitting of long texts into smaller chunks if needed.
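The model and its helpers can be sketched as follows (field defaults, the hash scheme, and the `max_chars` value are illustrative choices, not fixed decisions):

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class IngestChunk:
    chunk_id: str
    team_id: str
    source_type: str          # "message" | "doc" | "file" | "wiki" | "rwa" | ...
    source_id: str
    text: str
    visibility: str = "public"
    tags: list[str] = field(default_factory=list)
    project_id: Optional[str] = None
    channel_id: Optional[str] = None
    agent_id: Optional[str] = None
    created_at: Optional[str] = None

def make_chunk_id(team_id: str, source_type: str, source_id: str, chunk_index: int) -> str:
    """Deterministic ID: replaying the same event always yields the same chunk_id."""
    key = f"{team_id}:{source_type}:{source_id}:{chunk_index}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def split_text(text: str, max_chars: int = 1000) -> list[str]:
    """Naive fixed-size splitting; swap in sentence-aware chunking later."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)] or [text]

def normalize_message_created(event: dict) -> list[IngestChunk]:
    """Map a message.created event payload to one or more IngestChunk objects."""
    team_id = event["team_id"]
    source_id = str(event["payload"]["message_id"])
    return [
        IngestChunk(
            chunk_id=make_chunk_id(team_id, "message", source_id, i),
            team_id=team_id,
            source_type="message",
            source_id=source_id,
            text=part,
            channel_id=event.get("channel_id"),
        )
        for i, part in enumerate(split_text(event["payload"]["text"]))
    ]
```

Because `chunk_id` is a pure function of `(team_id, source_type, source_id, chunk_index)`, replaying the same event produces identical IDs, which is what the Milvus upsert path relies on.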
---
## Embedding & Milvus indexing
### 1. Embedding
- Create an embedding component (`pipeline/embedding.py`) that:
- Accepts `IngestChunk` objects.
- Supports batch processing.
- Uses either:
- Existing LLM proxy/embedding service (preferred), or
- Direct model (e.g. local `bge-m3`, `gte-large`, etc.).
- After embedding, each chunk should carry its vector plus metadata per the schema in `rag_gateway_task.md`.
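A minimal batching wrapper, assuming the embedding service exposes a JSON `POST /embed` endpoint that returns `{"embeddings": [...]}` (the URL and response shape are placeholders to be aligned with the actual LLM proxy):

```python
import json
import urllib.request
from typing import Iterator

EMBEDDING_SERVICE_URL = "http://localhost:8080/embed"  # placeholder; point at the LLM proxy

def batched(items: list, batch_size: int) -> Iterator[list]:
    """Yield fixed-size batches so large backfills don't overwhelm the embedder."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def embed_texts(texts: list[str], batch_size: int = 32) -> list[list[float]]:
    """POST batches of texts to the embedding service and collect the vectors."""
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        req = urllib.request.Request(
            EMBEDDING_SERVICE_URL,
            data=json.dumps({"texts": batch}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            vectors.extend(json.load(resp)["embeddings"])
    return vectors
```

Only stdlib is used here; swapping in `httpx` or an async client later changes nothing about the batching contract.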
### 2. Milvus indexing
- `pipeline/index_milvus.py` should:
- Upsert chunks into Milvus.
- Ensure **idempotency** using `chunk_id` as primary key.
- Store metadata:
- `team_id`, `project_id`, `channel_id`, `agent_id`,
- `source_type`, `source_id`,
- `visibility`, `tags`, `created_at`,
- `embed_model` version.
- Consider using one Milvus collection with a partition key (`team_id`), or per-DAO collections — but keep code flexible.
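A sketch of the upsert path, assuming `pymilvus`'s `MilvusClient` and a flat row schema (field names mirror the metadata list above; the collection name and URI are placeholders):

```python
def chunk_to_milvus_row(chunk: dict, vector: list[float], embed_model: str) -> dict:
    """Flatten a normalized chunk + its vector into the row dict Milvus stores.
    chunk_id is the primary key, which is what makes upserts idempotent."""
    return {
        "chunk_id": chunk["chunk_id"],
        "vector": vector,
        "team_id": chunk["team_id"],
        "project_id": chunk.get("project_id", ""),
        "channel_id": chunk.get("channel_id", ""),
        "agent_id": chunk.get("agent_id", ""),
        "source_type": chunk["source_type"],
        "source_id": chunk["source_id"],
        "visibility": chunk.get("visibility", "public"),
        "tags": chunk.get("tags", []),
        "created_at": chunk.get("created_at", ""),
        "embed_model": embed_model,
    }

def upsert_chunks_to_milvus(rows: list[dict],
                            collection: str = "rag_chunks",
                            uri: str = "http://localhost:19530") -> None:
    """Idempotent write: rows sharing a chunk_id replace the previous version."""
    from pymilvus import MilvusClient  # deferred import; assumed dependency
    client = MilvusClient(uri=uri)
    client.upsert(collection_name=collection, data=rows)
```

Keeping row construction separate from the client call makes the schema mapping unit-testable without a running Milvus.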
---
## Neo4j graph updates
`pipeline/index_neo4j.py` should:
- For events that carry structural information (e.g. a project uses a resource, a doc mentions a topic):
- Create or update nodes: `User`, `MicroDAO`, `Project`, `Channel`, `Topic`, `Resource`, `File`, `RWAObject`, `Doc`.
- Create relationships such as:
- `(:User)-[:MEMBER_OF]->(:MicroDAO)`
- `(:Agent)-[:SERVES]->(:MicroDAO|:Project)`
- `(:Doc)-[:MENTIONS]->(:Topic)`
- `(:Project)-[:USES]->(:Resource)`
- All nodes/edges must include:
- `team_id` / `dao_id`
- `visibility` when it matters
- Operations should be **upserts** (MERGE) to avoid duplicates.
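A minimal sketch with the official `neo4j` Python driver; labels and property names are assumptions to be aligned with the real graph schema:

```python
# Cypher keyed on natural keys; MERGE ensures replays create no duplicates.
MERGE_DOC_MENTIONS_TOPIC = """
MERGE (d:Doc {team_id: $team_id, source_id: $doc_id})
MERGE (t:Topic {team_id: $team_id, name: $topic})
MERGE (d)-[:MENTIONS]->(t)
"""

def update_doc_topics(uri: str, auth: tuple, team_id: str,
                      doc_id: str, topics: list[str]) -> None:
    """Upsert Doc/Topic nodes and MENTIONS edges for one doc event."""
    from neo4j import GraphDatabase  # deferred import; assumed dependency
    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            for topic in topics:
                session.run(MERGE_DOC_MENTIONS_TOPIC,
                            team_id=team_id, doc_id=doc_id, topic=topic)
```

Note that `team_id` is part of every MERGE key, so identically named topics in different DAOs stay separate nodes.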
---
## Idempotency & reindex
### 1. Idempotent semantics
- Use deterministic `chunk_id` for Milvus records.
- Use Neo4j `MERGE` for nodes/edges based on natural keys (e.g. `(team_id, source_type, source_id, chunk_index)`).
- Replaying the same events should not corrupt or duplicate data.
### 2. Reindex API
- Provide a simple HTTP or CLI interface to:
- `POST /ingest/reindex/{team_id}` — schedule or start reindex for a team/DAO.
- Reindex strategy:
- Read documents/messages from source-of-truth (DB or event replay).
- Rebuild chunks and embeddings.
- Upsert into Milvus & Neo4j (idempotently).
Implementation details (can be left as TODOs if backends are missing):
- If there is no easy historic source yet, stub the reindex endpoint with a clear TODO and logging.
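The stub can be as small as this sketch; the handler behind `POST /ingest/reindex/{team_id}` would simply call it:

```python
import logging

logger = logging.getLogger("rag-ingest-worker")

def reindex_team(team_id: str) -> dict:
    """Stub for reindex(team_id).
    TODO: read from the source-of-truth (DB or event replay), rebuild chunks
    and embeddings, then upsert into Milvus/Neo4j idempotently."""
    logger.warning(
        "reindex requested for team %s, but no historic source is wired up yet",
        team_id,
    )
    return {"team_id": team_id, "status": "not_implemented"}
```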
---
## Monitoring & logging
Add basic observability:
- Structured logs for:
- Each event type ingested.
- Number of chunks produced.
- Latency for embedding and indexing.
- (Optional) Metrics counters/gauges:
- `ingest_events_total`
- `ingest_chunks_total`
- `ingest_errors_total`
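Until a metrics library (e.g. `prometheus_client`) is chosen, plain in-process counters plus one structured log line per event are enough to cover the list above; a sketch:

```python
import logging
from collections import Counter

logger = logging.getLogger("rag-ingest-worker")
METRICS: Counter = Counter()

def record_ingest(event_type: str, n_chunks: int, error: bool = False) -> None:
    """Bump counters and emit one structured log line per ingested event."""
    METRICS["ingest_events_total"] += 1
    METRICS["ingest_chunks_total"] += n_chunks
    if error:
        METRICS["ingest_errors_total"] += 1
    logger.info("ingested event_type=%s chunks=%d error=%s",
                event_type, n_chunks, error)
```

The counter names match the list above, so a later switch to real Prometheus counters is a mechanical rename-free change.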
---
## Files to create/modify (suggested)
> Adjust exact paths if needed.
- `services/rag-ingest-worker/main.py`
- Parse config, connect to event bus, start consumers.
- `services/rag-ingest-worker/config.py`
- Environment variables: `EVENT_BUS_URL`, `MILVUS_URL`, `NEO4J_URL`, `EMBEDDING_SERVICE_URL`, etc.
- `services/rag-ingest-worker/events/consumer.py`
- NATS (or chosen bus) subscription logic.
- `services/rag-ingest-worker/pipeline/normalization.py`
- Functions `normalize_message_created(event)`, `normalize_doc_upsert(event)`, `normalize_file_uploaded(event)`.
- `services/rag-ingest-worker/pipeline/embedding.py`
- `embed_chunks(chunks: List[IngestChunk]) -> List[VectorChunk]`.
- `services/rag-ingest-worker/pipeline/index_milvus.py`
- `upsert_chunks_to_milvus(chunks: List[VectorChunk])`.
- `services/rag-ingest-worker/pipeline/index_neo4j.py`
- `update_graph_for_event(event, chunks: List[IngestChunk])`.
- Optional: `services/rag-ingest-worker/api.py`
- FastAPI app with:
- `GET /health`
- `POST /ingest/one`
- `POST /ingest/reindex/{team_id}`
- Integration docs:
- Reference `docs/cursor/rag_gateway_task.md` and `docs/cursor/42_nats_event_streams_and_event_catalog.md` where appropriate.
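The consumer needs to route incoming subjects to the right normalizer. Since the broker client is still TBD, here is a dependency-free matcher implementing NATS wildcard semantics (`*` matches exactly one token, `>` matches one or more trailing tokens):

```python
def nats_subject_matches(pattern: str, subject: str) -> bool:
    """NATS wildcard semantics: '*' matches exactly one dot-separated token,
    '>' matches one or more trailing tokens. Used to pick a normalizer."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(s_tokens) > i        # at least one token remains
        if i >= len(s_tokens) or (p != "*" and p != s_tokens[i]):
            return False
    return len(p_tokens) == len(s_tokens)   # no unmatched trailing tokens
```

With this, `events/consumer.py` can map `message.created`, `doc.upsert`, `file.uploaded`, and `rwa.>` subscriptions to the normalization functions listed above without depending on a particular broker client yet.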
---
## Acceptance criteria
1. A new `rag-ingest-worker` (or similarly named) module/service exists under `services/` with:
- Clear directory structure (`events/`, `pipeline/`, `config.py`, `main.py`).
- Stubs or initial implementations for consuming events and indexing to Milvus/Neo4j.
2. A normalized internal model (`IngestChunk` or equivalent) is defined and used across pipelines.
3. Milvus indexing code:
- Uses idempotent upserts keyed by `chunk_id`.
- Stores metadata compatible with the RAG-gateway schema.
4. Neo4j update code:
- Uses MERGE for nodes/relationships.
- Encodes `team_id`/`dao_id` and privacy where relevant.
5. Idempotency strategy and `reindex(team_id)` path are present in code (even if reindex is initially a stub with TODO).
6. Basic logging is present for ingestion operations.
7. This file (`docs/cursor/rag_ingestion_worker_task.md`) can be executed by Cursor as:
```bash
cursor task < docs/cursor/rag_ingestion_worker_task.md
```
and Cursor will use it as the single source of truth for implementing/refining the ingestion worker.