feat: add Vision Encoder service + Vision RAG implementation

- Vision Encoder Service (OpenCLIP ViT-L/14, GPU-accelerated)
  - FastAPI app with text/image embedding endpoints (768-dim)
  - Docker support with NVIDIA GPU runtime
  - Port 8001, health checks, model info API

- Qdrant Vector Database integration
  - Port 6333/6334 (HTTP/gRPC)
  - Image embeddings storage (768-dim, Cosine distance)
  - Auto collection creation

- Vision RAG implementation
  - VisionEncoderClient (Python client for API)
  - Image Search module (text-to-image, image-to-image)
  - Vision RAG routing in DAGI Router (mode: image_search)
  - VisionEncoderProvider integration

- Documentation (5000+ lines)
  - SYSTEM-INVENTORY.md - Complete system inventory
  - VISION-ENCODER-STATUS.md - Service status
  - VISION-RAG-IMPLEMENTATION.md - Implementation details
  - vision_encoder_deployment_task.md - Deployment checklist
  - services/vision-encoder/README.md - Deployment guide
  - Updated WARP.md, INFRASTRUCTURE.md, Jupyter Notebook

- Testing
  - test-vision-encoder.sh - Smoke tests (6 tests)
  - Unit tests for client, image search, routing

- Services: 17 total (added Vision Encoder + Qdrant)
- AI Models: 3 (qwen3:8b, OpenCLIP ViT-L/14, BAAI/bge-m3)
- GPU Services: 2 (Vision Encoder, Ollama)
- VRAM Usage: ~10 GB (concurrent)

Status: Production Ready 
Author: Apple
Date: 2025-11-17 05:24:36 -08:00
Parent: b2b51f08fb
Commit: 4601c6fca8
55 changed files with 13205 additions and 3 deletions


# Task: RAG ingestion worker (events → Milvus + Neo4j)
## Goal
Design and scaffold a **RAG ingestion worker** that:
- Consumes domain events (messages, docs, files, RWA updates) from the existing event stream.
- Transforms them into normalized chunks/documents.
- Indexes them into **Milvus** (vector store) and **Neo4j** (graph store).
- Works **idempotently** and supports `reindex(team_id)`.
This worker complements the `rag-gateway` service (see `docs/cursor/rag_gateway_task.md`) by keeping its underlying stores up-to-date.
> IMPORTANT: This task is about architecture, data flow and scaffolding. Concrete model choices and full schemas can be refined later.
---
## Context
- Project root: `microdao-daarion/`.
- Planned/implemented RAG layer: see `docs/cursor/rag_gateway_task.md`.
- Existing docs:
- `docs/cursor/42_nats_event_streams_and_event_catalog.md` — event stream & catalog.
- `docs/cursor/34_internal_services_architecture.md` — internal services & topology.
We assume there is (or will be):
- An event bus (likely NATS) with domain events such as:
- `message.created`
- `doc.upsert`
- `file.uploaded`
- `rwa.energy.update`, `rwa.food.update`, etc.
- A Milvus cluster instance.
- A Neo4j instance.
The ingestion worker must **not** be called directly by agents. It is a back-office service that feeds RAG stores for the `rag-gateway`.
---
## High-level design
### 1. Service placement & structure
Create a new service (or extend the `rag-gateway` repo structure) under, for example:
- `services/rag-ingest-worker/`
Suggested files:
- `main.py` — entrypoint (CLI or long-running process).
- `config.py` — environment/config loader (event bus URL, Milvus/Neo4j URLs, batch sizes, etc.).
- `events/consumer.py` — NATS (or other) consumer logic.
- `pipeline/normalization.py` — turn events into normalized documents/chunks.
- `pipeline/embedding.py` — embedding model client/wrapper.
- `pipeline/index_milvus.py` — Milvus upsert logic.
- `pipeline/index_neo4j.py` — Neo4j graph updates.
- `api.py` — optional HTTP API for:
- `POST /ingest/one` — ingest a single payload for debugging.
- `POST /ingest/reindex/{team_id}` — trigger a reindex job.
- `GET /health` — health check.
### 2. Event sources
The worker should subscribe to a **small set of core event types** (names to be aligned with the actual Event Catalog):
- `message.created` — messages in chats/channels (Telegram, internal UI, etc.).
- `doc.upsert` — wiki/docs/specs updates.
- `file.uploaded` — files (PDF, images) that have parsed text.
- `rwa.*` — events related to energy/food/water assets (optional, for later).
Implementation details:
- Use NATS (or another broker) subscription patterns from `docs/cursor/42_nats_event_streams_and_event_catalog.md`.
- Each event should carry at least:
- `event_type`
- `team_id` / `dao_id`
- `user_id`
- `channel_id` / `project_id` (if applicable)
- `payload` with text/content and metadata.
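A minimal envelope for such events, as a dataclass sketch (field names are assumptions until aligned with the actual Event Catalog):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class DomainEvent:
    """Envelope for events consumed from the bus (field names are assumptions)."""
    event_type: str                  # e.g. "message.created"
    team_id: str                     # tenant / DAO scope
    user_id: str
    payload: dict[str, Any]          # text/content and metadata
    channel_id: Optional[str] = None
    project_id: Optional[str] = None

def parse_event(raw: dict[str, Any]) -> DomainEvent:
    """Reject events missing the minimum required fields before they enter the pipeline."""
    for key in ("event_type", "team_id", "user_id", "payload"):
        if key not in raw:
            raise ValueError(f"event missing required field: {key}")
    return DomainEvent(
        event_type=raw["event_type"],
        team_id=raw["team_id"],
        user_id=raw["user_id"],
        payload=raw["payload"],
        channel_id=raw.get("channel_id"),
        project_id=raw.get("project_id"),
    )
```

Validating at the boundary keeps malformed events out of the normalization pipeline and makes dead-lettering decisions explicit.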
---
## Normalized document/chunk model
Define a common internal model for what is sent to Milvus/Neo4j, e.g. `IngestChunk`:
Fields (minimum):
- `chunk_id` — deterministic ID (e.g. hash of (team_id, source_type, source_id, chunk_index)).
- `team_id` / `dao_id`.
- `project_id` (optional).
- `channel_id` (optional).
- `agent_id` (who generated it, if any).
- `source_type` — `"message" | "doc" | "file" | "wiki" | "rwa" | ...`.
- `source_id` — e.g. message ID, doc ID, file ID.
- `text` — the chunk content.
- `tags` — list of tags (topic, domain, etc.).
- `visibility` — `"public" | "confidential"`.
- `created_at` — timestamp.
Responsibilities:
- `pipeline/normalization.py`:
- For each event type, map event payload → one or more `IngestChunk` objects.
- Handle splitting of long texts into smaller chunks if needed.
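The model and its helpers can be sketched as follows (field defaults, the hash scheme, and the `max_chars` value are illustrative choices, not fixed decisions):

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class IngestChunk:
    chunk_id: str
    team_id: str
    source_type: str          # "message" | "doc" | "file" | "wiki" | "rwa" | ...
    source_id: str
    text: str
    visibility: str = "public"
    tags: list[str] = field(default_factory=list)
    project_id: Optional[str] = None
    channel_id: Optional[str] = None
    agent_id: Optional[str] = None
    created_at: Optional[str] = None

def make_chunk_id(team_id: str, source_type: str, source_id: str, chunk_index: int) -> str:
    """Deterministic ID: replaying the same event always yields the same chunk_id."""
    key = f"{team_id}:{source_type}:{source_id}:{chunk_index}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def split_text(text: str, max_chars: int = 1000) -> list[str]:
    """Naive fixed-size splitting; swap in sentence-aware chunking later."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)] or [text]

def normalize_message_created(event: dict) -> list[IngestChunk]:
    """Map a message.created event payload to one or more IngestChunk objects."""
    team_id = event["team_id"]
    source_id = str(event["payload"]["message_id"])
    return [
        IngestChunk(
            chunk_id=make_chunk_id(team_id, "message", source_id, i),
            team_id=team_id,
            source_type="message",
            source_id=source_id,
            text=part,
            channel_id=event.get("channel_id"),
        )
        for i, part in enumerate(split_text(event["payload"]["text"]))
    ]
```

Because `chunk_id` is a pure function of `(team_id, source_type, source_id, chunk_index)`, replaying the same event produces identical IDs, which is what the Milvus upsert path relies on.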
---
## Embedding & Milvus indexing
### 1. Embedding
- Create an embedding component (`pipeline/embedding.py`) that:
- Accepts `IngestChunk` objects.
- Supports batch processing.
- Uses either:
- Existing LLM proxy/embedding service (preferred), or
- Direct model (e.g. local `bge-m3`, `gte-large`, etc.).
- After embedding, each chunk should carry its vector plus metadata per the schema in `rag_gateway_task.md`.
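A minimal batching wrapper, assuming the embedding service exposes a JSON `POST /embed` endpoint that returns `{"embeddings": [...]}` (the URL and response shape are placeholders to be aligned with the actual LLM proxy):

```python
import json
import urllib.request
from typing import Iterator

EMBEDDING_SERVICE_URL = "http://localhost:8080/embed"  # placeholder; point at the LLM proxy

def batched(items: list, batch_size: int) -> Iterator[list]:
    """Yield fixed-size batches so large backfills don't overwhelm the embedder."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def embed_texts(texts: list[str], batch_size: int = 32) -> list[list[float]]:
    """POST batches of texts to the embedding service and collect the vectors."""
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        req = urllib.request.Request(
            EMBEDDING_SERVICE_URL,
            data=json.dumps({"texts": batch}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            vectors.extend(json.load(resp)["embeddings"])
    return vectors
```

Only stdlib is used here; swapping in `httpx` or an async client later changes nothing about the batching contract.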
### 2. Milvus indexing
- `pipeline/index_milvus.py` should:
- Upsert chunks into Milvus.
- Ensure **idempotency** using `chunk_id` as primary key.
- Store metadata:
- `team_id`, `project_id`, `channel_id`, `agent_id`,
- `source_type`, `source_id`,
- `visibility`, `tags`, `created_at`,
- `embed_model` version.
- Consider using one Milvus collection with a partition key (`team_id`), or per-DAO collections — but keep code flexible.
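A sketch of the upsert path, assuming `pymilvus`'s `MilvusClient` and a flat row schema (field names mirror the metadata list above; the collection name and URI are placeholders):

```python
def chunk_to_milvus_row(chunk: dict, vector: list[float], embed_model: str) -> dict:
    """Flatten a normalized chunk + its vector into the row dict Milvus stores.
    chunk_id is the primary key, which is what makes upserts idempotent."""
    return {
        "chunk_id": chunk["chunk_id"],
        "vector": vector,
        "team_id": chunk["team_id"],
        "project_id": chunk.get("project_id", ""),
        "channel_id": chunk.get("channel_id", ""),
        "agent_id": chunk.get("agent_id", ""),
        "source_type": chunk["source_type"],
        "source_id": chunk["source_id"],
        "visibility": chunk.get("visibility", "public"),
        "tags": chunk.get("tags", []),
        "created_at": chunk.get("created_at", ""),
        "embed_model": embed_model,
    }

def upsert_chunks_to_milvus(rows: list[dict],
                            collection: str = "rag_chunks",
                            uri: str = "http://localhost:19530") -> None:
    """Idempotent write: rows sharing a chunk_id replace the previous version."""
    from pymilvus import MilvusClient  # deferred import; assumed dependency
    client = MilvusClient(uri=uri)
    client.upsert(collection_name=collection, data=rows)
```

Keeping row construction separate from the client call makes the schema mapping unit-testable without a running Milvus.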
---
## Neo4j graph updates
`pipeline/index_neo4j.py` should:
- For events that carry structural information (e.g. a project uses a resource, a doc mentions a topic):
- Create or update nodes: `User`, `MicroDAO`, `Project`, `Channel`, `Topic`, `Resource`, `File`, `RWAObject`, `Doc`.
- Create relationships such as:
- `(:User)-[:MEMBER_OF]->(:MicroDAO)`
- `(:Agent)-[:SERVES]->(:MicroDAO|:Project)`
- `(:Doc)-[:MENTIONS]->(:Topic)`
- `(:Project)-[:USES]->(:Resource)`
- All nodes/edges must include:
- `team_id` / `dao_id`
- `visibility` when it matters
- Operations should be **upserts** (MERGE) to avoid duplicates.
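A minimal sketch with the official `neo4j` Python driver; labels and property names are assumptions to be aligned with the real graph schema:

```python
# Cypher keyed on natural keys; MERGE ensures replays create no duplicates.
MERGE_DOC_MENTIONS_TOPIC = """
MERGE (d:Doc {team_id: $team_id, source_id: $doc_id})
MERGE (t:Topic {team_id: $team_id, name: $topic})
MERGE (d)-[:MENTIONS]->(t)
"""

def update_doc_topics(uri: str, auth: tuple, team_id: str,
                      doc_id: str, topics: list[str]) -> None:
    """Upsert Doc/Topic nodes and MENTIONS edges for one doc event."""
    from neo4j import GraphDatabase  # deferred import; assumed dependency
    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            for topic in topics:
                session.run(MERGE_DOC_MENTIONS_TOPIC,
                            team_id=team_id, doc_id=doc_id, topic=topic)
```

Note that `team_id` is part of every MERGE key, so identically named topics in different DAOs stay separate nodes.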
---
## Idempotency & reindex
### 1. Idempotent semantics
- Use deterministic `chunk_id` for Milvus records.
- Use Neo4j `MERGE` for nodes/edges based on natural keys (e.g. `(team_id, source_type, source_id, chunk_index)`).
- Replaying the same events should not corrupt or duplicate data.
### 2. Reindex API
- Provide a simple HTTP or CLI interface to:
- `POST /ingest/reindex/{team_id}` — schedule or start reindex for a team/DAO.
- Reindex strategy:
- Read documents/messages from source-of-truth (DB or event replay).
- Rebuild chunks and embeddings.
- Upsert into Milvus & Neo4j (idempotently).
Implementation details (can be left as TODOs if backends are missing):
- If there is no easy historic source yet, stub the reindex endpoint with a clear TODO and logging.
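The stub can be as small as this sketch; the handler behind `POST /ingest/reindex/{team_id}` would simply call it:

```python
import logging

logger = logging.getLogger("rag-ingest-worker")

def reindex_team(team_id: str) -> dict:
    """Stub for reindex(team_id).
    TODO: read from the source-of-truth (DB or event replay), rebuild chunks
    and embeddings, then upsert into Milvus/Neo4j idempotently."""
    logger.warning(
        "reindex requested for team %s, but no historic source is wired up yet",
        team_id,
    )
    return {"team_id": team_id, "status": "not_implemented"}
```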
---
## Monitoring & logging
Add basic observability:
- Structured logs for:
- Each event type ingested.
- Number of chunks produced.
- Latency for embedding and indexing.
- (Optional) Metrics counters/gauges:
- `ingest_events_total`
- `ingest_chunks_total`
- `ingest_errors_total`
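Until a metrics library (e.g. `prometheus_client`) is chosen, plain in-process counters plus one structured log line per event are enough to cover the list above; a sketch:

```python
import logging
from collections import Counter

logger = logging.getLogger("rag-ingest-worker")
METRICS: Counter = Counter()

def record_ingest(event_type: str, n_chunks: int, error: bool = False) -> None:
    """Bump counters and emit one structured log line per ingested event."""
    METRICS["ingest_events_total"] += 1
    METRICS["ingest_chunks_total"] += n_chunks
    if error:
        METRICS["ingest_errors_total"] += 1
    logger.info("ingested event_type=%s chunks=%d error=%s",
                event_type, n_chunks, error)
```

The counter names match the list above, so a later switch to real Prometheus counters is a mechanical rename-free change.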
---
## Files to create/modify (suggested)
> Adjust exact paths if needed.
- `services/rag-ingest-worker/main.py`
- Parse config, connect to event bus, start consumers.
- `services/rag-ingest-worker/config.py`
- Environment variables: `EVENT_BUS_URL`, `MILVUS_URL`, `NEO4J_URL`, `EMBEDDING_SERVICE_URL`, etc.
- `services/rag-ingest-worker/events/consumer.py`
- NATS (or chosen bus) subscription logic.
- `services/rag-ingest-worker/pipeline/normalization.py`
- Functions `normalize_message_created(event)`, `normalize_doc_upsert(event)`, `normalize_file_uploaded(event)`.
- `services/rag-ingest-worker/pipeline/embedding.py`
- `embed_chunks(chunks: List[IngestChunk]) -> List[VectorChunk]`.
- `services/rag-ingest-worker/pipeline/index_milvus.py`
- `upsert_chunks_to_milvus(chunks: List[VectorChunk])`.
- `services/rag-ingest-worker/pipeline/index_neo4j.py`
- `update_graph_for_event(event, chunks: List[IngestChunk])`.
- Optional: `services/rag-ingest-worker/api.py`
- FastAPI app with:
- `GET /health`
- `POST /ingest/one`
- `POST /ingest/reindex/{team_id}`
- Integration docs:
- Reference `docs/cursor/rag_gateway_task.md` and `docs/cursor/42_nats_event_streams_and_event_catalog.md` where appropriate.
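The consumer needs to route incoming subjects to the right normalizer. Since the broker client is still TBD, here is a dependency-free matcher implementing NATS wildcard semantics (`*` matches exactly one token, `>` matches one or more trailing tokens):

```python
def nats_subject_matches(pattern: str, subject: str) -> bool:
    """NATS wildcard semantics: '*' matches exactly one dot-separated token,
    '>' matches one or more trailing tokens. Used to pick a normalizer."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(s_tokens) > i        # at least one token remains
        if i >= len(s_tokens) or (p != "*" and p != s_tokens[i]):
            return False
    return len(p_tokens) == len(s_tokens)   # no unmatched trailing tokens
```

With this, `events/consumer.py` can map `message.created`, `doc.upsert`, `file.uploaded`, and `rwa.>` subscriptions to the normalization functions listed above without depending on a particular broker client yet.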
---
## Acceptance criteria
1. A new `rag-ingest-worker` (or similarly named) module/service exists under `services/` with:
- Clear directory structure (`events/`, `pipeline/`, `config.py`, `main.py`).
- Stubs or initial implementations for consuming events and indexing to Milvus/Neo4j.
2. A normalized internal model (`IngestChunk` or equivalent) is defined and used across pipelines.
3. Milvus indexing code:
- Uses idempotent upserts keyed by `chunk_id`.
- Stores metadata compatible with the RAG-gateway schema.
4. Neo4j update code:
- Uses MERGE for nodes/relationships.
- Encodes `team_id`/`dao_id` and privacy where relevant.
5. Idempotency strategy and `reindex(team_id)` path are present in code (even if reindex is initially a stub with TODO).
6. Basic logging is present for ingestion operations.
7. This file (`docs/cursor/rag_ingestion_worker_task.md`) can be executed by Cursor as:
```bash
cursor task < docs/cursor/rag_ingestion_worker_task.md
```
and Cursor will use it as the single source of truth for implementing/refining the ingestion worker.