- Vision Encoder Service (OpenCLIP ViT-L/14, GPU-accelerated)
  - FastAPI app with text/image embedding endpoints (768-dim)
  - Docker support with NVIDIA GPU runtime
  - Port 8001, health checks, model info API
- Qdrant Vector Database integration
  - Port 6333/6334 (HTTP/gRPC)
  - Image embeddings storage (768-dim, Cosine distance)
  - Auto collection creation
- Vision RAG implementation
  - VisionEncoderClient (Python client for API)
  - Image Search module (text-to-image, image-to-image)
  - Vision RAG routing in DAGI Router (mode: image_search)
  - VisionEncoderProvider integration
- Documentation (5000+ lines)
  - SYSTEM-INVENTORY.md - Complete system inventory
  - VISION-ENCODER-STATUS.md - Service status
  - VISION-RAG-IMPLEMENTATION.md - Implementation details
  - vision_encoder_deployment_task.md - Deployment checklist
  - services/vision-encoder/README.md - Deployment guide
  - Updated WARP.md, INFRASTRUCTURE.md, Jupyter Notebook
- Testing
  - test-vision-encoder.sh - Smoke tests (6 tests)
  - Unit tests for client, image search, routing
- Services: 17 total (added Vision Encoder + Qdrant)
- AI Models: 3 (qwen3:8b, OpenCLIP ViT-L/14, BAAI/bge-m3)
- GPU Services: 2 (Vision Encoder, Ollama)
- VRAM Usage: ~10 GB (concurrent)
Status: Production Ready ✅
Task: Unified RAG-Gateway service (Milvus + Neo4j) for all agents
Goal
Design and implement a single RAG-gateway service that sits between agents and storage backends (Milvus, Neo4j, etc.), so that:
- Agents never talk directly to Milvus or Neo4j.
- All retrieval, graph queries and hybrid RAG behavior go through one service with a clear API.
- Security, multi-tenancy, logging, and optimization are centralized.
This task is about architecture and API first (code layout, endpoints, data contracts). A later task can cover concrete implementation details if needed.
This spec is intentionally high-level but should be detailed enough for Cursor to scaffold the service, HTTP API, and integration points with DAGI Router.
Context
- Project root: microdao-daarion/.
- There are (or will be) multiple agents:
  - DAARWIZZ (system orchestrator)
  - Helion (Energy Union)
  - Team/Project/Messenger/Co-Memory agents, etc.
- Agents already have access to:
  - DAGI Router (LLM routing, tools, orchestrator).
  - Memory service (short/long-term chat memory).
  - Parser-service (OCR and document parsing).
We now want a RAG layer that can:
- Perform semantic document search across all DAO documents / messages / files.
- Use a vector DB (Milvus) and graph DB (Neo4j) together.
- Provide a clean tool-like API to agents.
The RAG layer should be exposed as a standalone service:
- Working name: rag-gateway or knowledge-service.
- Internally, it can use Haystack (or similar) for pipelines.
High-level architecture
1. RAG-Gateway service
Create a new service (later we can place it under services/rag-gateway/), with HTTP API, which will:
- Accept tool-style requests from DAGI Router / agents.
- Internally talk to:
- Milvus (vector search, embeddings).
- Neo4j (graph queries, traversals).
- Return structured JSON for agents to consume.
Core API endpoints (first iteration):
- POST /rag/search_docs: semantic/hybrid document search.
- POST /rag/enrich_answer: enrich an existing answer with sources.
- POST /graph/query: run a graph query (Cypher or intent-based).
- POST /graph/explain_path: return a graph-based explanation / path between entities.
Agents will see these as tools (e.g. rag.search_docs, graph.query_context) configured in router config.
2. Haystack as internal orchestrator
Within the RAG-gateway, use Haystack components (or analogous) to organize:
- MilvusDocumentStore as the main vector store.
- Retrievers:
  - Dense retriever over Milvus.
  - Optional BM25/keyword retriever (for hybrid search).
- Pipelines:
  - indexing_pipeline: ingest DAO documents/messages/files into Milvus.
  - query_pipeline: answer agent queries using retrieved documents.
  - graph_rag_pipeline: combine Neo4j graph queries with Milvus retrieval.
The key idea: agents never talk to Haystack directly, only to RAG-gateway HTTP API.
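The hybrid search mentioned above needs some way to merge the dense and keyword result lists. This spec does not prescribe one. Reciprocal rank fusion (RRF) is a common choice, shown here as a minimal plain-Python sketch. The document IDs are hypothetical and `k=60` is just the conventional default for RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_k=5):
    """Merge several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID so the output
    is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_k]]


dense_hits = ["doc_1", "doc_2", "doc_3"]    # hypothetical dense retriever output
keyword_hits = ["doc_3", "doc_1", "doc_4"]  # hypothetical BM25 output
fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
```

Documents found by both retrievers (`doc_1`, `doc_3`) end up ahead of those found by only one, without having to calibrate dense and BM25 scores against each other.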
Data model & schema
1. Milvus document schema
Define a standard metadata schema for all documents/chunks stored in Milvus. Required fields:
- team_id / dao_id: which DAO / team this data belongs to.
- project_id: optional project-level grouping.
- channel_id: optional chat/channel ID (Telegram, internal channel, etc.).
- agent_id: which agent produced/owns this piece.
- visibility: one of "public" | "confidential".
- doc_type: one of "message" | "doc" | "file" | "wiki" | "rwa" | "transaction" (extensible).
- tags: list of tags (topics, domains, etc.).
- created_at: timestamp.
These should be part of Milvus metadata, so that RAG-gateway can apply filters (by DAO, project, visibility, etc.).
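As an illustration of the schema above, the following sketch models the metadata as a dataclass and builds a filter string in the style of Milvus scalar filter expressions. It is a sketch under assumptions, not the final contract. `ChunkMetadata` and `build_filter_expr` are hypothetical names, `created_at` is stored as an ISO 8601 string for simplicity, and the exact expression syntax depends on how the fields are declared in the collection. The spec suggests Pydantic for the real models; dataclasses are used here only to keep the example dependency-free.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

VISIBILITY_VALUES = {"public", "confidential"}
# The spec calls doc_type extensible, so this set is expected to grow.
DOC_TYPES = {"message", "doc", "file", "wiki", "rwa", "transaction"}


@dataclass
class ChunkMetadata:
    team_id: str
    agent_id: str
    visibility: str
    doc_type: str
    created_at: str  # ISO 8601 timestamp; the storage type is an assumption
    project_id: Optional[str] = None
    channel_id: Optional[str] = None
    tags: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject records that would be impossible to filter correctly later.
        if not self.team_id:
            raise ValueError("team_id is required")
        if self.visibility not in VISIBILITY_VALUES:
            raise ValueError(f"unknown visibility: {self.visibility!r}")
        if self.doc_type not in DOC_TYPES:
            raise ValueError(f"unknown doc_type: {self.doc_type!r}")


def build_filter_expr(team_id, visibilities, doc_types=None):
    """Build a boolean filter string; json.dumps handles the quoting."""
    clauses = [
        f"team_id == {json.dumps(team_id)}",
        f"visibility in {json.dumps(sorted(visibilities))}",
    ]
    if doc_types:
        clauses.append(f"doc_type in {json.dumps(list(doc_types))}")
    return " and ".join(clauses)


expr = build_filter_expr("dao_greenfood", ["public"], ["doc", "wiki"])
```

Keeping `team_id` as the first, mandatory clause means a pipeline cannot accidentally run an unscoped query.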
2. Neo4j graph schema
Design a minimal default graph model with node labels:
- User, Agent, MicroDAO, Project, Channel
- Topic, Resource, File, RWAObject (e.g. energy asset, food batch, water object).
Key relationships (examples):
- (:User)-[:MEMBER_OF]->(:MicroDAO)
- (:Agent)-[:SERVES]->(:MicroDAO|:Project)
- (:Doc)-[:MENTIONS]->(:Topic)
- (:Project)-[:USES]->(:Resource)
Every node/relationship should also carry:
- team_id / dao_id
- visibility or similar privacy flag
This allows RAG-gateway to enforce access control at query time.
RAG tools API for agents
Define 2–3 canonical tools that DAGI Router can call. These map to RAG-gateway endpoints.
1. rag.search_docs
Main tool for most knowledge queries.
Request JSON example:
{
  "agent_id": "ag_daarwizz",
  "team_id": "dao_greenfood",
  "query": "which of our projects already use Milvus?",
  "top_k": 5,
  "filters": {
    "project_id": "prj_x",
    "doc_type": ["doc", "wiki"],
    "visibility": "public"
  }
}
Response JSON example:
{
  "matches": [
    {
      "score": 0.82,
      "title": "Spec microdao RAG stack",
      "snippet": "...",
      "source_ref": {
        "type": "doc",
        "id": "doc_123",
        "url": "https://...",
        "team_id": "dao_greenfood",
        "doc_type": "doc"
      }
    }
  ]
}
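A rough sketch of how the gateway might validate the rag.search_docs request before running any retrieval. The field names come from the request example above. `SearchDocsRequest`, `from_dict` and the `MAX_TOP_K` bound are assumptions made for illustration, and the real service would likely express the same checks as Pydantic models.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

MAX_TOP_K = 50  # hypothetical upper bound; not defined by the spec


@dataclass
class SearchDocsRequest:
    agent_id: str
    team_id: str
    query: str
    top_k: int = 5
    filters: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, payload):
        """Parse a decoded JSON body, failing early on missing fields."""
        required = ("agent_id", "team_id", "query")
        missing = [name for name in required if not payload.get(name)]
        if missing:
            raise ValueError(f"missing required fields: {', '.join(missing)}")
        top_k = int(payload.get("top_k", 5))
        if not 1 <= top_k <= MAX_TOP_K:
            raise ValueError(f"top_k must be between 1 and {MAX_TOP_K}")
        return cls(
            agent_id=payload["agent_id"],
            team_id=payload["team_id"],
            query=payload["query"],
            top_k=top_k,
            filters=dict(payload.get("filters") or {}),
        )


request = SearchDocsRequest.from_dict({
    "agent_id": "ag_daarwizz",
    "team_id": "dao_greenfood",
    "query": "which of our projects already use Milvus?",
    "filters": {"doc_type": ["doc", "wiki"], "visibility": "public"},
})
```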
2. graph.query_context
For relationship/structural questions ("who is connected to whom", "which projects use X", etc.).
Two options (can support both):
- Low-level Cypher:
  {
    "team_id": "dao_energy",
    "cypher": "MATCH (p:Project)-[:USES]->(r:Resource {name:$name}) RETURN p LIMIT 10",
    "params": {"name": "Milvus"}
  }
- High-level intent:
  {
    "team_id": "dao_energy",
    "intent": "FIND_PROJECTS_BY_TECH",
    "args": {"tech": "Milvus"}
  }
RAG-gateway then maps intent → Cypher internally.
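One possible shape for the intent → Cypher mapping is a table of parameterized templates. In the sketch below only the FIND_PROJECTS_BY_TECH intent name comes from the example above. The `INTENT_TEMPLATES` table, the `resolve_intent` helper and the added `team_id` clause are assumptions, the last one following the rule that every node carries `team_id`.

```python
# Each entry: (Cypher template, names of the required args).
INTENT_TEMPLATES = {
    "FIND_PROJECTS_BY_TECH": (
        "MATCH (p:Project)-[:USES]->(r:Resource {name: $tech}) "
        "WHERE p.team_id = $team_id "
        "RETURN p LIMIT 10",
        ("tech",),
    ),
}


def resolve_intent(request):
    """Turn a high-level intent request into (cypher, params).

    Values are only ever passed as query parameters, never interpolated
    into the Cypher text, and team_id is always taken from the request.
    """
    intent = request["intent"]
    if intent not in INTENT_TEMPLATES:
        raise ValueError(f"unsupported intent: {intent}")
    cypher, required = INTENT_TEMPLATES[intent]
    args = request.get("args", {})
    missing = [name for name in required if name not in args]
    if missing:
        raise ValueError(f"missing args for {intent}: {', '.join(missing)}")
    params = {name: args[name] for name in required}
    params["team_id"] = request["team_id"]
    return cypher, params


cypher, params = resolve_intent({
    "team_id": "dao_energy",
    "intent": "FIND_PROJECTS_BY_TECH",
    "args": {"tech": "Milvus"},
})
```

Compared with accepting raw Cypher from agents, a template table makes it easier to guarantee that every query is scoped to the caller's DAO.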
3. rag.enrich_answer
Given a draft answer from an agent, RAG-gateway retrieves supporting documents and returns enriched answer + citations.
Request example:
{
  "team_id": "dao_greenfood",
  "question": "Briefly explain the architecture of the RAG layer in our city.",
  "draft_answer": "The architecture consists of ...",
  "max_docs": 3
}
Response example:
{
  "enriched_answer": "The architecture consists of ... (taking the sources into account)",
  "sources": [
    {"id": "doc_1", "title": "RAG spec", "url": "https://..."},
    {"id": "doc_2", "title": "Milvus setup", "url": "https://..."}
  ]
}
Multi-tenancy & security
Add a small authorization layer inside RAG-gateway:
- Each request includes:
  - user_id, team_id (DAO), optional roles.
  - mode / visibility (e.g. "public" or "confidential").
- Before querying Milvus/Neo4j, RAG-gateway applies filters:
  - team_id = ...
  - visibility within allowed scope.
  - Optional role-based constraints (Owner/Guardian/Member) affecting what doc_types can be seen.
Implementation hints:
- Start with a simple AccessContext object built from the request and used by all pipelines.
- Later, integrate with existing PDP/RBAC if available.
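A minimal sketch of what such an AccessContext could look like. The field names follow the request fields listed above. The policy is a placeholder assumption and not part of the spec: confidential data is visible only when the caller asks for "confidential" mode and holds an Owner or Guardian role.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical policy: which roles may read confidential data.
CONFIDENTIAL_ROLES = frozenset({"Owner", "Guardian"})


@dataclass(frozen=True)
class AccessContext:
    user_id: str
    team_id: str
    roles: Tuple[str, ...] = ()
    mode: str = "public"

    @classmethod
    def from_request(cls, payload):
        """Build the context from a decoded request body."""
        return cls(
            user_id=payload["user_id"],
            team_id=payload["team_id"],
            roles=tuple(payload.get("roles", ())),
            mode=payload.get("mode", "public"),
        )

    def allowed_visibility(self):
        """Visibility values this caller may read."""
        if self.mode == "confidential" and CONFIDENTIAL_ROLES.intersection(self.roles):
            return ("public", "confidential")
        return ("public",)

    def can_see(self, metadata):
        """Check one document's metadata against team and visibility."""
        return (
            metadata.get("team_id") == self.team_id
            and metadata.get("visibility") in self.allowed_visibility()
        )


member = AccessContext.from_request(
    {"user_id": "u_1", "team_id": "dao_greenfood", "roles": ["Member"], "mode": "confidential"}
)
owner = AccessContext.from_request(
    {"user_id": "u_2", "team_id": "dao_greenfood", "roles": ["Owner"], "mode": "confidential"}
)
```

The same object can drive both the pre-query filters (via `allowed_visibility`) and a post-query safety check on each hit (via `can_see`).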
Ingestion & pipelines
Define an ingestion plan and API.
1. Ingest service / worker
Create a separate ingestion component (can be part of RAG-gateway or standalone worker) that:
- Listens to events like:
  - message.created
  - doc.upsert
  - file.uploaded
- For each event:
  - Builds text chunks.
  - Computes embeddings.
  - Writes chunks into Milvus with proper metadata.
  - Updates Neo4j graph (nodes/edges) where appropriate.
Requirements:
- Pipelines must be idempotent: re-indexing the same document must not break anything.
- Create an API / job for reindex(team_id) to reindex a full DAO if needed.
- Store embedding model version in metadata (e.g. embed_model: "bge-m3@v1") to ease future migrations.
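One way to get idempotent ingestion is to derive chunk IDs deterministically and always upsert. The sketch below uses a plain dict as a stand-in for the Milvus collection and skips embedding computation entirely. `chunk_id`, `split_text` and `ingest` are hypothetical helpers, and fixed-size character chunking is only a placeholder for a real splitter.

```python
import hashlib

EMBED_MODEL = "bge-m3@v1"  # version tag taken from the example above


def chunk_id(team_id, doc_id, index, embed_model=EMBED_MODEL):
    """Deterministic ID: the same inputs always map to the same row."""
    raw = f"{team_id}:{doc_id}:{index}:{embed_model}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]


def split_text(text, size=200):
    """Naive fixed-size chunking, used only to keep the sketch short."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def ingest(store, team_id, doc_id, text, embed_model=EMBED_MODEL):
    """Upsert all chunks of one document and drop its stale chunks."""
    new_ids = set()
    for index, chunk in enumerate(split_text(text)):
        cid = chunk_id(team_id, doc_id, index, embed_model)
        store[cid] = {
            "text": chunk,
            "team_id": team_id,
            "doc_id": doc_id,
            "embed_model": embed_model,
        }
        new_ids.add(cid)
    # Remove rows left over from a longer earlier version of the document
    # or from a previous embedding model.
    stale = [
        cid for cid, row in store.items()
        if row["team_id"] == team_id and row["doc_id"] == doc_id and cid not in new_ids
    ]
    for cid in stale:
        del store[cid]
    return sorted(new_ids)


store = {}
first = ingest(store, "dao_greenfood", "doc_123", "a" * 450)
second = ingest(store, "dao_greenfood", "doc_123", "a" * 450)
```

Because the embedding model version is part of the ID, a later reindex(team_id) with a new model writes fresh rows and cleans up the old ones in the same pass.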
2. Event contracts
Align ingestion with the existing Event Catalog (if present in docs/cursor):
- Document which event types lead to RAG ingestion.
- For each event, define mapping → Milvus doc, Neo4j nodes/edges.
Optimization for agents
Add support for:
- Semantic cache per agent
  - Cache query → RAG-result for N minutes per (agent_id, team_id).
  - Useful for frequently repeated queries.
- RAG behavior profiles per agent
  - In agent config (probably in router config), define:
    - rag_mode: off | light | strict
    - max_context_tokens
    - max_docs_per_query
  - RAG-gateway can read these via metadata from Router, or Router can decide when to call RAG at all.
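The per-agent cache described above could start as a small TTL map. Note that the sketch below is a simplification: it matches queries exactly after lowercasing and whitespace normalization, whereas a true semantic cache would compare query embeddings. `RagResultCache` and its methods are hypothetical names, and the clock is injectable only so that expiry can be tested.

```python
import time


class RagResultCache:
    """TTL cache keyed by (agent_id, team_id, normalized query)."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    @staticmethod
    def _key(agent_id, team_id, query):
        # Lowercase and collapse whitespace so trivial variants share a key.
        return (agent_id, team_id, " ".join(query.lower().split()))

    def get(self, agent_id, team_id, query):
        key = self._key(agent_id, team_id, query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily evict expired entries
            return None
        return result

    def put(self, agent_id, team_id, query, result):
        key = self._key(agent_id, team_id, query)
        self._entries[key] = (self._clock() + self._ttl, result)


now = [0.0]  # fake clock for the usage example
cache = RagResultCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("ag_daarwizz", "dao_greenfood", "Which projects use Milvus?", {"matches": []})
hit = cache.get("ag_daarwizz", "dao_greenfood", "  which projects   use milvus? ")
```

Including `team_id` in the key keeps one DAO's cached results from ever being served to another.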
Files to create/modify (suggested)
NOTE: This is a suggestion; adjust exact paths/names to fit the existing project structure.
- New service directory services/rag-gateway/:
  - main.py: FastAPI (or similar) entrypoint.
  - api.py: defines /rag/search_docs, /rag/enrich_answer, /graph/query, /graph/explain_path.
  - core/pipelines.py: Haystack pipelines (indexing, query, graph-rag).
  - core/schema.py: Pydantic models for request/response, data schema.
  - core/access.py: access control context + checks.
  - core/backends/milvus_client.py: wrapper for Milvus.
  - core/backends/neo4j_client.py: wrapper for Neo4j.
- Integration with DAGI Router:
  - Update router-config.yml to define RAG tools:
    - rag.search_docs
    - graph.query_context
    - rag.enrich_answer
  - Configure providers for RAG-gateway base URL.
- Docs:
  - docs/cursor/rag_gateway_api_spec.md: optional detailed API spec for RAG tools.
Acceptance criteria
- Service skeleton
  - A new RAG-gateway service exists under services/ with:
    - A FastAPI (or similar) app.
    - Endpoints:
      - POST /rag/search_docs
      - POST /rag/enrich_answer
      - POST /graph/query
      - POST /graph/explain_path
    - Pydantic models for requests/responses.
- Data contracts
  - Milvus document metadata schema is defined (and used in code).
  - Neo4j node/edge labels and key relationships are documented and referenced in code.
- Security & multi-tenancy
  - All RAG/graph endpoints accept user_id, team_id, and enforce at least basic filtering by team_id and visibility.
- Agent tool contracts
  - JSON contracts for tools rag.search_docs, graph.query_context, and rag.enrich_answer are documented and used by RAG-gateway.
  - DAGI Router integration is sketched (even if not fully wired): provider entry + basic routing rule examples.
- Ingestion design
  - Ingestion pipeline is outlined in code (or stubs) with clear TODOs:
    - where to hook event consumption,
    - how to map events to Milvus/Neo4j.
  - Idempotency and reindex(team_id) strategy described in code/docs.
- Documentation
  - This file (docs/cursor/rag_gateway_task.md) plus, optionally, a more detailed API spec file for RAG-gateway.
How to run this task with Cursor
From repo root (microdao-daarion):
cursor task < docs/cursor/rag_gateway_task.md
Cursor should then:
- Scaffold the RAG-gateway service structure.
- Implement request/response models and basic endpoints.
- Sketch out Milvus/Neo4j client wrappers and pipelines.
- Optionally, add TODOs where deeper implementation is needed.