feat: implement RAG Service MVP with PARSER + Memory integration

RAG Service Implementation:
- Create rag-service/ with full structure (config, document_store, embedding, pipelines)
- Document Store: PostgreSQL + pgvector via Haystack
- Embedding: BAAI/bge-m3 (multilingual, 1024 dim)
- Ingest Pipeline: Convert ParsedDocument to Haystack Documents, embed, index
- Query Pipeline: Retrieve documents, generate answers via DAGI Router
- FastAPI endpoints: /ingest, /query, /health

Tests:
- Unit tests for ingest and query pipelines
- E2E test with example parsed JSON
- Test fixtures with real PARSER output example

Router Integration:
- Add mode='rag_query' routing rule in router-config.yml
- Priority 7, uses local_qwen3_8b for RAG queries

Docker:
- Add rag-service to docker-compose.yml
- Configure dependencies (router, city-db)
- Add model cache volume

Documentation:
- Complete README with API examples
- Integration guides for PARSER and Router

# RAG Service
Retrieval-Augmented Generation service for MicroDAO. Integrates PARSER + Memory + Vector Search.
## Features
- **Document Ingestion**: Convert ParsedDocument from PARSER service to vector embeddings
- **Query Pipeline**: Retrieve relevant documents and generate answers using LLM
- **Haystack Integration**: Uses Haystack 2.x with PostgreSQL + pgvector
- **Memory Integration**: Combines RAG results with Memory context
## Architecture
```
PARSER → parsed_json → RAG Service → Vector DB (pgvector)
User Query → RAG Service → Retrieve Documents → LLM (DAGI Router) → Answer + Citations
```
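The ingest step boils down to flattening PARSER output into Haystack `Document` objects that carry citation metadata. A minimal sketch of that conversion; the `blocks`, `text`, `page`, and `section` field names are illustrative assumptions about the ParsedDocument schema, not the actual contract:
```python
from haystack import Document

def parsed_json_to_documents(dao_id: str, doc_id: str, parsed_json: dict) -> list[Document]:
    """Flatten parsed output into Haystack Documents with citation metadata."""
    docs: list[Document] = []
    # "blocks"/"text"/"page"/"section" are assumed field names, for illustration only.
    for block in parsed_json.get("blocks", []):
        docs.append(
            Document(
                content=block["text"],
                meta={
                    "dao_id": dao_id,
                    "doc_id": doc_id,
                    "page": block.get("page"),
                    "section": block.get("section"),
                },
            )
        )
    return docs
```
The `meta` fields are what later allow `/query` responses to return citations with page and section references.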
## Configuration
### Environment Variables
```bash
# PostgreSQL
PG_DSN=postgresql+psycopg2://postgres:postgres@city-db:5432/daarion_city
# Embedding Model
EMBED_MODEL_NAME=BAAI/bge-m3 # or intfloat/multilingual-e5-base
EMBED_DEVICE=cuda # or cpu, mps
EMBED_DIM=1024 # BAAI/bge-m3 = 1024
# Document Store
RAG_TABLE_NAME=rag_documents
SEARCH_STRATEGY=approximate
# LLM Provider
LLM_PROVIDER=router # router, openai, local
ROUTER_BASE_URL=http://router:9102
```
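A minimal sketch of loading these variables into a single typed settings object (plain `os.environ`, to avoid assuming extra dependencies; defaults mirror the values above):
```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class RagSettings:
    """Reads configuration from the environment; defaults match the README."""
    pg_dsn: str = os.environ.get("PG_DSN", "postgresql+psycopg2://postgres:postgres@city-db:5432/daarion_city")
    embed_model_name: str = os.environ.get("EMBED_MODEL_NAME", "BAAI/bge-m3")
    embed_device: str = os.environ.get("EMBED_DEVICE", "cpu")
    embed_dim: int = int(os.environ.get("EMBED_DIM", "1024"))
    rag_table_name: str = os.environ.get("RAG_TABLE_NAME", "rag_documents")
    search_strategy: str = os.environ.get("SEARCH_STRATEGY", "approximate")
    llm_provider: str = os.environ.get("LLM_PROVIDER", "router")
    router_base_url: str = os.environ.get("ROUTER_BASE_URL", "http://router:9102")

settings = RagSettings()
```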
## API Endpoints
### POST /ingest
Ingest a parsed document from the PARSER service.
**Request:**
```json
{
"dao_id": "daarion",
"doc_id": "microdao-tokenomics-2025-11",
"parsed_json": { ... },
"user_id": "optional-user-id"
}
```
**Response:**
```json
{
"status": "success",
"doc_count": 15,
"dao_id": "daarion",
"doc_id": "microdao-tokenomics-2025-11"
}
```
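Hypothetical Pydantic models matching the request/response shapes above (the class names are illustrative, not the service's actual code):
```python
from pydantic import BaseModel

class IngestRequest(BaseModel):
    dao_id: str
    doc_id: str
    parsed_json: dict           # raw ParsedDocument JSON from PARSER
    user_id: str | None = None  # optional

class IngestResponse(BaseModel):
    status: str      # "success" on a completed ingest
    doc_count: int   # number of Haystack Documents indexed
    dao_id: str
    doc_id: str
```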
### POST /query
Query the RAG system for an answer with citations.
**Request:**
```json
{
"dao_id": "daarion",
"question": "Поясни токеноміку microDAO і роль стейкінгу",
"top_k": 5,
"user_id": "optional-user-id"
}
```
**Response:**
```json
{
"answer": "MicroDAO використовує токен μGOV...",
"citations": [
{
"doc_id": "microdao-tokenomics-2025-11",
"page": 1,
"section": "Токеноміка MicroDAO",
"excerpt": "MicroDAO використовує токен μGOV..."
}
],
"documents": [...]
}
```
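One way the `citations` array could be derived from retrieved documents, assuming each Haystack `Document` still carries the `doc_id`/`page`/`section` metadata written at ingest time (a sketch, not the service's actual implementation):
```python
from haystack import Document

def build_citations(documents: list[Document], excerpt_chars: int = 200) -> list[dict]:
    """Map retrieved Documents to the citation objects in the /query response."""
    return [
        {
            "doc_id": doc.meta.get("doc_id"),
            "page": doc.meta.get("page"),
            "section": doc.meta.get("section"),
            "excerpt": (doc.content or "")[:excerpt_chars],
        }
        for doc in documents
    ]
```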
### GET /health
Health check endpoint.
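A minimal sketch of this endpoint, assuming a plain FastAPI app (the response payload shown here is illustrative):
```python
from fastapi import FastAPI

app = FastAPI(title="RAG Service")

@app.get("/health")
def health() -> dict:
    # Illustrative payload; the actual service may report more detail.
    return {"status": "ok"}
```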
## Usage
### 1. Ingest Document
After parsing a document with the PARSER service:
```bash
curl -X POST http://localhost:9500/ingest \
-H "Content-Type: application/json" \
-d '{
"dao_id": "daarion",
"doc_id": "microdao-tokenomics-2025-11",
"parsed_json": { ... }
}'
```
### 2. Query RAG
```bash
curl -X POST http://localhost:9500/query \
-H "Content-Type: application/json" \
-d '{
"dao_id": "daarion",
"question": "Поясни токеноміку microDAO"
}'
```
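The same query from Python, mirroring the curl call above (only `httpx` assumed):
```python
import httpx

response = httpx.post(
    "http://localhost:9500/query",
    json={"dao_id": "daarion", "question": "Explain microDAO tokenomics"},
    timeout=60.0,
)
response.raise_for_status()
print(response.json()["answer"])
```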
## Integration with PARSER
After parsing a document:
```python
# In parser-service
import httpx

parsed_doc = parse_document_from_images(images, output_mode="raw_json")

# Send to RAG Service
async with httpx.AsyncClient() as client:
    response = await client.post(
        "http://rag-service:9500/ingest",
        json={
            "dao_id": "daarion",
            "doc_id": parsed_doc.doc_id,
            "parsed_json": parsed_doc.model_dump(mode="json")
        }
    )
    response.raise_for_status()
```
## Integration with Router
Router handles `mode="rag_query"`:
```python
# In Router
if req.mode == "rag_query":
    # Call RAG Service
    rag_response = await rag_client.query(
        dao_id=req.dao_id,
        question=req.payload.get("question")
    )

    # Combine with Memory context
    memory_context = await memory_client.get_context(...)

    # Build prompt with RAG + Memory
    prompt = build_prompt_with_rag_and_memory(
        question=req.payload.get("question"),
        rag_documents=rag_response["documents"],
        memory_context=memory_context
    )

    # Call LLM
    answer = await llm_provider.generate(prompt)
```
## Development
### Local Setup
```bash
# Install dependencies
pip install -r requirements.txt
# Set environment variables
export PG_DSN="postgresql+psycopg2://postgres:postgres@localhost:5432/daarion_city"
export EMBED_MODEL_NAME="BAAI/bge-m3"
export EMBED_DEVICE="cpu"
# Run service
uvicorn app.main:app --host 0.0.0.0 --port 9500 --reload
```
### Tests
```bash
pytest tests/
```
## Dependencies
- **Haystack 2.x**: Document store, embedding, retrieval
- **sentence-transformers**: Embedding models
- **psycopg2**: PostgreSQL connection
- **FastAPI**: API framework