feat: implement RAG Service MVP with PARSER + Memory integration

RAG Service Implementation:
- Create rag-service/ with full structure (config, document_store, embedding, pipelines)
- Document Store: PostgreSQL + pgvector via Haystack
- Embedding: BAAI/bge-m3 (multilingual, 1024 dim)
- Ingest Pipeline: Convert ParsedDocument to Haystack Documents, embed, index
- Query Pipeline: Retrieve documents, generate answers via DAGI Router
- FastAPI endpoints: /ingest, /query, /health

Tests:
- Unit tests for ingest and query pipelines
- E2E test with example parsed JSON
- Test fixtures with real PARSER output example

Router Integration:
- Add mode='rag_query' routing rule in router-config.yml
- Priority 7, uses local_qwen3_8b for RAG queries

Docker:
- Add rag-service to docker-compose.yml
- Configure dependencies (router, city-db)
- Add model cache volume

Documentation:
- Complete README with API examples
- Integration guides for PARSER and Router
Apple
2025-11-16 04:41:53 -08:00
parent d3c701f3ff
commit 9b86f9a694
19 changed files with 1275 additions and 97 deletions

@@ -0,0 +1,57 @@
"""
Document Store for RAG Service
Uses PostgreSQL + pgvector via Haystack
"""

import logging
from typing import Optional

from haystack.document_stores import PGVectorDocumentStore

from app.core.config import settings

logger = logging.getLogger(__name__)

# Global document store instance
_document_store: Optional[PGVectorDocumentStore] = None


def get_document_store() -> PGVectorDocumentStore:
    """
    Get or create PGVectorDocumentStore instance

    Returns:
        PGVectorDocumentStore configured with pgvector
    """
    global _document_store

    # Reuse the cached instance if one has already been created
    if _document_store is not None:
        return _document_store

    logger.info(f"Initializing PGVectorDocumentStore: table={settings.RAG_TABLE_NAME}")
    # Log only the part after the last '@' (host/port/db), so an '@' inside
    # the password cannot leak part of the credentials
    logger.info(f"Connection: {settings.PG_DSN.rsplit('@', 1)[-1] if '@' in settings.PG_DSN else 'hidden'}")

    try:
        _document_store = PGVectorDocumentStore(
            connection_string=settings.PG_DSN,
            embedding_dim=settings.EMBED_DIM,
            table_name=settings.RAG_TABLE_NAME,
            search_strategy=settings.SEARCH_STRATEGY,
            # Additional options
            recreate_table=False,  # Don't drop existing table
            similarity="cosine",  # Cosine similarity for embeddings
        )

        logger.info("PGVectorDocumentStore initialized successfully")
        return _document_store

    except Exception as e:
        logger.error(f"Failed to initialize DocumentStore: {e}", exc_info=True)
        raise RuntimeError(f"DocumentStore initialization failed: {e}") from e


def reset_document_store():
    """Reset global document store instance (for testing)"""
    global _document_store
    _document_store = None
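
The lazy-singleton-plus-reset pattern in this file can be tried out without PostgreSQL or Haystack. The sketch below is only an illustration under stated assumptions: `FakeStore`, `get_store`, `reset_store` and the table name `rag_documents` are hypothetical stand-ins, not names from this commit.

```python
from typing import Optional


class FakeStore:
    """Hypothetical stand-in for the real document store (no database needed)."""

    def __init__(self, table_name: str) -> None:
        self.table_name = table_name


# Module-level cache, mirroring the _document_store global in the file above
_store: Optional[FakeStore] = None


def get_store() -> FakeStore:
    """Create the store on first use, then return the cached instance."""
    global _store
    if _store is None:
        # "rag_documents" is an assumed table name, for illustration only
        _store = FakeStore(table_name="rag_documents")
    return _store


def reset_store() -> None:
    """Drop the cached instance so the next get_store() call builds a new one."""
    global _store
    _store = None


first = get_store()
second = get_store()  # cached: the same object as `first`
reset_store()
third = get_store()   # rebuilt after the reset: a different object
```

Calling the reset function between tests gives each test a freshly built store, which is presumably why `reset_document_store()` is marked "for testing".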