feat: implement RAG Service MVP with PARSER + Memory integration

RAG Service Implementation:
- Create rag-service/ with full structure (config, document_store, embedding, pipelines)
- Document Store: PostgreSQL + pgvector via Haystack
- Embedding: BAAI/bge-m3 (multilingual, 1024-dimensional vectors)
- Ingest Pipeline: Convert ParsedDocument to Haystack Documents, embed, index
- Query Pipeline: Retrieve documents, generate answers via DAGI Router
- FastAPI endpoints: /ingest, /query, /health
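
As a rough illustration of the ingest step listed above, the sketch below flattens a parsed document into per-block chunks with citation metadata. The input shape (`pages`, `blocks`, `page_num`, `section`), the `Chunk` class, and the `parsed_to_chunks` helper are all assumptions made for this example; the actual ParsedDocument schema and the Haystack conversion in rag-service/ may differ.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    """Hypothetical stand-in for a Haystack Document: text plus metadata."""
    content: str
    meta: dict[str, Any] = field(default_factory=dict)


def parsed_to_chunks(parsed: dict[str, Any], dao_id: str) -> list[Chunk]:
    """Flatten an assumed ParsedDocument dict into one chunk per text block."""
    chunks: list[Chunk] = []
    for page in parsed.get("pages", []):
        for block in page.get("blocks", []):
            text = (block.get("text") or "").strip()
            if not text:
                continue  # skip empty blocks so they are never embedded
            meta = {
                "dao_id": dao_id,
                "doc_id": parsed.get("doc_id"),
                "page": page.get("page_num"),
            }
            # Only attach a section when the parser provided one.
            if block.get("section"):
                meta["section"] = block["section"]
            chunks.append(Chunk(content=text, meta=meta))
    return chunks
```

Carrying `doc_id`, `page`, and `section` in the metadata at ingest time is what would let the query side build citations without a second lookup.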

Tests:
- Unit tests for ingest and query pipelines
- E2E test with example parsed JSON
- Test fixtures with real PARSER output example

Router Integration:
- Add mode='rag_query' routing rule in router-config.yml
- Priority 7, uses local_qwen3_8b for RAG queries
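
To make the routing rule above concrete, here is a minimal sketch of priority-based rule selection. The rule layout (`when`, `use`), the `fallback` rule, the `default_provider` name, and the convention that a lower priority number wins are assumptions for illustration only; the real schema of router-config.yml is not shown in this commit.

```python
from typing import Any, Optional

# Hypothetical in-memory form of the routing rules; only the rag_query
# mode, priority 7, and local_qwen3_8b come from the commit message.
RULES: list[dict[str, Any]] = [
    {"id": "rag_query", "priority": 7, "when": {"mode": "rag_query"}, "use": "local_qwen3_8b"},
    {"id": "fallback", "priority": 100, "when": {}, "use": "default_provider"},
]


def select_provider(request: dict[str, Any], rules: list[dict[str, Any]] = RULES) -> Optional[str]:
    """Return the provider of the best matching rule, or None if nothing matches."""
    matching = [
        rule for rule in rules
        if all(request.get(key) == value for key, value in rule["when"].items())
    ]
    if not matching:
        return None
    # Assumption: a lower priority number means the rule is evaluated first.
    return min(matching, key=lambda rule: rule["priority"])["use"]
```

Under these assumptions, a request with `mode="rag_query"` resolves to `local_qwen3_8b`, and any other request falls through to the catch-all rule.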

Docker:
- Add rag-service to docker-compose.yml
- Configure dependencies (router, city-db)
- Add model cache volume

Documentation:
- Complete README with API examples
- Integration guides for PARSER and Router
Apple
2025-11-16 04:41:53 -08:00
parent d3c701f3ff
commit 9b86f9a694
19 changed files with 1275 additions and 97 deletions


@@ -0,0 +1,50 @@
"""
Tests for query pipeline
"""

import pytest
from unittest.mock import AsyncMock, patch, MagicMock

from app.query_pipeline import answer_query, _build_citations


class TestQueryPipeline:
    """Tests for RAG query pipeline"""

    @pytest.mark.asyncio
    async def test_answer_query_no_documents(self):
        """Test query when no documents found"""
        with patch("app.query_pipeline._retrieve_documents", return_value=[]):
            result = await answer_query(
                dao_id="test-dao",
                question="Test question"
            )

        assert "answer" in result
        # Ukrainian fallback message: "Unfortunately, I did not find ..."
        assert "На жаль, я не знайшов" in result["answer"]
        assert result["citations"] == []

    @pytest.mark.asyncio
    async def test_build_citations(self):
        """Test citation building"""
        from haystack.schema import Document

        documents = [
            Document(
                content="Test content 1",
                meta={"doc_id": "doc1", "page": 1, "section": "Section 1"}
            ),
            Document(
                content="Test content 2",
                meta={"doc_id": "doc2", "page": 2}
            )
        ]

        citations = _build_citations(documents)

        assert len(citations) == 2
        assert citations[0]["doc_id"] == "doc1"
        assert citations[0]["page"] == 1
        assert citations[0]["section"] == "Section 1"
        assert citations[1]["doc_id"] == "doc2"
        assert citations[1]["page"] == 2
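
For readers without the service source at hand, the following is one possible shape of a citation builder that would satisfy the assertions above. It is a sketch, not the code in `app.query_pipeline`: the `Doc` class stands in for Haystack's `Document`, and the `build_citations` name, the `excerpt` field, and the `excerpt_chars` parameter are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Doc:
    """Hypothetical stand-in for haystack.schema.Document."""
    content: str
    meta: dict[str, Any] = field(default_factory=dict)


def build_citations(documents: list[Doc], excerpt_chars: int = 200) -> list[dict[str, Any]]:
    """Turn retrieved documents into citation dicts keyed by doc_id and page."""
    citations: list[dict[str, Any]] = []
    for doc in documents:
        meta = doc.meta or {}
        citation: dict[str, Any] = {
            "doc_id": meta.get("doc_id"),
            "page": meta.get("page"),
            # Assumed extra field: a short excerpt of the retrieved text.
            "excerpt": doc.content[:excerpt_chars],
        }
        # Section is optional, so it is added only when present in the metadata.
        if "section" in meta:
            citation["section"] = meta["section"]
        citations.append(citation)
    return citations
```

Leaving `section` out when it is missing, rather than setting it to `None`, matches the second test document, which carries only `doc_id` and `page`.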