feat: add Ollama runtime support and RAG implementation plan

Ollama Runtime:
- Add ollama_client.py for Ollama API integration
- Support for dots-ocr model via Ollama
- Add OLLAMA_BASE_URL configuration
- Update inference.py to support Ollama runtime (RUNTIME_TYPE=ollama)
- Update endpoints to handle async Ollama calls
- Alternative to local transformers model

RAG Implementation Plan:
- Create TODO-RAG.md with detailed Haystack integration plan
- Document Store setup (pgvector)
- Embedding model selection
- Ingest pipeline (PARSER → RAG)
- Query pipeline (RAG → LLM)
- Integration with DAGI Router
- Bot commands (/upload_doc, /ask_doc)
- Testing strategy

Now supports three runtime modes:
1. Local transformers (RUNTIME_TYPE=local)
2. Ollama (RUNTIME_TYPE=ollama)
3. Dummy (USE_DUMMY_PARSER=true)
This commit is contained in:
Apple
2025-11-16 02:56:36 -08:00
parent d56ff3493d
commit 00f9102e50
6 changed files with 607 additions and 9 deletions

View File

@@ -90,12 +90,23 @@ async def parse_document_endpoint(
# Parse document from images
logger.info(f"Parsing document: {len(images)} page(s), mode: {output_mode}")
parsed_doc = parse_document_from_images(
images=images,
output_mode=output_mode,
doc_id=doc_id or str(uuid.uuid4()),
doc_type=doc_type
)
# Check if using Ollama (async) or local model (sync)
from app.core.config import settings
if settings.RUNTIME_TYPE == "ollama":
from app.runtime.inference import parse_document_with_ollama
parsed_doc = await parse_document_with_ollama(
images=images,
output_mode=output_mode,
doc_id=doc_id or str(uuid.uuid4()),
doc_type=doc_type
)
else:
parsed_doc = parse_document_from_images(
images=images,
output_mode=output_mode,
doc_id=doc_id or str(uuid.uuid4()),
doc_type=doc_type
)
# Build response based on output_mode
response_data = {"metadata": {