feat: add Ollama runtime support and RAG implementation plan
Ollama Runtime:
- Add ollama_client.py for Ollama API integration
- Support for dots-ocr model via Ollama
- Add OLLAMA_BASE_URL configuration
- Update inference.py to support Ollama runtime (RUNTIME_TYPE=ollama)
- Update endpoints to handle async Ollama calls
- Alternative to local transformers model

RAG Implementation Plan:
- Create TODO-RAG.md with detailed Haystack integration plan
- Document Store setup (pgvector)
- Embedding model selection
- Ingest pipeline (PARSER → RAG)
- Query pipeline (RAG → LLM)
- Integration with DAGI Router
- Bot commands (/upload_doc, /ask_doc)
- Testing strategy

Now supports three runtime modes:
1. Local transformers (RUNTIME_TYPE=local)
2. Ollama (RUNTIME_TYPE=ollama)
3. Dummy (USE_DUMMY_PARSER=true)
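The three runtime modes are chosen via environment configuration. A minimal sketch of that selection logic, using plain `os.getenv` (the real `app/core/config.py` is not shown here and may use a settings library; the defaults and the precedence of `USE_DUMMY_PARSER` over `RUNTIME_TYPE` are assumptions):

```python
import os

class Settings:
    """Hypothetical sketch of the runtime-mode settings named in the commit."""

    def __init__(self) -> None:
        # RUNTIME_TYPE, OLLAMA_BASE_URL, USE_DUMMY_PARSER come from the
        # commit message; the default values here are assumptions.
        self.RUNTIME_TYPE = os.getenv("RUNTIME_TYPE", "local")
        self.OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
        self.USE_DUMMY_PARSER = os.getenv("USE_DUMMY_PARSER", "false").lower() == "true"

    @property
    def runtime_mode(self) -> str:
        # Assumed precedence: the dummy parser, when enabled, overrides
        # the local/ollama choice.
        if self.USE_DUMMY_PARSER:
            return "dummy"
        return self.RUNTIME_TYPE

settings = Settings()
```

With this shape, endpoint code can branch on a single `settings.runtime_mode` value instead of checking two variables.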
@@ -90,12 +90,23 @@ async def parse_document_endpoint(
         # Parse document from images
         logger.info(f"Parsing document: {len(images)} page(s), mode: {output_mode}")

-        parsed_doc = parse_document_from_images(
-            images=images,
-            output_mode=output_mode,
-            doc_id=doc_id or str(uuid.uuid4()),
-            doc_type=doc_type
-        )
+        # Check if using Ollama (async) or local model (sync)
+        from app.core.config import settings
+        if settings.RUNTIME_TYPE == "ollama":
+            from app.runtime.inference import parse_document_with_ollama
+            parsed_doc = await parse_document_with_ollama(
+                images=images,
+                output_mode=output_mode,
+                doc_id=doc_id or str(uuid.uuid4()),
+                doc_type=doc_type
+            )
+        else:
+            parsed_doc = parse_document_from_images(
+                images=images,
+                output_mode=output_mode,
+                doc_id=doc_id or str(uuid.uuid4()),
+                doc_type=doc_type
+            )

         # Build response based on output_mode
         response_data = {"metadata": {
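The Ollama branch above awaits `parse_document_with_ollama`. A hedged sketch of what such a call could do against Ollama's HTTP API: the `/api/generate` endpoint, base64 `images` field, `stream` flag, and `response` field are Ollama's documented API, but the function names, prompt text, and `dots-ocr` payload shape here are illustrative, not the repo's actual `app/runtime/inference.py`:

```python
import base64
import json
import urllib.request

def build_generate_payload(model: str, prompt: str, image_bytes: list) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(b).decode("ascii") for b in image_bytes],
        # stream=False returns one JSON object instead of a line-delimited stream.
        "stream": False,
    }

def ollama_generate(base_url: str, payload: dict) -> str:
    """Blocking helper: POST the payload and return the model's text output."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

In an async endpoint, a blocking helper like this would typically be wrapped with `asyncio.to_thread`, or replaced with an async HTTP client, so the event loop is not blocked during inference.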