Ollama Runtime:
- Add ollama_client.py for Ollama API integration
- Support for dots-ocr model via Ollama
- Add OLLAMA_BASE_URL configuration
- Update inference.py to support Ollama runtime (RUNTIME_TYPE=ollama)
- Update endpoints to handle async Ollama calls
- Alternative to local transformers model
RAG Implementation Plan:
- Create TODO-RAG.md with detailed Haystack integration plan
- Document Store setup (pgvector)
- Embedding model selection
- Ingest pipeline (PARSER → RAG)
- Query pipeline (RAG → LLM)
- Integration with DAGI Router
- Bot commands (/upload_doc, /ask_doc)
- Testing strategy
Now supports three runtime modes:
1. Local transformers (RUNTIME_TYPE=local)
2. Ollama (RUNTIME_TYPE=ollama)
3. Dummy (USE_DUMMY_PARSER=true)
G.2.5 - Tests:
- Add pytest test suite with fixtures
- test_preprocessing.py - PDF/image loading, normalization, validation
- test_postprocessing.py - chunks, QA pairs, markdown generation
- test_inference.py - dummy parser and inference functions
- test_api.py - API endpoint tests
- Add pytest.ini configuration
G.1.3 - dots.ocr Integration:
- Update model_loader.py with real model loading code
- Support for AutoModelForVision2Seq and AutoProcessor
- Device handling (CUDA/CPU/MPS) with fallback
- Error handling with dummy fallback option
- Update inference.py with real model inference
- Process images through model
- Generate and decode outputs
- Parse model output to blocks
- Add model_output_parser.py
- Parse JSON or plain text model output
- Convert to structured blocks
- Layout detection support (placeholder)
Dependencies:
- Add pytest, pytest-asyncio, httpx for testing