microdao-daarion/services/parser-service/requirements.txt
Apple 2a353040f6 feat: add tests and integrate dots.ocr model
G.2.5 - Tests:
- Add pytest test suite with fixtures
- test_preprocessing.py - PDF/image loading, normalization, validation
- test_postprocessing.py - chunks, QA pairs, markdown generation
- test_inference.py - dummy parser and inference functions
- test_api.py - API endpoint tests
- Add pytest.ini configuration

G.1.3 - dots.ocr Integration:
- Update model_loader.py with real model loading code
  - Support for AutoModelForVision2Seq and AutoProcessor
  - Device handling (CUDA/CPU/MPS) with fallback
  - Error handling with dummy fallback option
- Update inference.py with real model inference
  - Process images through model
  - Generate and decode outputs
  - Parse model output to blocks
- Add model_output_parser.py
  - Parse JSON or plain text model output
  - Convert to structured blocks
  - Layout detection support (placeholder)
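
The device fallback and the "JSON or plain text" parsing described above could be sketched roughly as follows. This is a minimal illustration, not the code in model_loader.py or model_output_parser.py: the function names `pick_device` and `parse_model_output`, the `"blocks"` key, and the `{"type", "text"}` block shape are all assumptions made for the example.

```python
import json
from typing import Any, Dict, List


def pick_device(preferred: str = "auto") -> str:
    """Return "cuda", "mps", or "cpu", falling back to CPU when unsure.

    Hypothetical helper; the real loader may choose devices differently.
    """
    try:
        import torch  # imported lazily so the sketch also runs without torch
    except ImportError:
        return "cpu"

    mps_backend = getattr(torch.backends, "mps", None)  # absent in old builds
    available = {
        "cuda": torch.cuda.is_available(),
        "mps": bool(mps_backend and mps_backend.is_available()),
        "cpu": True,
    }
    if preferred != "auto":
        # Honour an explicit request only when that backend is usable.
        return preferred if available.get(preferred, False) else "cpu"
    for name in ("cuda", "mps", "cpu"):
        if available[name]:
            return name
    return "cpu"


def parse_model_output(raw: str) -> List[Dict[str, Any]]:
    """Convert raw model text into a list of block dicts (assumed schema)."""
    text = raw.strip()
    if not text:
        return []

    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = None  # not JSON, handled by the plain-text fallback below

    if isinstance(data, dict):
        # Accept either {"blocks": [...]} or a single block object.
        data = data.get("blocks", [data])
    if isinstance(data, list):
        blocks = []
        for item in data:
            if isinstance(item, dict):
                blocks.append({
                    "type": item.get("type", "paragraph"),
                    "text": str(item.get("text", "")),
                })
            else:
                blocks.append({"type": "paragraph", "text": str(item)})
        return blocks

    # Plain-text fallback: one paragraph block per blank-line-separated chunk.
    chunks = [chunk.strip() for chunk in text.split("\n\n")]
    return [{"type": "paragraph", "text": chunk} for chunk in chunks if chunk]
```

Returning "cpu" whenever the requested backend is unavailable keeps the service usable on machines without a GPU, and the plain-text branch means unexpected model output still yields blocks instead of an exception.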

Dependencies:
- Add pytest, pytest-asyncio, httpx for testing
2025-11-15 13:25:01 -08:00


# FastAPI and server
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6
pydantic==2.5.0
pydantic-settings==2.1.0
# Model and ML
torch>=2.0.0
transformers>=4.35.0
Pillow>=10.0.0
# PDF processing
pdf2image>=1.16.3 # Requires the poppler utilities installed on the system
PyMuPDF>=1.23.0 # Alternative PDF library (imported as fitz)
# Image processing
opencv-python>=4.8.0 # Optional, for advanced image processing
# Utilities
python-dotenv>=1.0.1
# Testing
pytest>=7.4.0
pytest-asyncio>=0.21.0
httpx>=0.25.0 # Required by FastAPI's TestClient