- Create parser-service/ with full structure
- Add FastAPI app with endpoints (/parse, /parse_qa, /parse_markdown, /parse_chunks)
- Add Pydantic schemas (ParsedDocument, ParsedBlock, ParsedChunk, etc.)
- Add runtime module with model_loader and inference (with dummy parser)
- Add configuration, Dockerfile, requirements.txt
- Update TODO-PARSER-RAG.md with completed tasks
- Ready for dots.ocr model integration
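The commit names the schemas and endpoints but not their fields. As a rough illustration of how one parsed document could feed both /parse_markdown and /parse_chunks, here is a stdlib-only sketch. It uses dataclasses as stand-ins for the Pydantic models. Every field name (kind, text, block_indices), the helpers dummy_parse, to_markdown and to_chunks, and the greedy character-budget chunking are hypothetical, not the service's actual code.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for the Pydantic schemas named in the commit;
# the real field names are not shown in the source.
@dataclass
class ParsedBlock:
    kind: str  # assumed values: "heading" or "text"
    text: str


@dataclass
class ParsedDocument:
    blocks: List[ParsedBlock] = field(default_factory=list)


@dataclass
class ParsedChunk:
    text: str
    block_indices: List[int]  # which blocks this chunk was built from


def dummy_parse(raw_text: str) -> ParsedDocument:
    """Placeholder for the OCR model: one block per blank-line-separated
    paragraph; paragraphs starting with '#' become headings."""
    blocks: List[ParsedBlock] = []
    for para in raw_text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if para.startswith("#"):
            blocks.append(ParsedBlock("heading", para.lstrip("#").strip()))
        else:
            blocks.append(ParsedBlock("text", para))
    return ParsedDocument(blocks)


def to_markdown(doc: ParsedDocument) -> str:
    """Render headings as '## ...' and join blocks with blank lines."""
    return "\n\n".join(
        f"## {b.text}" if b.kind == "heading" else b.text for b in doc.blocks
    )


def to_chunks(doc: ParsedDocument, max_chars: int = 500) -> List[ParsedChunk]:
    """Greedily pack whole consecutive blocks into chunks of at most
    max_chars characters; an oversized block becomes its own chunk."""
    chunks: List[ParsedChunk] = []
    texts: List[str] = []
    indices: List[int] = []
    size = 0  # current length of "\n\n".join(texts)
    for i, block in enumerate(doc.blocks):
        added = len(block.text) + (2 if texts else 0)  # 2 = "\n\n" separator
        if texts and size + added > max_chars:
            chunks.append(ParsedChunk("\n\n".join(texts), indices))
            texts, indices, size = [], [], 0
            added = len(block.text)
        texts.append(block.text)
        indices.append(i)
        size += added
    if texts:
        chunks.append(ParsedChunk("\n\n".join(texts), indices))
    return chunks


if __name__ == "__main__":
    doc = dummy_parse("# Intro\n\nFirst paragraph.\n\nSecond paragraph.")
    print(to_markdown(doc))
    for chunk in to_chunks(doc, max_chars=20):
        print(chunk.block_indices, repr(chunk.text))
```

Packing whole blocks means a chunk never splits a paragraph; the trade-off is that a single block longer than max_chars is emitted as its own oversized chunk.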
23 lines
392 B
Plaintext
# FastAPI and server
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6
pydantic==2.5.0
pydantic-settings==2.1.0

# Model and ML
torch>=2.0.0
transformers>=4.35.0
Pillow>=10.0.0

# PDF processing
pdf2image>=1.16.3  # Requires Poppler (pdftoppm) installed on the system
PyMuPDF>=1.23.0  # Alternative to pdf2image; no Poppler dependency

# Image processing
opencv-python>=4.8.0  # Optional, for advanced image processing

# Utilities
python-dotenv>=1.0.1
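The requirements list two PDF libraries (PyMuPDF as the alternative to pdf2image) and describe opencv-python as optional, yet pip installs every line. If an entry is later dropped from the image, the runtime could detect at startup which backends are actually importable, as in this stdlib-only sketch. The helper names and the preference order (pdf2image first, then PyMuPDF) are assumptions. The import names pdf2image, fitz and cv2 are the ones those packages install, and pdf2image shells out to Poppler's pdftoppm by default.

```python
import importlib.util
import shutil
from typing import Callable, Dict, Optional


def module_available(name: str) -> bool:
    """Return True if a top-level module is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


def binary_on_path(name: str) -> bool:
    """Return True if an executable with this name is on PATH."""
    return shutil.which(name) is not None


# Hypothetical helper: the preference order (pdf2image, then PyMuPDF) is an assumption.
def pick_pdf_backend(
    has_module: Callable[[str], bool] = module_available,
    has_binary: Callable[[str], bool] = binary_on_path,
) -> Optional[str]:
    # pdf2image wraps Poppler's command-line tools, so the Python package
    # alone is not enough; pdftoppm is its default converter.
    if has_module("pdf2image") and has_binary("pdftoppm"):
        return "pdf2image"
    # PyMuPDF is imported as "fitz" (newer releases also provide "pymupdf").
    if has_module("pymupdf") or has_module("fitz"):
        return "pymupdf"
    return None


# Hypothetical helper: reports optional extras such as OpenCV ("cv2").
def optional_features(
    has_module: Callable[[str], bool] = module_available,
) -> Dict[str, bool]:
    return {"opencv": has_module("cv2")}


if __name__ == "__main__":
    print("PDF backend:", pick_pdf_backend())
    print("Optional features:", optional_features())
```

Injecting has_module and has_binary keeps the check testable without installing or removing packages.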