- Vision Encoder Service (OpenCLIP ViT-L/14, GPU-accelerated)
- FastAPI app with text/image embedding endpoints (768-dim)
- Docker support with NVIDIA GPU runtime
- Port 8001, health checks, model info API
- Qdrant Vector Database integration
- Port 6333/6334 (HTTP/gRPC)
- Image embeddings storage (768-dim, Cosine distance)
- Auto collection creation
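Since the image collection uses Cosine distance, a quick sketch of the scoring metric Qdrant applies to the stored 768-dim embeddings (pure-Python illustration, not the service's code):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity, the metric behind Qdrant's Cosine distance.

    Identical embeddings score 1.0, orthogonal ones 0.0; Qdrant ranks
    nearest neighbours by this score.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# 768-dim embeddings as produced by the vision encoder
v = [0.1] * 768
assert abs(cosine_similarity(v, v) - 1.0) < 1e-9
```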
- Vision RAG implementation
- VisionEncoderClient (Python client for API)
- Image Search module (text-to-image, image-to-image)
- Vision RAG routing in DAGI Router (mode: image_search)
- VisionEncoderProvider integration
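A hypothetical sketch of how a text-to-image search request could be assembled from these pieces: embed the query via the Vision Encoder service (port 8001), then search the Qdrant image collection (port 6333). The `/embed/text` path and the collection name `images` are assumptions, not the service's confirmed API; the Qdrant search body follows Qdrant's standard REST schema.

```python
def build_embed_request(text: str, base_url: str = "http://localhost:8001"):
    """Build the (assumed) text-embedding request for the encoder service."""
    # NOTE: "/embed/text" is a placeholder path, not confirmed by the docs.
    return {"url": f"{base_url}/embed/text", "json": {"text": text}}

def build_qdrant_search(vector, collection: str = "images", limit: int = 5):
    """Build a Qdrant REST search body for the 768-dim image collection."""
    assert len(vector) == 768, "OpenCLIP ViT-L/14 embeddings are 768-dim"
    return {
        "url": f"http://localhost:6333/collections/{collection}/points/search",
        "json": {"vector": list(vector), "limit": limit, "with_payload": True},
    }
```

These builders would feed `requests.post(**build_embed_request(...))` and `requests.post(**build_qdrant_search(...))` in a running deployment.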
- Documentation (5000+ lines)
- SYSTEM-INVENTORY.md - Complete system inventory
- VISION-ENCODER-STATUS.md - Service status
- VISION-RAG-IMPLEMENTATION.md - Implementation details
- vision_encoder_deployment_task.md - Deployment checklist
- services/vision-encoder/README.md - Deployment guide
- Updated WARP.md, INFRASTRUCTURE.md, Jupyter Notebook
- Testing
- test-vision-encoder.sh - Smoke tests (6 tests)
- Unit tests for client, image search, routing
- Services: 17 total (added Vision Encoder + Qdrant)
- AI Models: 3 (qwen3:8b, OpenCLIP ViT-L/14, BAAI/bge-m3)
- GPU Services: 2 (Vision Encoder, Ollama)
- VRAM Usage: ~10 GB (concurrent)
Status: Production Ready ✅
PARSER Service
Document Ingestion & Structuring Agent using dots.ocr.
Description
PARSER Service is a FastAPI service that recognizes and structures documents (PDF, images) with the dots.ocr model.
Structure
```
parser-service/
├── app/
│   ├── main.py              # FastAPI application
│   ├── api/
│   │   └── endpoints.py     # API endpoints
│   ├── core/
│   │   └── config.py        # Configuration
│   ├── runtime/
│   │   ├── __init__.py
│   │   ├── model_loader.py  # Model loading
│   │   └── inference.py     # Inference functions
│   └── schemas.py           # Pydantic models
├── requirements.txt
├── Dockerfile
└── README.md
```
API Endpoints
POST /ocr/parse
Parse document (PDF or image).
Request:
- `file`: UploadFile (multipart/form-data)
- `doc_url`: Optional[str] (not yet implemented)
- `output_mode`: `raw_json` | `markdown` | `qa_pairs` | `chunks`
- `dao_id`: Optional[str]
- `doc_id`: Optional[str]
Response:
```
{
  "document": {...},   // for raw_json mode
  "markdown": "...",   // for markdown mode
  "qa_pairs": [...],   // for qa_pairs mode
  "chunks": [...],     // for chunks mode
  "metadata": {}
}
```
POST /ocr/parse_qa
Parse document and return Q&A pairs.
POST /ocr/parse_markdown
Parse document and return Markdown.
POST /ocr/parse_chunks
Parse document and return chunks for RAG.
GET /health
Health check endpoint.
Configuration
Environment variables:
- `PARSER_MODEL_NAME`: Model name (default: `rednote-hilab/dots.ocr`)
- `PARSER_DEVICE`: Device (`cuda`, `cpu`, `mps`)
- `PARSER_MAX_PAGES`: Max pages to process (default: 100)
- `PARSER_MAX_RESOLUTION`: Max resolution (default: `4096x4096`)
- `MAX_FILE_SIZE_MB`: Max file size in MB (default: 50)
- `TEMP_DIR`: Temporary directory (default: `/tmp/parser`)
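A minimal sketch of reading these variables with the documented defaults. The `ParserSettings` class is hypothetical (the real service uses `app/core/config.py`), and the `cuda` fallback for `PARSER_DEVICE` is an assumption since the docs list the options without naming a default:

```python
import os

class ParserSettings:
    """Hypothetical settings reader for the documented env variables."""

    def __init__(self, env=None):
        env = os.environ if env is None else env
        self.model_name = env.get("PARSER_MODEL_NAME", "rednote-hilab/dots.ocr")
        # Default device is an assumption; docs only list cuda/cpu/mps.
        self.device = env.get("PARSER_DEVICE", "cuda")
        self.max_pages = int(env.get("PARSER_MAX_PAGES", "100"))
        self.max_resolution = env.get("PARSER_MAX_RESOLUTION", "4096x4096")
        self.max_file_size_mb = int(env.get("MAX_FILE_SIZE_MB", "50"))
        self.temp_dir = env.get("TEMP_DIR", "/tmp/parser")
```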
Running
Development
```bash
cd services/parser-service
pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 9400
```
Docker
```bash
docker-compose up parser-service
```
Implementation status
- Basic service structure
- API endpoints (with mock data)
- Pydantic schemas
- Configuration
- dots.ocr model integration
- PDF processing
- Image processing
- Markdown conversion
- QA pairs extraction
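The image-processing step has to respect `PARSER_MAX_RESOLUTION`. A hypothetical helper (not in the service yet) that downscales page dimensions to fit the limit while preserving aspect ratio:

```python
def clamp_resolution(width: int, height: int,
                     max_resolution: str = "4096x4096"):
    """Return (width, height) scaled down to fit within max_resolution.

    Images already within the limit are returned unchanged; larger ones
    are shrunk uniformly so neither side exceeds its maximum.
    """
    max_w, max_h = (int(part) for part in max_resolution.split("x"))
    scale = min(max_w / width, max_h / height, 1.0)
    return int(width * scale), int(height * scale)
```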
Next steps
- Integrate the dots.ocr model in `app/runtime/inference.py`
- Add PDF → images conversion
- Implement real parsing instead of the dummy responses
- Add tests