- Vision Encoder Service (OpenCLIP ViT-L/14, GPU-accelerated)
- FastAPI app with text/image embedding endpoints (768-dim)
- Docker support with NVIDIA GPU runtime
- Port 8001, health checks, model info API
- Qdrant Vector Database integration
- Port 6333/6334 (HTTP/gRPC)
- Image embeddings storage (768-dim, Cosine distance)
- Auto collection creation
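Since the image collection uses Cosine distance, a quick sketch of the scoring metric Qdrant applies to the stored 768-dim embeddings (pure-Python illustration, not the service's code):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity, the metric behind Qdrant's Cosine distance.

    Identical embeddings score 1.0, orthogonal ones 0.0; Qdrant ranks
    nearest neighbours by this score.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# 768-dim embeddings as produced by the vision encoder
v = [0.1] * 768
assert abs(cosine_similarity(v, v) - 1.0) < 1e-9
```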
- Vision RAG implementation
- VisionEncoderClient (Python client for API)
- Image Search module (text-to-image, image-to-image)
- Vision RAG routing in DAGI Router (mode: image_search)
- VisionEncoderProvider integration
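A hypothetical sketch of how a text-to-image search request could be assembled from these pieces: embed the query via the Vision Encoder service (port 8001), then search the Qdrant image collection (port 6333). The `/embed/text` path and the collection name `images` are assumptions, not the service's confirmed API; the Qdrant search body follows Qdrant's standard REST schema.

```python
def build_embed_request(text: str, base_url: str = "http://localhost:8001"):
    """Build the (assumed) text-embedding request for the encoder service."""
    # NOTE: "/embed/text" is a placeholder path, not confirmed by the docs.
    return {"url": f"{base_url}/embed/text", "json": {"text": text}}

def build_qdrant_search(vector, collection: str = "images", limit: int = 5):
    """Build a Qdrant REST search body for the 768-dim image collection."""
    assert len(vector) == 768, "OpenCLIP ViT-L/14 embeddings are 768-dim"
    return {
        "url": f"http://localhost:6333/collections/{collection}/points/search",
        "json": {"vector": list(vector), "limit": limit, "with_payload": True},
    }
```

These builders would feed `requests.post(**build_embed_request(...))` and `requests.post(**build_qdrant_search(...))` in a running deployment.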
- Documentation (5000+ lines)
- SYSTEM-INVENTORY.md - Complete system inventory
- VISION-ENCODER-STATUS.md - Service status
- VISION-RAG-IMPLEMENTATION.md - Implementation details
- vision_encoder_deployment_task.md - Deployment checklist
- services/vision-encoder/README.md - Deployment guide
- Updated WARP.md, INFRASTRUCTURE.md, Jupyter Notebook
- Testing
- test-vision-encoder.sh - Smoke tests (6 tests)
- Unit tests for client, image search, routing
- Services: 17 total (added Vision Encoder + Qdrant)
- AI Models: 3 (qwen3:8b, OpenCLIP ViT-L/14, BAAI/bge-m3)
- GPU Services: 2 (Vision Encoder, Ollama)
- VRAM Usage: ~10 GB (concurrent)
Status: Production Ready ✅
PARSER Service
Document Ingestion & Structuring Agent using dots.ocr.
Description
PARSER Service is a FastAPI service that recognizes and structures documents (PDF, images) with the dots.ocr model.
Structure
```
parser-service/
├── app/
│   ├── main.py              # FastAPI application
│   ├── api/
│   │   └── endpoints.py     # API endpoints
│   ├── core/
│   │   └── config.py        # Configuration
│   ├── runtime/
│   │   ├── __init__.py
│   │   ├── model_loader.py  # Model loading
│   │   └── inference.py     # Inference functions
│   └── schemas.py           # Pydantic models
├── requirements.txt
├── Dockerfile
└── README.md
```
API Endpoints
POST /ocr/parse
Parse document (PDF or image).
Request:
- `file`: UploadFile (multipart/form-data)
- `doc_url`: Optional[str] (not yet implemented)
- `output_mode`: `raw_json` | `markdown` | `qa_pairs` | `chunks`
- `dao_id`: Optional[str]
- `doc_id`: Optional[str]
Response:
```
{
  "document": {...},   // for raw_json mode
  "markdown": "...",   // for markdown mode
  "qa_pairs": [...],   // for qa_pairs mode
  "chunks": [...],     // for chunks mode
  "metadata": {}
}
```
POST /ocr/parse_qa
Parse document and return Q&A pairs.
POST /ocr/parse_markdown
Parse document and return Markdown.
POST /ocr/parse_chunks
Parse document and return chunks for RAG.
GET /health
Health check endpoint.
Configuration
Environment variables:
- `PARSER_MODEL_NAME`: Model name (default: `rednote-hilab/dots.ocr`)
- `PARSER_DEVICE`: Device (`cuda`, `cpu`, `mps`)
- `PARSER_MAX_PAGES`: Max pages to process (default: 100)
- `PARSER_MAX_RESOLUTION`: Max resolution (default: `4096x4096`)
- `MAX_FILE_SIZE_MB`: Max file size in MB (default: 50)
- `TEMP_DIR`: Temporary directory (default: `/tmp/parser`)
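A minimal sketch of reading these variables with the documented defaults. The `ParserSettings` class is hypothetical (the real service uses `app/core/config.py`), and the `cuda` fallback for `PARSER_DEVICE` is an assumption since the docs list the options without naming a default:

```python
import os

class ParserSettings:
    """Hypothetical settings reader for the documented env variables."""

    def __init__(self, env=None):
        env = os.environ if env is None else env
        self.model_name = env.get("PARSER_MODEL_NAME", "rednote-hilab/dots.ocr")
        # Default device is an assumption; docs only list cuda/cpu/mps.
        self.device = env.get("PARSER_DEVICE", "cuda")
        self.max_pages = int(env.get("PARSER_MAX_PAGES", "100"))
        self.max_resolution = env.get("PARSER_MAX_RESOLUTION", "4096x4096")
        self.max_file_size_mb = int(env.get("MAX_FILE_SIZE_MB", "50"))
        self.temp_dir = env.get("TEMP_DIR", "/tmp/parser")
```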
Running
Development
```bash
cd services/parser-service
pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 9400
```
Docker
```bash
docker-compose up parser-service
```
Implementation status
- Basic service structure
- API endpoints (with mock data)
- Pydantic schemas
- Configuration
- dots.ocr model integration
- PDF processing
- Image processing
- Markdown conversion
- QA pairs extraction
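The image-processing step has to respect `PARSER_MAX_RESOLUTION`. A hypothetical helper (not in the service yet) that downscales page dimensions to fit the limit while preserving aspect ratio:

```python
def clamp_resolution(width: int, height: int,
                     max_resolution: str = "4096x4096"):
    """Return (width, height) scaled down to fit within max_resolution.

    Images already within the limit are returned unchanged; larger ones
    are shrunk uniformly so neither side exceeds its maximum.
    """
    max_w, max_h = (int(part) for part in max_resolution.split("x"))
    scale = min(max_w / width, max_h / height, 1.0)
    return int(width * scale), int(height * scale)
```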
Next steps
- Integrate the dots.ocr model in `app/runtime/inference.py`
- Add PDF → images conversion
- Implement real parsing instead of the dummy responses
- Add tests