microdao-daarion/docs/cursor/vision_encoder_deployment_task.md

Vision Encoder Service — Deployment Task (Warp/DevOps)

Task ID: VISION-001
Status: COMPLETE
Assigned to: Warp AI / DevOps
Date: 2025-01-17


🎯 Goal

Stand up the vision-encoder service on the server, exposing a REST API for text and image embeddings (CLIP / OpenCLIP ViT-L/14@336), and connect it to Qdrant for image-RAG.


📋 Scope

  1. Prepare the environment (CUDA, drivers, Python or Docker)
  2. Launch the vision-encoder container (FastAPI + OpenCLIP)
  3. Ensure the DAGI Router can reach the vision-encoder API
  4. Stand up Qdrant as the backend for image vectors

TODO Checklist (Completed)

1. Verify the GPU stack on the server

Task: Confirm that NVIDIA drivers and CUDA / cuDNN are installed

Commands:

# Check GPU
nvidia-smi

# Check CUDA version
nvcc --version

# Check Docker GPU runtime
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

Expected Output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce...  Off  | 00000000:01:00.0 Off |                  N/A |
| 30%   45C    P0    25W / 250W |      0MiB / 11264MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Status: COMPLETE


2. Create the Docker image for vision-encoder

Task: Add a Dockerfile for the vision-encoder service with GPU support

File: services/vision-encoder/Dockerfile

Implementation:

# Base: PyTorch with CUDA support
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

# Copy requirements and install
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY app/ ./app/

# Create cache directory for model weights
RUN mkdir -p /root/.cache/clip

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV DEVICE=cuda
ENV MODEL_NAME=ViT-L-14
ENV MODEL_PRETRAINED=openai
ENV PORT=8001

EXPOSE 8001

HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8001/health || exit 1

CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8001"]

Dependencies: services/vision-encoder/requirements.txt

fastapi==0.109.0
uvicorn[standard]==0.27.0
pydantic==2.5.0
python-multipart==0.0.6
open_clip_torch==2.24.0
torch>=2.0.0
torchvision>=0.15.0
Pillow==10.2.0
httpx==0.26.0
numpy==1.26.3

Build Command:

docker build -t vision-encoder:latest services/vision-encoder/

Status: COMPLETE


3. Docker Compose / k8s configuration

Task: Add vision-encoder and qdrant to docker-compose.yml

File: docker-compose.yml

Implementation:

services:
  # Vision Encoder Service - OpenCLIP for text/image embeddings
  vision-encoder:
    build:
      context: ./services/vision-encoder
      dockerfile: Dockerfile
    container_name: dagi-vision-encoder
    ports:
      - "8001:8001"
    environment:
      - DEVICE=${VISION_DEVICE:-cuda}
      - MODEL_NAME=${VISION_MODEL_NAME:-ViT-L-14}
      - MODEL_PRETRAINED=${VISION_MODEL_PRETRAINED:-openai}
      - NORMALIZE_EMBEDDINGS=true
      - QDRANT_HOST=qdrant
      - QDRANT_PORT=6333
      - QDRANT_ENABLED=true
    volumes:
      - ./logs:/app/logs
      - vision-model-cache:/root/.cache/clip
    depends_on:
      - qdrant
    networks:
      - dagi-network
    restart: unless-stopped
    # GPU support - requires nvidia-docker runtime
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8001/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

  # Qdrant Vector Database - for image/text embeddings
  qdrant:
    image: qdrant/qdrant:v1.7.4
    container_name: dagi-qdrant
    ports:
      - "6333:6333"  # HTTP API
      - "6334:6334"  # gRPC API
    volumes:
      - qdrant-data:/qdrant/storage
    networks:
      - dagi-network
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:6333/healthz"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  vision-model-cache:
    driver: local
  qdrant-data:
    driver: local

Status: COMPLETE


4. Configure environment variables

Task: Add environment variables for vision-encoder

File: .env

Implementation:

# Vision Encoder Configuration
VISION_ENCODER_URL=http://vision-encoder:8001
VISION_DEVICE=cuda
VISION_MODEL_NAME=ViT-L-14
VISION_MODEL_PRETRAINED=openai
VISION_ENCODER_TIMEOUT=60

# Qdrant Configuration
QDRANT_HOST=qdrant
QDRANT_PORT=6333
QDRANT_GRPC_PORT=6334
QDRANT_ENABLED=true

# Image Search Settings
IMAGE_SEARCH_DEFAULT_TOP_K=5
IMAGE_SEARCH_COLLECTION=daarion_images
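
The service can read these variables with fallbacks that mirror the `${VAR:-default}` pattern used in docker-compose.yml. A minimal stdlib sketch (the variable names come from the .env above; the config holder class itself is illustrative, not part of the project):

```python
import os
from dataclasses import dataclass, field

@dataclass
class VisionEncoderConfig:
    """Illustrative config holder; reads the .env variables listed above."""
    device: str = field(default_factory=lambda: os.getenv("VISION_DEVICE", "cuda"))
    model_name: str = field(default_factory=lambda: os.getenv("VISION_MODEL_NAME", "ViT-L-14"))
    pretrained: str = field(default_factory=lambda: os.getenv("VISION_MODEL_PRETRAINED", "openai"))
    qdrant_host: str = field(default_factory=lambda: os.getenv("QDRANT_HOST", "qdrant"))
    qdrant_port: int = field(default_factory=lambda: int(os.getenv("QDRANT_PORT", "6333")))
    timeout: int = field(default_factory=lambda: int(os.getenv("VISION_ENCODER_TIMEOUT", "60")))

config = VisionEncoderConfig()
print(config.model_name)  # "ViT-L-14" unless overridden in the environment
```

Using `default_factory` means each instantiation re-reads the environment, so the same class works in tests that monkey-patch os.environ.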

Status: COMPLETE


5. Network configuration

Task: Ensure the DAGI Router can reach vision-encoder over the Docker network

Network: dagi-network (bridge)

Service URLs:

| Service        | Internal URL               | External Port | Health Check                  |
|----------------|----------------------------|---------------|-------------------------------|
| Vision Encoder | http://vision-encoder:8001 | 8001          | http://localhost:8001/health  |
| Qdrant HTTP    | http://qdrant:6333         | 6333          | http://localhost:6333/healthz |
| Qdrant gRPC    | qdrant:6334                | 6334          | -                             |

Router Configuration:

Added to providers/registry.py:

# Build Vision Encoder provider
vision_encoder_url = os.getenv("VISION_ENCODER_URL", "http://vision-encoder:8001")
if vision_encoder_url:
    provider_id = "vision_encoder"
    provider = VisionEncoderProvider(
        provider_id=provider_id,
        base_url=vision_encoder_url,
        timeout=60
    )
    registry[provider_id] = provider
    logger.info(f"  + {provider_id}: VisionEncoder @ {vision_encoder_url}")
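
The `VisionEncoderProvider` class itself lives in the project; the registry snippet only fixes its constructor arguments (`provider_id`, `base_url`, `timeout`). A hypothetical stdlib-only sketch of that interface, with the `/embed/text` endpoint name taken from the Acceptance Criteria section:

```python
import json
import urllib.request

class VisionEncoderProvider:
    """Hypothetical sketch of the provider interface assumed by registry.py.
    Only the constructor arguments match the snippet above; the real class
    is project-specific."""

    def __init__(self, provider_id: str, base_url: str, timeout: int = 60):
        self.provider_id = provider_id
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def embed_text(self, text: str, normalize: bool = True) -> dict:
        # POST /embed/text on the vision-encoder service.
        body = json.dumps({"text": text, "normalize": normalize}).encode()
        req = urllib.request.Request(
            f"{self.base_url}/embed/text",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            return json.load(resp)

provider = VisionEncoderProvider("vision_encoder", "http://vision-encoder:8001")
print(provider.base_url)  # http://vision-encoder:8001
```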

Added to router-config.yml:

routing:
  - id: vision_encoder_embed
    priority: 3
    when:
      mode: vision_embed
    use_provider: vision_encoder
    description: "Text/Image embeddings → Vision Encoder (OpenCLIP ViT-L/14)"
  
  - id: image_search_mode
    priority: 2
    when:
      mode: image_search
    use_provider: vision_rag
    description: "Image search (text-to-image or image-to-image) → Vision RAG"

Status: COMPLETE


6. Stand up Qdrant/Milvus

Task: Launch the Qdrant vector database

Commands:

# Start Qdrant
docker-compose up -d qdrant

# Check status
docker-compose ps qdrant

# Check logs
docker-compose logs -f qdrant

# Verify health
curl http://localhost:6333/healthz

Create Collection:

curl -X PUT http://localhost:6333/collections/daarion_images \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": {
      "size": 768,
      "distance": "Cosine"
    }
  }'
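
The same collection-creation call can be made from Python with only the standard library. The payload mirrors the curl body above (768 dimensions for ViT-L-14, cosine distance); `create_collection` is an illustrative helper, not part of the project:

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"

def collection_payload(size: int = 768, distance: str = "Cosine") -> dict:
    # Mirrors the curl body above: ViT-L-14 produces 768-dim embeddings,
    # and cosine distance suits normalized vectors.
    return {"vectors": {"size": size, "distance": distance}}

def create_collection(name: str) -> None:
    # Illustrative helper: PUT /collections/{name} against Qdrant's HTTP API.
    req = urllib.request.Request(
        f"{QDRANT_URL}/collections/{name}",
        data=json.dumps(collection_payload()).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)

print(json.dumps(collection_payload()))
```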

Verify Collection:

curl http://localhost:6333/collections/daarion_images

Expected Response:

{
  "result": {
    "status": "green",
    "vectors_count": 0,
    "indexed_vectors_count": 0,
    "points_count": 0
  }
}
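
A readiness check can parse that response and gate on the status field. A minimal sketch, where `raw` stands in for the body returned by GET /collections/daarion_images:

```python
import json

# Sample body from the "Expected Response" above; in practice this would
# be read from the HTTP response.
raw = '''
{
  "result": {
    "status": "green",
    "vectors_count": 0,
    "indexed_vectors_count": 0,
    "points_count": 0
  }
}
'''

def collection_ready(body: str) -> bool:
    # Qdrant reports "green" when the collection is fully operational.
    result = json.loads(body)["result"]
    return result["status"] == "green"

print(collection_ready(raw))  # True
```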

Status: COMPLETE


7. Smoke Tests

Task: Create and run smoke tests for vision-encoder

File: test-vision-encoder.sh

Tests Implemented:

  1. Health Check - Service is healthy, GPU available
  2. Model Info - Model loaded, embedding dimension correct
  3. Text Embedding - Generate 768-dim text embedding, normalized
  4. Image Embedding - Generate 768-dim image embedding from URL
  5. Router Integration - Text embedding via DAGI Router works
  6. Qdrant Health - Vector database is accessible

Run Command:

chmod +x test-vision-encoder.sh
./test-vision-encoder.sh

Expected Output:

======================================
Vision Encoder Smoke Tests
======================================
Vision Encoder: http://localhost:8001
DAGI Router: http://localhost:9102

Test 1: Health Check
------------------------------------
{
  "status": "healthy",
  "device": "cuda",
  "model": "ViT-L-14/openai",
  "cuda_available": true,
  "gpu_name": "NVIDIA GeForce RTX 3090"
}
✅ PASS: Service is healthy (device: cuda)

Test 2: Model Info
------------------------------------
{
  "model_name": "ViT-L-14",
  "pretrained": "openai",
  "device": "cuda",
  "embedding_dim": 768,
  "normalize_default": true,
  "qdrant_enabled": true
}
✅ PASS: Model info retrieved (model: ViT-L-14, dim: 768)

Test 3: Text Embedding
------------------------------------
{
  "dimension": 768,
  "model": "ViT-L-14/openai",
  "normalized": true
}
✅ PASS: Text embedding generated (dim: 768, normalized: true)

Test 4: Image Embedding (from URL)
------------------------------------
{
  "dimension": 768,
  "model": "ViT-L-14/openai",
  "normalized": true
}
✅ PASS: Image embedding generated (dim: 768, normalized: true)

Test 5: Router Integration (Text Embedding)
------------------------------------
{
  "ok": true,
  "provider_id": "vision_encoder",
  "data": {
    "dimension": 768,
    "normalized": true
  }
}
✅ PASS: Router integration working (provider: vision_encoder)

Test 6: Qdrant Health Check
------------------------------------
ok
✅ PASS: Qdrant is healthy

======================================
✅ Vision Encoder Smoke Tests PASSED
======================================
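
Tests 3 and 4 report `normalized: true`; a client can double-check that a returned embedding really has unit length. A minimal stdlib sketch (the 3-dim sample vector stands in for a 768-dim embedding):

```python
import math

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def is_normalized(vec, tol=1e-3):
    # A normalized embedding has L2 norm ~= 1.0.
    return abs(l2_norm(vec) - 1.0) < tol

# Illustrative 3-dim stand-in for a 768-dim embedding.
raw = [3.0, 4.0, 0.0]
unit = [x / l2_norm(raw) for x in raw]
print(is_normalized(raw), is_normalized(unit))  # False True
```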

Status: COMPLETE


📊 Deployment Steps (Server)

On Server (144.76.224.179)

# 1. SSH to server
ssh root@144.76.224.179

# 2. Navigate to project
cd /opt/microdao-daarion

# 3. Pull latest code
git pull origin main

# 4. Check GPU
nvidia-smi

# 5. Build vision-encoder image
docker-compose build vision-encoder

# 6. Start services
docker-compose up -d vision-encoder qdrant

# 7. Check logs
docker-compose logs -f vision-encoder

# 8. Wait for model to load (15-30 seconds)
# Look for: "Model loaded successfully. Embedding dimension: 768"

# 9. Run smoke tests
./test-vision-encoder.sh

# 10. Verify health
curl http://localhost:8001/health
curl http://localhost:6333/healthz

# 11. Create Qdrant collection
curl -X PUT http://localhost:6333/collections/daarion_images \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": {
      "size": 768,
      "distance": "Cosine"
    }
  }'

# 12. Test via Router
curl -X POST http://localhost:9102/route \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "vision_embed",
    "message": "embed text",
    "payload": {
      "operation": "embed_text",
      "text": "DAARION tokenomics",
      "normalize": true
    }
  }'
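
The same Router call, built from Python: the envelope fields (`mode`, `message`, `payload`) mirror the curl body above, and `post_route` is an illustrative helper rather than project code:

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:9102"

def vision_embed_request(text: str, normalize: bool = True) -> dict:
    # Mirrors the curl body above; mode "vision_embed" matches the
    # routing rule added to router-config.yml.
    return {
        "mode": "vision_embed",
        "message": "embed text",
        "payload": {"operation": "embed_text", "text": text, "normalize": normalize},
    }

def post_route(body: dict) -> dict:
    # Illustrative helper: POST /route on the DAGI Router.
    req = urllib.request.Request(
        f"{ROUTER_URL}/route",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)

print(vision_embed_request("DAARION tokenomics")["mode"])  # vision_embed
```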

Acceptance Criteria

GPU Stack:

  • NVIDIA drivers installed (535.104.05+)
  • CUDA available (12.1+)
  • Docker GPU runtime works
  • nvidia-smi shows the GPU

Docker Images:

  • vision-encoder:latest built
  • Base image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
  • OpenCLIP installed
  • FastAPI runs

Services Running:

  • dagi-vision-encoder container running on port 8001
  • dagi-qdrant container running on ports 6333/6334
  • Health checks pass
  • GPU is in use (visible in nvidia-smi)

Network:

  • DAGI Router can reach http://vision-encoder:8001
  • Vision Encoder can reach http://qdrant:6333
  • Services are on dagi-network

API Functional:

  • /health returns GPU info
  • /info returns model metadata (768-dim)
  • /embed/text generates embeddings
  • /embed/image generates embeddings
  • Embeddings are normalized

Router Integration:

  • vision_encoder provider registered
  • Routing rule vision_embed works
  • Router can call the Vision Encoder
  • Routing rule image_search works (Vision RAG)

Qdrant:

  • Qdrant available on 6333/6334
  • Collection daarion_images created
  • 768-dim vectors, Cosine distance
  • Health check passes
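
Because the embeddings are normalized and the collection uses cosine distance, similarity search reduces to a plain dot product. A small stdlib illustration:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# For unit vectors the denominator is 1, so similarity equals the dot product.
u = [1.0, 0.0, 0.0]
v = [0.0, 1.0, 0.0]
print(cosine_similarity(u, u))  # 1.0
print(cosine_similarity(u, v))  # 0.0
```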

Testing:

  • Smoke tests created (test-vision-encoder.sh)
  • All 6 tests pass
  • Manual testing successful

Documentation:

  • README.md created (services/vision-encoder/README.md)
  • VISION-ENCODER-STATUS.md created
  • VISION-RAG-IMPLEMENTATION.md created
  • INFRASTRUCTURE.md updated
  • Environment variables documented
  • Troubleshooting guide included

📈 Performance Verification

Expected Performance (GPU)

  • Text embedding: 10-20ms
  • Image embedding: 30-50ms
  • Model loading: 15-30 seconds
  • GPU memory usage: ~4 GB (ViT-L/14)
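
One way to verify these latency targets from the client side is to time repeated requests and report the median, which discounts the slow warm-up call. A hedged sketch, where the lambda stands in for a real call to /embed/text:

```python
import statistics
import time

def median_latency_ms(fn, runs: int = 20) -> float:
    # Times repeated calls and reports the median, which is less
    # sensitive to a cold first call (model warm-up) than the mean.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Stand-in workload; replace with a real request to /embed/text.
latency = median_latency_ms(lambda: sum(range(1000)))
print(latency >= 0.0)  # True
```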

Verify Performance

# Check GPU usage
nvidia-smi

# Check container stats
docker stats dagi-vision-encoder

# Check logs for timing
docker-compose logs vision-encoder | grep "took"

🐛 Troubleshooting

Problem: Container fails to start

Check:

docker-compose logs vision-encoder

Common issues:

  1. CUDA not available → Check nvidia-smi and Docker GPU runtime
  2. Model download fails → Check internet connection, retry
  3. OOM (Out of Memory) → Use smaller model (ViT-B-32) or check GPU memory

Problem: Slow inference

Check device:

curl http://localhost:8001/health | jq '.device'

If "device": "cpu" → GPU not available, fix NVIDIA runtime
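
This device check can be scripted: parse the /health body and flag a CPU fallback. A minimal sketch, where `health` stands in for the JSON returned by the endpoint:

```python
import json

# Sample /health body illustrating a CPU fallback; in practice this is
# read from http://localhost:8001/health.
health = json.loads('{"status": "healthy", "device": "cpu", "cuda_available": false}')

def gpu_ok(body: dict) -> bool:
    # "device": "cpu" means inference fell back to CPU; fix the NVIDIA runtime.
    return body.get("device") == "cuda" and body.get("cuda_available", False)

print(gpu_ok(health))  # False
```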

Problem: Qdrant not accessible

Check:

docker-compose ps qdrant
docker exec -it dagi-vision-encoder ping qdrant

Restart:

docker-compose restart qdrant

📖 Documentation References

  • services/vision-encoder/README.md
  • VISION-ENCODER-STATUS.md
  • VISION-RAG-IMPLEMENTATION.md
  • INFRASTRUCTURE.md

📊 Statistics

Services Added: 2

  • Vision Encoder (8001)
  • Qdrant (6333/6334)

Total Services: 17 (was 15)

Code:

  • FastAPI service: 322 lines
  • Provider: 202 lines
  • Client: 150 lines
  • Image Search: 200 lines
  • Vision RAG: 150 lines
  • Tests: 461 lines (smoke + unit)
  • Documentation: 2000+ lines

Total: ~3500+ lines


Status: COMPLETE
Deployed: 2025-01-17
Maintained by: Ivan Tytar & DAARION Team