
🚀 Vision Encoder Deployment — Quick Guide

Server: 144.76.224.179 (Hetzner GEX44 #2844465)
Status: Code pushed to GitHub
Ready to deploy: YES


Quick Deploy (One Command)

SSH to server and run automated script:

ssh root@144.76.224.179 'cd /opt/microdao-daarion && git pull origin main && ./deploy-vision-encoder.sh'

That's it! The script will:

  • Pull latest code
  • Check GPU & Docker GPU runtime
  • Build Vision Encoder image
  • Start Vision Encoder + Qdrant
  • Run health checks
  • Run smoke tests
  • Show GPU status

📋 Manual Deploy (Step by Step)

If you prefer manual deployment:

1. SSH to Server

ssh root@144.76.224.179

2. Navigate to Project

cd /opt/microdao-daarion

3. Pull Latest Code

git pull origin main

4. Check GPU

nvidia-smi

Should show NVIDIA GPU with ~24 GB VRAM.

5. Build Vision Encoder

docker-compose build vision-encoder

This takes 5-10 minutes (downloads PyTorch + OpenCLIP).

6. Start Services

docker-compose up -d vision-encoder qdrant

7. Check Logs

docker-compose logs -f vision-encoder

Wait for: "Model loaded successfully. Embedding dimension: 768"

8. Verify Health

curl http://localhost:8001/health
curl http://localhost:6333/healthz

9. Create Qdrant Collection

curl -X PUT http://localhost:6333/collections/daarion_images \
  -H "Content-Type: application/json" \
  -d '{"vectors": {"size": 768, "distance": "Cosine"}}'
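
The same collection can also be created from Python with just the standard library — a minimal sketch mirroring the curl call above (the URL and collection name are the ones used throughout this guide):

```python
import json
from urllib import request

QDRANT_URL = "http://localhost:6333"   # Qdrant REST port from this guide
COLLECTION = "daarion_images"

def collection_config(size=768, distance="Cosine"):
    """Vector config matching the Vision Encoder's 768-dim embeddings."""
    return {"vectors": {"size": size, "distance": distance}}

def create_collection(name=COLLECTION, base_url=QDRANT_URL):
    """PUT /collections/{name} -- the same request the curl example sends."""
    req = request.Request(
        f"{base_url}/collections/{name}",
        data=json.dumps(collection_config()).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(create_collection())
```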

10. Run Smoke Tests

chmod +x ./test-vision-encoder.sh
./test-vision-encoder.sh

11. Monitor GPU

watch -n 1 nvidia-smi

Should show Vision Encoder using ~4 GB VRAM.


🔍 Verification

Check All Services

docker-compose ps

All 17 services should be "Up":

  • dagi-router (9102)
  • dagi-gateway (9300)
  • dagi-devtools (8008)
  • dagi-crewai (9010)
  • dagi-rbac (9200)
  • dagi-rag-service (9500)
  • dagi-memory-service (8000)
  • dagi-parser-service (9400)
  • dagi-vision-encoder (8001) ← NEW
  • dagi-postgres (5432)
  • redis (6379)
  • neo4j (7687/7474)
  • dagi-qdrant (6333/6334) ← NEW
  • grafana (3000)
  • prometheus (9090)
  • neo4j-exporter (9091)
  • ollama (11434)

Test Vision Encoder API

# Text embedding
curl -X POST http://localhost:8001/embed/text \
  -H "Content-Type: application/json" \
  -d '{"text": "DAARION tokenomics", "normalize": true}'

# Should return: {"embedding": [...], "dimension": 768, ...}
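
With `"normalize": true` the service returns unit-length vectors, so cosine similarity between two embeddings reduces to a plain dot product — which is why the Qdrant collection is configured with `"distance": "Cosine"`. A toy illustration:

```python
import math

def normalize(v):
    """Scale a vector to unit length, as the encoder does with normalize=true."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Cosine similarity; for unit vectors this is just the dot product."""
    return sum(x * y for x, y in zip(a, b))

a = normalize([1.0, 2.0, 2.0])
b = normalize([2.0, 1.0, 2.0])
print(round(cosine_similarity(a, b), 4))  # → 0.8889
```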

Test via Router

curl -X POST http://localhost:9102/route \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "vision_embed",
    "message": "embed text",
    "payload": {
      "operation": "embed_text",
      "text": "DAARION governance",
      "normalize": true
    }
  }'

📊 Expected Results

GPU Usage

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce...  Off  | 00000000:01:00.0 Off |                  N/A |
| 35%   52C    P2    85W / 350W |   4096MiB / 24576MiB |     15%      Default |
+-------------------------------+----------------------+----------------------+

VRAM Allocation:

  • Vision Encoder: ~4 GB (always loaded)
  • Ollama (qwen3:8b): ~6 GB (when active)
  • Available: ~14 GB

Service Logs

Vision Encoder startup logs:

{"timestamp": "2025-01-17 13:00:00", "level": "INFO", "message": "Starting vision-encoder service..."}
{"timestamp": "2025-01-17 13:00:01", "level": "INFO", "message": "Loading model ViT-L-14 with pretrained weights openai"}
{"timestamp": "2025-01-17 13:00:01", "level": "INFO", "message": "Device: cuda"}
{"timestamp": "2025-01-17 13:00:15", "level": "INFO", "message": "Model loaded successfully. Embedding dimension: 768"}
{"timestamp": "2025-01-17 13:00:15", "level": "INFO", "message": "GPU: NVIDIA GeForce RTX 3090, Memory: 24.00 GB"}
{"timestamp": "2025-01-17 13:00:15", "level": "INFO", "message": "Uvicorn running on http://0.0.0.0:8001"}
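
Since the logs are line-delimited JSON, readiness can be checked programmatically instead of watching `docker-compose logs` by hand. A small helper sketch (the message format is taken from the sample lines above):

```python
import json

READY_MARKER = "Model loaded successfully"

def is_ready(log_text):
    """Return the embedding dimension once the ready line appears, else None."""
    for line in log_text.splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (e.g. docker-compose prefixes)
        msg = entry.get("message", "")
        if READY_MARKER in msg:
            # the ready message ends with "... Embedding dimension: 768"
            return int(msg.rsplit(":", 1)[1])
    return None

sample = '{"timestamp": "2025-01-17 13:00:15", "level": "INFO", "message": "Model loaded successfully. Embedding dimension: 768"}'
print(is_ready(sample))  # → 768
```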

🐛 Troubleshooting

Problem: GPU not detected

Check:

nvidia-smi

Fix:

# Install NVIDIA drivers (if needed)
sudo apt install nvidia-driver-535
sudo reboot

# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Problem: Vision Encoder using CPU instead of GPU

Check device:

curl http://localhost:8001/health | jq '.device'

If it returns "cpu":

  1. Check GPU runtime: docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
  2. Restart Vision Encoder: docker-compose restart vision-encoder
  3. Check logs: docker-compose logs vision-encoder

Problem: Out of Memory

Check GPU memory:

nvidia-smi

Solutions:

  1. Use a smaller model: edit docker-compose.yml and set MODEL_NAME=ViT-B-32 (~2 GB instead of ~4 GB)
  2. Stop Ollama temporarily: docker stop ollama
  3. Restart services: docker-compose restart vision-encoder

Deployment Checklist

Before Deployment:

  • Code committed to Git
  • Code pushed to GitHub
  • Documentation updated
  • Tests created
  • Deploy script created

After Deployment:

  • Vision Encoder running (port 8001)
  • Qdrant running (port 6333)
  • Health checks passing
  • Smoke tests passing
  • GPU detected and used (~4 GB VRAM)
  • Qdrant collection created
  • Integration with Router working

🎯 Next Steps After Deployment

1. Index Existing Images

# Example: Index images from Parser Service output
python scripts/index_images.py --dao-id daarion --directory /data/images
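
The contents of `scripts/index_images.py` are not shown in this guide; a rough sketch of what such an indexer might do, assuming a hypothetical `/embed/image` endpoint that mirrors `/embed/text` and returns `{"embedding": [...]}` (the real endpoint and payload may differ):

```python
import json
from urllib import request

VISION_URL = "http://localhost:8001"   # Vision Encoder (from this guide)
QDRANT_URL = "http://localhost:6333"   # Qdrant (from this guide)

def embed_image(path):
    """POST raw image bytes to a hypothetical /embed/image endpoint."""
    with open(path, "rb") as f:
        data = f.read()
    req = request.Request(
        f"{VISION_URL}/embed/image",
        data=data,
        headers={"Content-Type": "application/octet-stream"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

def make_point(point_id, embedding, image_path, dao_id):
    """Build one Qdrant point, keeping the image path and DAO id as payload."""
    return {
        "id": point_id,
        "vector": embedding,
        "payload": {"path": str(image_path), "dao_id": dao_id},
    }

def upsert_points(points, collection="daarion_images"):
    """PUT the points into the collection created during deployment."""
    req = request.Request(
        f"{QDRANT_URL}/collections/{collection}/points",
        data=json.dumps({"points": points}).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```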
2. Test Image Search

# Text-to-image search
curl -X POST http://localhost:9102/route \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "image_search",
    "message": "find tokenomics diagrams",
    "dao_id": "daarion",
    "payload": {"top_k": 5}
  }'

3. Monitor Performance

# GPU usage
watch -n 1 nvidia-smi

# Service logs
docker-compose logs -f vision-encoder

# Request metrics
curl http://localhost:9090/metrics | grep vision_encoder

Status: Ready to Deploy
Last Updated: 2025-01-17
Maintained by: Ivan Tytar & DAARION Team