docs: add quick deployment guide for Vision Encoder

- One-command deploy via automated script - Manual step-by-step deployment - Verification checklist - Troubleshooting guide - Expected results and GPU usage - Next steps after deployment
2025-11-17 05:26:52 -08:00
parent 12ce1fa9b9
commit 6f28171842
1 changed files with 324 additions and 0 deletions
--- a/DEPLOY-VISION-ENCODER.md
+++ b/DEPLOY-VISION-ENCODER.md
@@ -0,0 +1,324 @@
+# 🚀 Vision Encoder Deployment — Quick Guide
+
+**Server:** 144.76.224.179 (Hetzner GEX44 #2844465)  
+**Status:** ✅ Code pushed to GitHub  
+**Ready to deploy:** YES
+
+---
+
+## ⚡ Quick Deploy (One Command)
+
+SSH to server and run automated script:
+
+```bash
+ssh root@144.76.224.179 'cd /opt/microdao-daarion && git pull origin main && ./deploy-vision-encoder.sh'
+```
+
+**That's it!** The script will:
+- ✅ Pull latest code
+- ✅ Check GPU & Docker GPU runtime
+- ✅ Build Vision Encoder image
+- ✅ Start Vision Encoder + Qdrant
+- ✅ Run health checks
+- ✅ Run smoke tests
+- ✅ Show GPU status
+
+---
+
+## 📋 Manual Deploy (Step by Step)
+
+If you prefer manual deployment:
+
+### 1. SSH to Server
+
+```bash
+ssh root@144.76.224.179
+```
+
+### 2. Navigate to Project
+
+```bash
+cd /opt/microdao-daarion
+```
+
+### 3. Pull Latest Code
+
+```bash
+git pull origin main
+```
+
+### 4. Check GPU
+
+```bash
+nvidia-smi
+```
+
+Should show NVIDIA GPU with ~24 GB VRAM.
+
+### 5. Build Vision Encoder
+
+```bash
+docker-compose build vision-encoder
+```
+
+This takes 5-10 minutes (downloads PyTorch + OpenCLIP).
+
+### 6. Start Services
+
+```bash
+docker-compose up -d vision-encoder qdrant
+```
+
+### 7. Check Logs
+
+```bash
+docker-compose logs -f vision-encoder
+```
+
+Wait for: `"Model loaded successfully. Embedding dimension: 768"`
+
+### 8. Verify Health
+
+```bash
+curl http://localhost:8001/health
+curl http://localhost:6333/healthz
+```
+
+### 9. Create Qdrant Collection
+
+```bash
+curl -X PUT http://localhost:6333/collections/daarion_images \
+  -H "Content-Type: application/json" \
+  -d '{"vectors": {"size": 768, "distance": "Cosine"}}'
+```
+
+### 10. Run Smoke Tests
+
+```bash
+chmod +x ./test-vision-encoder.sh
+./test-vision-encoder.sh
+```
+
+### 11. Monitor GPU
+
+```bash
+watch -n 1 nvidia-smi
+```
+
+Should show Vision Encoder using ~4 GB VRAM.
+
+---
+
+## 🔍 Verification
+
+### Check All Services
+
+```bash
+docker-compose ps
+```
+
+All 17 services should be "Up":
+- dagi-router (9102)
+- dagi-gateway (9300)
+- dagi-devtools (8008)
+- dagi-crewai (9010)
+- dagi-rbac (9200)
+- dagi-rag-service (9500)
+- dagi-memory-service (8000)
+- dagi-parser-service (9400)
+- **dagi-vision-encoder (8001)** ← NEW
+- dagi-postgres (5432)
+- redis (6379)
+- neo4j (7687/7474)
+- **dagi-qdrant (6333/6334)** ← NEW
+- grafana (3000)
+- prometheus (9090)
+- neo4j-exporter (9091)
+- ollama (11434)
+
+### Test Vision Encoder API
+
+```bash
+# Text embedding
+curl -X POST http://localhost:8001/embed/text \
+  -H "Content-Type: application/json" \
+  -d '{"text": "токеноміка DAARION", "normalize": true}'
+
+# Should return: {"embedding": [...], "dimension": 768, ...}
+```
+
+### Test via Router
+
+```bash
+curl -X POST http://localhost:9102/route \
+  -H "Content-Type: application/json" \
+  -d '{
+    "mode": "vision_embed",
+    "message": "embed text",
+    "payload": {
+      "operation": "embed_text",
+      "text": "DAARION governance",
+      "normalize": true
+    }
+  }'
+```
+
+---
+
+## 📊 Expected Results
+
+### GPU Usage
+
+```
+-----------------------------------------------------------------------------+
+| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2    |
+|-------------------------------+----------------------+----------------------+
+| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+|===============================+======================+======================|
+|   0  NVIDIA GeForce...  Off  | 00000000:01:00.0 Off |                  N/A |
+| 35%   52C    P2    85W / 350W |   4096MiB / 24576MiB |     15%      Default |
+-------------------------------+----------------------+----------------------+
+```
+
+**VRAM Allocation:**
+- Vision Encoder: ~4 GB (always loaded)
+- Ollama (qwen3:8b): ~6 GB (when active)
+- Available: ~14 GB
+
+### Service Logs
+
+Vision Encoder startup logs:
+```json
+{"timestamp": "2025-01-17 13:00:00", "level": "INFO", "message": "Starting vision-encoder service..."}
+{"timestamp": "2025-01-17 13:00:01", "level": "INFO", "message": "Loading model ViT-L-14 with pretrained weights openai"}
+{"timestamp": "2025-01-17 13:00:01", "level": "INFO", "message": "Device: cuda"}
+{"timestamp": "2025-01-17 13:00:15", "level": "INFO", "message": "Model loaded successfully. Embedding dimension: 768"}
+{"timestamp": "2025-01-17 13:00:15", "level": "INFO", "message": "GPU: NVIDIA GeForce RTX 3090, Memory: 24.00 GB"}
+{"timestamp": "2025-01-17 13:00:15", "level": "INFO", "message": "Uvicorn running on http://0.0.0.0:8001"}
+```
+
+---
+
+## 🐛 Troubleshooting
+
+### Problem: GPU not detected
+
+**Check:**
+```bash
+nvidia-smi
+```
+
+**Fix:**
+```bash
+# Install NVIDIA drivers (if needed)
+sudo apt install nvidia-driver-535
+sudo reboot
+
+# Install NVIDIA Container Toolkit
+distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
+curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
+curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
+  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+sudo apt-get update
+sudo apt-get install -y nvidia-container-toolkit
+sudo systemctl restart docker
+```
+
+### Problem: Vision Encoder using CPU instead of GPU
+
+**Check device:**
+```bash
+curl http://localhost:8001/health | jq '.device'
+```
+
+If returns `"cpu"`:
+1. Check GPU runtime: `docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi`
+2. Restart Vision Encoder: `docker-compose restart vision-encoder`
+3. Check logs: `docker-compose logs vision-encoder`
+
+### Problem: Out of Memory
+
+**Check GPU memory:**
+```bash
+nvidia-smi
+```
+
+**Solutions:**
+1. Use smaller model: Edit `docker-compose.yml` → `MODEL_NAME=ViT-B-32` (2 GB instead of 4 GB)
+2. Stop Ollama temporarily: `docker stop ollama`
+3. Restart services: `docker-compose restart vision-encoder`
+
+---
+
+## 📖 Documentation
+
+- **[SYSTEM-INVENTORY.md](./SYSTEM-INVENTORY.md)** — Complete system inventory (GPU, models, services)
+- **[VISION-ENCODER-STATUS.md](./VISION-ENCODER-STATUS.md)** — Vision Encoder service status
+- **[VISION-RAG-IMPLEMENTATION.md](./VISION-RAG-IMPLEMENTATION.md)** — Implementation details
+- **[services/vision-encoder/README.md](./services/vision-encoder/README.md)** — Full deployment guide
+- **[docs/cursor/vision_encoder_deployment_task.md](./docs/cursor/vision_encoder_deployment_task.md)** — Deployment checklist
+
+---
+
+## ✅ Deployment Checklist
+
+**Before Deployment:**
+- [x] Code committed to Git
+- [x] Code pushed to GitHub
+- [x] Documentation updated
+- [x] Tests created
+- [x] Deploy script created
+
+**After Deployment:**
+- [ ] Vision Encoder running (port 8001)
+- [ ] Qdrant running (port 6333)
+- [ ] Health checks passing
+- [ ] Smoke tests passing
+- [ ] GPU detected and used (~4 GB VRAM)
+- [ ] Qdrant collection created
+- [ ] Integration with Router working
+
+---
+
+## 🎯 Next Steps After Deployment
+
+### 1. Index Existing Images
+
+```bash
+# Example: Index images from Parser Service output
+python scripts/index_images.py --dao-id daarion --directory /data/images
+```
+
+### 2. Test Image Search
+
+```bash
+# Text-to-image search
+curl -X POST http://localhost:9102/route \
+  -H "Content-Type: application/json" \
+  -d '{
+    "mode": "image_search",
+    "message": "знайди діаграми токеноміки",
+    "dao_id": "daarion",
+    "payload": {"top_k": 5}
+  }'
+```
+
+### 3. Monitor Performance
+
+```bash
+# GPU usage
+watch -n 1 nvidia-smi
+
+# Service logs
+docker-compose logs -f vision-encoder
+
+# Request metrics
+curl http://localhost:9090/metrics | grep vision_encoder
+```
+
+---
+
+**Status:** ✅ Ready to Deploy  
+**Last Updated:** 2025-01-17  
+**Maintained by:** Ivan Tytar & DAARION Team