🖥️ System Inventory — DAARION & MicroDAO
Version: 1.0.0
Last Updated: 2025-01-17
Server: GEX44 #2844465 (Hetzner)
🖥️ Hardware Specifications
Production Server (144.76.224.179)
Provider: Hetzner Dedicated Server GEX44
Server ID: #2844465
GPU Configuration
GPU Model: NVIDIA RTX 4000 SFF Ada Generation
VRAM: 20 GB GDDR6
Architecture: Ada Lovelace
CUDA Version: 12.2
Driver Version: 535.274.02
Current VRAM Usage:
- Ollama (qwen3:8b): ~5.6 GB
- Vision Encoder (ViT-L/14): ~1.9 GB
- Total: ~7.5 GB / 20 GB (37.5% usage)
CPU & RAM (Typical GEX44)
- CPU: AMD Ryzen 9 5950X (16 cores, 32 threads) or similar
- RAM: 128 GB DDR4
- Storage: 2x NVMe SSD (RAID configuration)
🤖 Installed AI Models
1. LLM Models (Language Models)
Ollama (Local)
Service: Ollama
Port: 11434
Status: ✅ Active
Installed Models:
| Model | Size | Parameters | Context | VRAM Usage | Purpose |
|---|---|---|---|---|---|
| qwen3:8b | ~4.7 GB | 8B | 32K | ~6 GB | Primary LLM for Router, fast inference |
API:
# List models
curl http://localhost:11434/api/tags
# Generate
curl http://localhost:11434/api/generate -d '{
"model": "qwen3:8b",
"prompt": "Hello"
}'
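The same endpoint can be driven from Python. A minimal sketch of the request/response handling (the payload shape and the streaming NDJSON format follow Ollama's generate API; with `"stream": false` a single JSON object is returned instead):

```python
import json

def build_generate_payload(prompt, model="qwen3:8b", stream=False):
    """JSON body for Ollama's /api/generate endpoint.

    With "stream": false Ollama returns one JSON object whose
    `response` field holds the full completion.
    """
    return json.dumps({"model": model, "prompt": prompt, "stream": stream})

def join_stream(ndjson_lines):
    """Reassemble a streaming reply ("stream": true): one JSON object
    per line, each carrying a partial `response` fragment."""
    return "".join(
        json.loads(line).get("response", "")
        for line in ndjson_lines
        if line.strip()
    )
```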
Configuration:
- Base URL: http://172.17.0.1:11434 (from Docker containers)
- Used by: DAGI Router, DevTools, CrewAI, Gateway
2. Vision Models (Multimodal)
OpenCLIP (Vision Encoder Service)
Service: vision-encoder
Port: 8001
Status: ✅ Active (GPU-accelerated)
Model Details:
| Model | Architecture | Parameters | Embedding Dim | VRAM Usage | Purpose |
|---|---|---|---|---|---|
| ViT-L/14 | Vision Transformer Large | ~428M | 768 | ~4 GB | Text/Image embeddings for RAG |
| OpenAI CLIP | CLIP (Contrastive Language-Image Pre-training) | - | 768 | - | Pretrained weights |
Capabilities:
- ✅ Text → 768-dim embedding (0.1-0.5s on GPU, ~10-15s on CPU)
- ✅ Image → 768-dim embedding (0.3-1s on GPU, ~15-20s on CPU)
- ✅ Text-to-image search (via Qdrant)
- ✅ Image-to-image similarity search (via Qdrant)
- ✅ GPU acceleration: ~20-30x speedup vs CPU
- ⏳ Zero-shot image classification (planned)
- ⏳ CLIP score calculation (planned)
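Because the service emits normalized embeddings (see `Normalize: true` below), both search modes reduce to cosine similarity between vectors. A self-contained sketch of the ranking step that Qdrant performs server-side:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (any dimension)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_images(query_vec, image_vecs):
    """Rank image embeddings by similarity to a query embedding,
    best match first. In production Qdrant's cosine index does this."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(image_vecs)]
    return sorted(scored, reverse=True)
```

For text-to-image search the query vector comes from `/embed/text`; for image-to-image it comes from `/embed/image`.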
API Endpoints:
# Text embedding
POST http://localhost:8001/embed/text
# Image embedding (URL)
POST http://localhost:8001/embed/image
# Image embedding (file upload)
POST http://localhost:8001/embed/image/upload
# Health check
GET http://localhost:8001/health
# Model info
GET http://localhost:8001/info
Configuration:
- Model: ViT-L-14
- Pretrained: openai
- Device: cuda (GPU)
- Normalize: true
- Integration: DAGI Router (modes: vision_embed, image_search)
3. Embedding Models (Text)
BAAI/bge-m3 (RAG Service)
Service: rag-service
Port: 9500
Status: ✅ Active
Model Details:
| Model | Type | Embedding Dim | Context Length | Device | Purpose |
|---|---|---|---|---|---|
| BAAI/bge-m3 | Dense Retrieval | 1024 | 8192 | CPU/GPU | Text embeddings for RAG |
Capabilities:
- ✅ Document embedding for retrieval
- ✅ Query embedding
- ✅ Multi-lingual support
- ✅ Long context (8192 tokens)
Storage:
- Vector database: PostgreSQL with pgvector extension
- Indexed documents: Chat messages, tasks, meetings, governance docs
Configuration:
- Model: BAAI/bge-m3
- Device: cpu (can use GPU if available)
- HuggingFace cache: /root/.cache/huggingface
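pgvector's cosine-distance operator `<=>` returns `1 - cosine similarity`, so retrieval is an `ORDER BY embedding <=> $query LIMIT k` query. A small in-memory sketch of the same semantics (document shape is hypothetical):

```python
import math

def cosine_distance(a, b):
    """pgvector's `<=>` operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def top_k(query_vec, docs, k=3):
    """Return the k nearest documents, equivalent to
    `ORDER BY embedding <=> $1 LIMIT k` in SQL."""
    return sorted(docs, key=lambda d: cosine_distance(query_vec, d["embedding"]))[:k]
```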
4. Audio Models
Status: ❌ Not installed yet
Planned:
- Whisper (speech-to-text)
- TTS models (text-to-speech)
- Audio classification
🗄️ Vector Databases
1. Qdrant (Image Embeddings)
Service: qdrant
Port: 6333 (HTTP), 6334 (gRPC)
Status: ✅ Active
Collections:
| Collection | Vectors | Dimension | Distance | Purpose |
|---|---|---|---|---|
| daarion_images | Variable | 768 | Cosine | Image search (text→image, image→image) |
Storage: Docker volume qdrant-data
API:
# Health check
curl http://localhost:6333/healthz
# List collections
curl http://localhost:6333/collections
# Collection info
curl http://localhost:6333/collections/daarion_images
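The request bodies for creating and searching this collection can be built as plain JSON; a sketch matching the 768-dim cosine schema above (endpoint paths follow Qdrant's REST API: `PUT /collections/daarion_images` and `POST /collections/daarion_images/points/search`):

```python
import json

def create_collection_body(dim=768, distance="Cosine"):
    """Body for PUT /collections/daarion_images — schema matching
    the 768-dim cosine collection in the table above."""
    return json.dumps({"vectors": {"size": dim, "distance": distance}})

def search_body(vector, limit=5):
    """Body for POST /collections/daarion_images/points/search.
    `vector` must match the collection's dimension (768 here)."""
    return json.dumps({"vector": vector, "limit": limit, "with_payload": True})
```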
2. PostgreSQL + pgvector (Text Embeddings)
Service: dagi-postgres
Port: 5432
Status: ✅ Active
Databases:
| Database | Extension | Purpose |
|---|---|---|
| daarion_memory | - | Agent memory, context |
| daarion_city | pgvector | RAG document storage (1024-dim) |
Storage: Docker volume postgres-data
3. Neo4j (Graph Memory)
Service: neo4j
Port: 7687 (Bolt), 7474 (HTTP)
Status: ✅ Active (optional)
Purpose:
- Knowledge graph for entities
- Agent relationships
- DAO structure mapping
Storage: Docker volume (if configured)
🛠️ AI Services
1. DAGI Router (9102)
Purpose: Main routing engine for AI requests
LLM Integration:
- Ollama (qwen3:8b)
- DeepSeek (optional, API key required)
- OpenAI (optional, API key required)
Providers:
- LLM Provider (Ollama, DeepSeek, OpenAI)
- Vision Encoder Provider (OpenCLIP)
- DevTools Provider
- CrewAI Provider
- Vision RAG Provider (image search)
2. RAG Service (9500)
Purpose: Document retrieval and Q&A
Models:
- Embeddings: BAAI/bge-m3 (1024-dim)
- LLM: via DAGI Router (qwen3:8b)
Capabilities:
- Document ingestion (chat, tasks, meetings, governance, RWA, oracle)
- Vector search (pgvector)
- Q&A generation
- Context ranking
3. Vision Encoder (8001)
Purpose: Text/Image embeddings for multimodal RAG
Models:
- OpenCLIP ViT-L/14 (768-dim)
Capabilities:
- Text embeddings
- Image embeddings
- Image search (text-to-image, image-to-image)
4. Parser Service (9400)
Purpose: Document parsing and processing
Capabilities:
- PDF parsing
- Image extraction
- OCR (via Crawl4AI)
- Q&A generation
Integration:
- Crawl4AI for web content
- Vision Encoder for image analysis (planned)
5. Memory Service (8000)
Purpose: Agent memory and context management
Storage:
- PostgreSQL (daarion_memory)
- Redis (short-term cache, optional)
- Neo4j (graph memory, optional)
6. CrewAI Orchestrator (9010)
Purpose: Multi-agent workflow execution
LLM: via DAGI Router (qwen3:8b)
Workflows:
- microDAO onboarding
- Code review
- Proposal review
- Task decomposition
7. DevTools Backend (8008)
Purpose: Development tool execution
Tools:
- File operations (read/write)
- Test execution
- Notebook execution
- Git operations (planned)
8. Bot Gateway (9300)
Purpose: Telegram/Discord bot integration
Bots:
- DAARWIZZ (Telegram)
- Helion (Telegram, Energy Union)
9. RBAC Service (9200)
Purpose: Role-based access control
Storage: SQLite (rbac.db)
📊 GPU Memory Allocation (Estimated)
Total VRAM: 20 GB
| Service | Model | VRAM Usage | Status |
|---|---|---|---|
| Vision Encoder | OpenCLIP ViT-L/14 | ~4 GB | Always loaded |
| Ollama | qwen3:8b | ~6 GB | Loaded on demand |
| Available | - | ~10 GB | For other models |
Note:
- Ollama and Vision Encoder can run simultaneously (~10 GB total)
- Remaining ~10 GB available for additional models (audio, larger LLMs, etc.)
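Before pulling an additional model, the budget from the table above can be sanity-checked. A sketch using the estimated footprints (the 1 GB headroom for activations and fragmentation is an assumption, not a measured value):

```python
TOTAL_VRAM_GB = 20  # RTX 4000 SFF Ada

# Estimated footprints from the allocation table above.
RESIDENT_GB = {
    "vision-encoder (ViT-L/14)": 4,
    "ollama (qwen3:8b)": 6,
}

def fits_in_vram(new_model_gb, headroom_gb=1):
    """True if a new model fits alongside the resident ones,
    keeping `headroom_gb` free for activations/fragmentation."""
    used = sum(RESIDENT_GB.values())
    return used + new_model_gb + headroom_gb <= TOTAL_VRAM_GB
```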
🔄 Model Loading Strategy
Vision Encoder (Always-On)
- Preloaded: Yes (on service startup)
- Reason: Fast inference for image search
- Unload: Never (unless service restart)
Ollama qwen3:8b (On-Demand)
- Preloaded: No
- Load Time: 2-3 seconds (first request)
- Keep Alive: 5 minutes (default)
- Unload: After idle timeout
Future Models (Planned)
- Whisper: Load on-demand for audio transcription
- TTS: Load on-demand for speech synthesis
- Larger LLMs: Load on-demand (if VRAM available)
📈 Performance Benchmarks
LLM Inference (qwen3:8b)
- Tokens/sec: ~50-80 tokens/sec (GPU)
- Latency: 100-200ms (first token)
- Context: 32K tokens
- Batch size: 1 (default)
Vision Inference (ViT-L/14)
- Text embedding: 10-20ms (GPU)
- Image embedding: 30-50ms (GPU)
- Throughput: 50-100 images/sec (batch)
RAG Search (BAAI/bge-m3)
- Query embedding: 50-100ms (CPU)
- Vector search: 5-10ms (pgvector)
- Total latency: 60-120ms
🔧 Model Management
Ollama Models
List installed models:
curl http://localhost:11434/api/tags
Pull new model:
ollama pull llama2:7b
ollama pull mistral:7b
Remove model:
ollama rm qwen3:8b
Check model info:
ollama show qwen3:8b
Vision Encoder Models
Change model (in docker-compose.yml):
environment:
- MODEL_NAME=ViT-B-32 # Smaller, faster
- MODEL_PRETRAINED=openai
Available models:
- ViT-B-32 (512-dim, 2 GB VRAM)
- ViT-L-14 (768-dim, 4 GB VRAM) ← Current
- ViT-L-14@336 (768-dim, 6 GB VRAM, higher resolution)
- ViT-H-14 (1024-dim, 8 GB VRAM, highest quality)
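Note that the model choice fixes the embedding dimension, which must match the Qdrant collection schema (768-dim today). A sketch encoding the figures above (dict keys are illustrative labels, not guaranteed open_clip identifiers):

```python
# Figures from the list above.
OPENCLIP_MODELS = {
    "ViT-B-32": {"dim": 512, "vram_gb": 2},
    "ViT-L-14": {"dim": 768, "vram_gb": 4},       # current
    "ViT-L-14@336": {"dim": 768, "vram_gb": 6},
    "ViT-H-14": {"dim": 1024, "vram_gb": 8},
}

def compatible_dim(model, collection_dim=768):
    """Whether a model's embeddings fit the existing collection schema.

    Caveat: even with a matching dimension, embeddings from different
    models live in different spaces, so any model swap still requires
    re-embedding all stored images."""
    return OPENCLIP_MODELS[model]["dim"] == collection_dim
```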
📋 Complete Service List (17 Services)
| # | Service | Port | GPU | Models/Tools | Status |
|---|---|---|---|---|---|
| 1 | DAGI Router | 9102 | ❌ | Routing engine | ✅ |
| 2 | Bot Gateway | 9300 | ❌ | Telegram bots | ✅ |
| 3 | DevTools | 8008 | ❌ | File ops, tests | ✅ |
| 4 | CrewAI | 9010 | ❌ | Multi-agent | ✅ |
| 5 | RBAC | 9200 | ❌ | Access control | ✅ |
| 6 | RAG Service | 9500 | ❌ | BAAI/bge-m3 | ✅ |
| 7 | Memory Service | 8000 | ❌ | Context mgmt | ✅ |
| 8 | Parser Service | 9400 | ❌ | PDF, OCR | ✅ |
| 9 | Vision Encoder | 8001 | ✅ | OpenCLIP ViT-L/14 | ✅ |
| 10 | PostgreSQL | 5432 | ❌ | pgvector | ✅ |
| 11 | Redis | 6379 | ❌ | Cache | ✅ |
| 12 | Neo4j | 7687 | ❌ | Graph DB | ✅ |
| 13 | Qdrant | 6333 | ❌ | Vector DB | ✅ |
| 14 | Grafana | 3000 | ❌ | Dashboards | ✅ |
| 15 | Prometheus | 9090 | ❌ | Metrics | ✅ |
| 16 | Neo4j Exporter | 9091 | ❌ | Metrics | ✅ |
| 17 | Ollama | 11434 | ✅ | qwen3:8b | ✅ |
GPU Services: 2 (Vision Encoder, Ollama)
Total VRAM Usage: ~10 GB (concurrent)
🚀 Deployment Checklist
Pre-Deployment (Local)
- Code reviewed and tested
- Documentation updated (WARP.md, INFRASTRUCTURE.md)
- Jupyter Notebook updated
- All tests passing
- Git committed and pushed
Deployment (Server)
# 1. SSH to server
ssh root@144.76.224.179
# 2. Pull latest code
cd /opt/microdao-daarion
git pull origin main
# 3. Check GPU
nvidia-smi
# 4. Build new services
docker-compose build vision-encoder
# 5. Start all services
docker-compose up -d
# 6. Verify health
docker-compose ps
curl http://localhost:8001/health # Vision Encoder
curl http://localhost:6333/healthz # Qdrant
curl http://localhost:9102/health # Router
# 7. Run smoke tests
./smoke.sh
./test-vision-encoder.sh
# 8. Check logs
docker-compose logs -f vision-encoder
docker-compose logs -f router
# 9. Monitor GPU
watch -n 1 nvidia-smi
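Step 6 of the checklist can also be scripted. A minimal sketch (endpoint map taken from the checklist above; `probe` is injectable so the logic can be tested without live services):

```python
from urllib.request import urlopen

HEALTH_ENDPOINTS = {
    "vision-encoder": "http://localhost:8001/health",
    "qdrant": "http://localhost:6333/healthz",
    "router": "http://localhost:9102/health",
}

def http_probe(url, timeout=5):
    """Return True if the endpoint answers HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def check_all(endpoints=HEALTH_ENDPOINTS, probe=http_probe):
    """Map each service name to a boolean health status."""
    return {name: probe(url) for name, url in endpoints.items()}
```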
📖 Documentation Index
- WARP.md — Developer guide (quick start for Warp AI)
- INFRASTRUCTURE.md — Server, services, deployment
- VISION-ENCODER-STATUS.md — Vision Encoder status
- VISION-RAG-IMPLEMENTATION.md — Vision RAG complete implementation
- docs/cursor/vision_encoder_deployment_task.md — Deployment task
- docs/infrastructure_quick_ref.ipynb — Jupyter quick reference
🎯 Next Steps
Phase 1: Audio Integration
- Install Whisper (speech-to-text)
- Install TTS model (text-to-speech)
- Integrate with Telegram voice messages
- Audio RAG (transcription + search)
Phase 2: Larger LLMs
- Install Mistral 7B (better reasoning)
- Install Llama 2 70B (if enough VRAM via quantization)
- Multi-model routing (task-specific models)
Phase 3: Advanced Vision
- Image captioning (BLIP-2)
- Zero-shot classification
- Video understanding (frame extraction + CLIP)
Phase 4: Optimization
- Model quantization (reduce VRAM)
- Batch inference (increase throughput)
- Model caching (Redis)
- GPU sharing (multiple models concurrently)
Last Updated: 2025-01-17
Maintained by: Ivan Tytar & DAARION Team
Status: ✅ Production Ready (17 services, 3 AI models)