# 🖥️ System Inventory - DAARION & MicroDAO

**Version:** 1.0.0  
**Last Updated:** 2025-01-17  
**Server:** GEX44 #2844465 (Hetzner)

---

## 🖥️ Hardware Specifications

### Production Server (144.76.224.179)

**Provider:** Hetzner Dedicated Server GEX44  
**Server ID:** #2844465

#### GPU Configuration

**GPU Model:** NVIDIA RTX 4000 SFF Ada Generation  
**VRAM:** 20 GB GDDR6  
**Architecture:** Ada Lovelace  
**CUDA Version:** 12.2  
**Driver Version:** 535.274.02

**Current VRAM Usage:**
- Ollama (qwen3:8b): ~5.6 GB
- Vision Encoder (ViT-L/14): ~1.9 GB
- **Total:** ~7.5 GB / 20 GB (37.5% usage)

#### CPU & RAM (Typical GEX44)

- **CPU:** AMD Ryzen 9 5950X (16 cores, 32 threads) or similar
- **RAM:** 128 GB DDR4
- **Storage:** 2x NVMe SSD (RAID configuration)

---

## 🤖 Installed AI Models

### 1. LLM Models (Language Models)

#### Ollama (Local)

**Service:** Ollama  
**Port:** 11434  
**Status:** ✅ Active

**Installed Models:**

| Model | Size | Parameters | Context | VRAM Usage | Purpose |
|-------|------|-----------|---------|------------|---------|
| **qwen3:8b** | ~4.7 GB | 8B | 32K | ~6 GB | Primary LLM for Router, fast inference |

**API:**
```bash
# List models
curl http://localhost:11434/api/tags

# Generate
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:8b",
  "prompt": "Hello"
}'
```

**Configuration:**
- Base URL: `http://172.17.0.1:11434` (from Docker containers)
- Used by: DAGI Router, DevTools, CrewAI, Gateway

---

### 2. Vision Models (Multimodal)

#### OpenCLIP (Vision Encoder Service)

**Service:** vision-encoder  
**Port:** 8001  
**Status:** ✅ Active (GPU-accelerated)

**Model Details:**

| Model | Architecture | Parameters | Embedding Dim | VRAM Usage | Purpose |
|-------|-------------|-----------|---------------|------------|---------|
| **ViT-L/14** | Vision Transformer Large | ~428M | 768 | ~4 GB | Text/Image embeddings for RAG |
| **OpenAI CLIP** | CLIP (Contrastive Language-Image Pre-training) | - | 768 | - | Pretrained weights |

**Capabilities:**
- ✅ Text → 768-dim embedding (0.1-0.5s on GPU, ~10-15s on CPU)
- ✅ Image → 768-dim embedding (0.3-1s on GPU, ~15-20s on CPU)
- ✅ Text-to-image search (via Qdrant)
- ✅ Image-to-image similarity search (via Qdrant)
- ✅ GPU acceleration: **~20-30x speedup** vs CPU
- ⏳ Zero-shot image classification (planned)
- ⏳ CLIP score calculation (planned)

**API Endpoints:**
```bash
# Text embedding
POST http://localhost:8001/embed/text

# Image embedding (URL)
POST http://localhost:8001/embed/image

# Image embedding (file upload)
POST http://localhost:8001/embed/image/upload

# Health check
GET http://localhost:8001/health

# Model info
GET http://localhost:8001/info
```

**Configuration:**
- Model: `ViT-L-14`
- Pretrained: `openai`
- Device: `cuda` (GPU)
- Normalize: `true`
- Integration: DAGI Router (mode: `vision_embed`, `image_search`)

---

### 3. Embedding Models (Text)

#### BAAI/bge-m3 (RAG Service)

**Service:** rag-service  
**Port:** 9500  
**Status:** ✅ Active

**Model Details:**

| Model | Type | Embedding Dim | Context Length | Device | Purpose |
|-------|------|---------------|----------------|--------|---------|
| **BAAI/bge-m3** | Dense Retrieval | 1024 | 8192 | CPU/GPU | Text embeddings for RAG |

**Capabilities:**
- ✅ Document embedding for retrieval
- ✅ Query embedding
- ✅ Multi-lingual support
- ✅ Long context (8192 tokens)

**Storage:**
- Vector database: PostgreSQL with pgvector extension
- Indexed documents: Chat messages, tasks, meetings, governance docs

**Configuration:**
- Model: `BAAI/bge-m3`
- Device: `cpu` (can use GPU if available)
- HuggingFace cache: `/root/.cache/huggingface`

---

### 4. Audio Models

**Status:** ❌ Not installed yet

**Planned:**
- Whisper (speech-to-text)
- TTS models (text-to-speech)
- Audio classification

---

## 🗄️ Vector Databases

### 1. Qdrant (Image Embeddings)

**Service:** qdrant  
**Port:** 6333 (HTTP), 6334 (gRPC)  
**Status:** ✅ Active

**Collections:**

| Collection | Vectors | Dimension | Distance | Purpose |
|-----------|---------|-----------|----------|---------|
| **daarion_images** | Variable | 768 | Cosine | Image search (text→image, image→image) |

**Storage:** Docker volume `qdrant-data`

**API:**
```bash
# Health check
curl http://localhost:6333/healthz

# List collections
curl http://localhost:6333/collections

# Collection info
curl http://localhost:6333/collections/daarion_images
```

---

### 2. PostgreSQL + pgvector (Text Embeddings)

**Service:** dagi-postgres  
**Port:** 5432  
**Status:** ✅ Active

**Databases:**

| Database | Extension | Purpose |
|----------|-----------|---------|
| **daarion_memory** | - | Agent memory, context |
| **daarion_city** | pgvector | RAG document storage (1024-dim) |

**Storage:** Docker volume `postgres-data`

---

### 3. Neo4j (Graph Memory)

**Service:** neo4j  
**Port:** 7687 (Bolt), 7474 (HTTP)  
**Status:** ✅ Active (optional)

**Purpose:**
- Knowledge graph for entities
- Agent relationships
- DAO structure mapping

**Storage:** Docker volume (if configured)

---

## 🛠️ AI Services

### 1. DAGI Router (9102)

**Purpose:** Main routing engine for AI requests

**LLM Integration:**
- Ollama (qwen3:8b)
- DeepSeek (optional, API key required)
- OpenAI (optional, API key required)

**Providers:**
- LLM Provider (Ollama, DeepSeek, OpenAI)
- Vision Encoder Provider (OpenCLIP)
- DevTools Provider
- CrewAI Provider
- Vision RAG Provider (image search)

---

### 2. RAG Service (9500)

**Purpose:** Document retrieval and Q&A

**Models:**
- Embeddings: BAAI/bge-m3 (1024-dim)
- LLM: via DAGI Router (qwen3:8b)

**Capabilities:**
- Document ingestion (chat, tasks, meetings, governance, RWA, oracle)
- Vector search (pgvector)
- Q&A generation
- Context ranking

---

### 3. Vision Encoder (8001)

**Purpose:** Text/Image embeddings for multimodal RAG

**Models:**
- OpenCLIP ViT-L/14 (768-dim)

**Capabilities:**
- Text embeddings
- Image embeddings
- Image search (text-to-image, image-to-image)

---

### 4. Parser Service (9400)

**Purpose:** Document parsing and processing

**Capabilities:**
- PDF parsing
- Image extraction
- OCR (via Crawl4AI)
- Q&A generation

**Integration:**
- Crawl4AI for web content
- Vision Encoder for image analysis (planned)

---

### 5. Memory Service (8000)

**Purpose:** Agent memory and context management

**Storage:**
- PostgreSQL (daarion_memory)
- Redis (short-term cache, optional)
- Neo4j (graph memory, optional)

---

### 6. CrewAI Orchestrator (9010)

**Purpose:** Multi-agent workflow execution

**LLM:** via DAGI Router (qwen3:8b)

**Workflows:**
- microDAO onboarding
- Code review
- Proposal review
- Task decomposition

---

### 7. DevTools Backend (8008)

**Purpose:** Development tool execution

**Tools:**
- File operations (read/write)
- Test execution
- Notebook execution
- Git operations (planned)

---

### 8. Bot Gateway (9300)

**Purpose:** Telegram/Discord bot integration

**Bots:**
- DAARWIZZ (Telegram)
- Helion (Telegram, Energy Union)

---

### 9. RBAC Service (9200)

**Purpose:** Role-based access control

**Storage:** SQLite (`rbac.db`)

---

## 📊 GPU Memory Allocation (Estimated)

**Total VRAM:** 20 GB

| Service | Model | VRAM Usage | Status |
|---------|-------|-----------|--------|
| **Vision Encoder** | OpenCLIP ViT-L/14 | ~4 GB | Always loaded |
| **Ollama** | qwen3:8b | ~6 GB | Loaded on demand |
| **Available** | - | ~10 GB | For other models |

**Note:**
- Ollama and Vision Encoder can run simultaneously (~10 GB total)
- The remaining ~10 GB is available for additional models (audio, larger LLMs, etc.)

---

## 🔄 Model Loading Strategy

### Vision Encoder (Always-On)

- **Preloaded:** Yes (on service startup)
- **Reason:** Fast inference for image search
- **Unload:** Never (unless service restart)

### Ollama qwen3:8b (On-Demand)

- **Preloaded:** No
- **Load Time:** 2-3 seconds (first request)
- **Keep Alive:** 5 minutes (default)
- **Unload:** After idle timeout

### Future Models (Planned)

- **Whisper:** Load on-demand for audio transcription
- **TTS:** Load on-demand for speech synthesis
- **Larger LLMs:** Load on-demand (if VRAM available)

---

## 📈 Performance Benchmarks

### LLM Inference (qwen3:8b)

- **Tokens/sec:** ~50-80 tokens/sec (GPU)
- **Latency:** 100-200ms (first token)
- **Context:** 32K tokens
- **Batch size:** 1 (default)

### Vision Inference (ViT-L/14)

- **Text embedding:** 10-20ms (GPU)
- **Image embedding:** 30-50ms (GPU)
- **Throughput:** 50-100 images/sec (batch)

### RAG Search (BAAI/bge-m3)

- **Query embedding:** 50-100ms (CPU)
- **Vector search:** 5-10ms (pgvector)
- **Total latency:** 60-120ms

---

## 🔧 Model Management

### Ollama Models

**List installed models:**
```bash
curl http://localhost:11434/api/tags
```

**Pull new model:**
```bash
ollama pull llama2:7b
ollama pull mistral:7b
```

**Remove model:**
```bash
ollama rm qwen3:8b
```

**Check model info:**
```bash
ollama show qwen3:8b
```

---

### Vision Encoder Models

**Change model (in docker-compose.yml):**
```yaml
environment:
  - MODEL_NAME=ViT-B-32  # Smaller, faster
  - MODEL_PRETRAINED=openai
```

**Available models:**
- `ViT-B-32` (512-dim, 2 GB VRAM)
- `ViT-L-14` (768-dim, 4 GB VRAM) ← Current
- `ViT-L-14@336` (768-dim, 6 GB VRAM, higher resolution)
- `ViT-H-14` (1024-dim, 8 GB VRAM, highest quality)

---

## 📋 Complete Service List (17 Services)

| # | Service | Port | GPU | Models/Tools | Status |
|---|---------|------|-----|-------------|--------|
| 1 | DAGI Router | 9102 | ❌ | Routing engine | ✅ |
| 2 | Bot Gateway | 9300 | ❌ | Telegram bots | ✅ |
| 3 | DevTools | 8008 | ❌ | File ops, tests | ✅ |
| 4 | CrewAI | 9010 | ❌ | Multi-agent | ✅ |
| 5 | RBAC | 9200 | ❌ | Access control | ✅ |
| 6 | RAG Service | 9500 | ❌ | BAAI/bge-m3 | ✅ |
| 7 | Memory Service | 8000 | ❌ | Context mgmt | ✅ |
| 8 | Parser Service | 9400 | ❌ | PDF, OCR | ✅ |
| 9 | **Vision Encoder** | 8001 | ✅ | **OpenCLIP ViT-L/14** | ✅ |
| 10 | PostgreSQL | 5432 | ❌ | pgvector | ✅ |
| 11 | Redis | 6379 | ❌ | Cache | ✅ |
| 12 | Neo4j | 7687 | ❌ | Graph DB | ✅ |
| 13 | **Qdrant** | 6333 | ❌ | Vector DB | ✅ |
| 14 | Grafana | 3000 | ❌ | Dashboards | ✅ |
| 15 | Prometheus | 9090 | ❌ | Metrics | ✅ |
| 16 | Neo4j Exporter | 9091 | ❌ | Metrics | ✅ |
| 17 | **Ollama** | 11434 | ✅ | **qwen3:8b** | ✅ |

**GPU Services:** 2 (Vision Encoder, Ollama)  
**Total VRAM Usage:** ~10 GB (concurrent)

---

## 🚀 Deployment Checklist

### Pre-Deployment (Local)

- [x] Code reviewed and tested
- [x] Documentation updated (WARP.md, INFRASTRUCTURE.md)
- [x] Jupyter Notebook updated
- [x] All tests passing
- [x] Git committed and pushed

### Deployment (Server)

```bash
# 1. SSH to server
ssh root@144.76.224.179

# 2. Pull latest code
cd /opt/microdao-daarion
git pull origin main

# 3. Check GPU
nvidia-smi

# 4. Build new services
docker-compose build vision-encoder

# 5. Start all services
docker-compose up -d

# 6. Verify health
docker-compose ps
curl http://localhost:8001/health   # Vision Encoder
curl http://localhost:6333/healthz  # Qdrant
curl http://localhost:9102/health   # Router

# 7. Run smoke tests
./smoke.sh
./test-vision-encoder.sh

# 8. Check logs
docker-compose logs -f vision-encoder
docker-compose logs -f router

# 9. Monitor GPU
watch -n 1 nvidia-smi
```

---

## 📖 Documentation Index

- **[WARP.md](./WARP.md)** - Developer guide (quick start for Warp AI)
- **[INFRASTRUCTURE.md](./INFRASTRUCTURE.md)** - Server, services, deployment
- **[VISION-ENCODER-STATUS.md](./VISION-ENCODER-STATUS.md)** - Vision Encoder status
- **[VISION-RAG-IMPLEMENTATION.md](./VISION-RAG-IMPLEMENTATION.md)** - Vision RAG complete implementation
- **[docs/cursor/vision_encoder_deployment_task.md](./docs/cursor/vision_encoder_deployment_task.md)** - Deployment task
- **[docs/infrastructure_quick_ref.ipynb](./docs/infrastructure_quick_ref.ipynb)** - Jupyter quick reference

---

## 🎯 Next Steps

### Phase 1: Audio Integration

- [ ] Install Whisper (speech-to-text)
- [ ] Install TTS model (text-to-speech)
- [ ] Integrate with Telegram voice messages
- [ ] Audio RAG (transcription + search)

### Phase 2: Larger LLMs

- [ ] Install Mistral 7B (better reasoning)
- [ ] Install Llama 2 70B (if enough VRAM via quantization)
- [ ] Multi-model routing (task-specific models)

### Phase 3: Advanced Vision

- [ ] Image captioning (BLIP-2)
- [ ] Zero-shot classification
- [ ] Video understanding (frame extraction + CLIP)

### Phase 4: Optimization

- [ ] Model quantization (reduce VRAM)
- [ ] Batch inference (increase throughput)
- [ ] Model caching (Redis)
- [ ] GPU sharing (multiple models concurrently)

---

**Last Updated:** 2025-01-17  
**Maintained by:** Ivan Tytar & DAARION Team  
**Status:** ✅ Production Ready (17 services, 3 AI models)
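
---

**Appendix: VRAM budget check (sketch)**

The VRAM headroom quoted in the GPU Memory Allocation section can be re-derived with a few lines of shell. This is a minimal sketch, not part of the deployed tooling: the function name `vram_remaining` is hypothetical, and the inputs are the document's own estimates (20 GB card, ~4 GB for ViT-L/14, ~6 GB for qwen3:8b) rounded to whole gigabytes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: subtract estimated per-model VRAM from the card total.
# Usage: vram_remaining TOTAL_GB USED_GB [USED_GB ...]
vram_remaining() {
  local total=$1
  shift
  local used=0 gb
  for gb in "$@"; do
    used=$((used + gb))   # integer GB only; the estimates are rounded
  done
  echo $((total - used))
}

# Estimates from this document: 20 GB total, ViT-L/14 ~4 GB, qwen3:8b ~6 GB
vram_remaining 20 4 6
```

With these estimates the helper prints `10`, matching the roughly 10 GB of concurrent usage listed in the service table. Swapping in the figure for another Vision Encoder variant (for example 8 for `ViT-H-14`) shows how much headroom would remain before a planned model is added. Measured usage can differ from these estimates, so `nvidia-smi` on the server remains the source of truth.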