🖥️ System Inventory — DAARION & MicroDAO

Version: 1.0.0
Last Updated: 2025-01-17
Server: GEX44 #2844465 (Hetzner)


🖥️ Hardware Specifications

Production Server (144.76.224.179)

Provider: Hetzner Dedicated Server GEX44
Server ID: #2844465

GPU Configuration

GPU Model: NVIDIA RTX 4000 SFF Ada Generation
VRAM: 20 GB GDDR6
Architecture: Ada Lovelace
CUDA Version: 12.2
Driver Version: 535.274.02

Current VRAM Usage:

  • Ollama (qwen3:8b): ~5.6 GB
  • Vision Encoder (ViT-L/14): ~1.9 GB
  • Total: ~7.5 GB / 20 GB (37.5% usage)
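The figures above can be re-checked on the server with `nvidia-smi`'s query mode (flags as in standard NVIDIA driver tooling):

```shell
# Per-GPU memory summary (CSV output)
nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv

# Per-process VRAM breakdown (shows Ollama and the Vision Encoder separately)
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```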

CPU & RAM (Typical GEX44)

  • CPU: AMD Ryzen 9 5950X (16 cores, 32 threads) or similar
  • RAM: 128 GB DDR4
  • Storage: 2x NVMe SSD (RAID configuration)

🤖 Installed AI Models

1. LLM Models (Language Models)

Ollama (Local)

Service: Ollama
Port: 11434
Status: Active

Installed Models:

| Model | Size | Parameters | Context | VRAM Usage | Purpose |
|---|---|---|---|---|---|
| qwen3:8b | ~4.7 GB | 8B | 32K | ~6 GB | Primary LLM for Router, fast inference |

API:

```bash
# List models
curl http://localhost:11434/api/tags

# Generate
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:8b",
  "prompt": "Hello"
}'
```

Configuration:

  • Base URL: http://172.17.0.1:11434 (from Docker containers)
  • Used by: DAGI Router, DevTools, CrewAI, Gateway

2. Vision Models (Multimodal)

OpenCLIP (Vision Encoder Service)

Service: vision-encoder
Port: 8001
Status: Active (GPU-accelerated)

Model Details:

| Model | Architecture | Parameters | Embedding Dim | VRAM Usage | Purpose |
|---|---|---|---|---|---|
| ViT-L/14 | Vision Transformer Large | ~428M | 768 | ~4 GB | Text/Image embeddings for RAG |
| OpenAI CLIP | CLIP (Contrastive Language-Image Pre-training) | - | 768 | - | Pretrained weights |

Capabilities:

  • Text → 768-dim embedding (0.1-0.5s on GPU, ~10-15s on CPU)
  • Image → 768-dim embedding (0.3-1s on GPU, ~15-20s on CPU)
  • Text-to-image search (via Qdrant)
  • Image-to-image similarity search (via Qdrant)
  • GPU acceleration: ~20-30x speedup vs CPU
  • Zero-shot image classification (planned)
  • CLIP score calculation (planned)

API Endpoints:

```
# Text embedding
POST http://localhost:8001/embed/text

# Image embedding (URL)
POST http://localhost:8001/embed/image

# Image embedding (file upload)
POST http://localhost:8001/embed/image/upload

# Health check
GET http://localhost:8001/health

# Model info
GET http://localhost:8001/info
```

Configuration:

  • Model: ViT-L-14
  • Pretrained: openai
  • Device: cuda (GPU)
  • Normalize: true
  • Integration: DAGI Router (mode: vision_embed, image_search)
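A request sketch for the embedding endpoints. The JSON field names (`text`, `url`) are assumptions about this service's schema, not confirmed here — adjust to the actual request model:

```shell
# Embed a text query (payload shape is an assumption)
curl -s -X POST http://localhost:8001/embed/text \
  -H "Content-Type: application/json" \
  -d '{"text": "solar panel installation on a rooftop"}'

# Embed an image by URL (payload shape is an assumption)
curl -s -X POST http://localhost:8001/embed/image \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/photo.jpg"}'
```

Both should return a 768-dim vector suitable for upserting into the `daarion_images` Qdrant collection.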

3. Embedding Models (Text)

BAAI/bge-m3 (RAG Service)

Service: rag-service
Port: 9500
Status: Active

Model Details:

| Model | Type | Embedding Dim | Context Length | Device | Purpose |
|---|---|---|---|---|---|
| BAAI/bge-m3 | Dense Retrieval | 1024 | 8192 | CPU/GPU | Text embeddings for RAG |

Capabilities:

  • Document embedding for retrieval
  • Query embedding
  • Multi-lingual support
  • Long context (8192 tokens)

Storage:

  • Vector database: PostgreSQL with pgvector extension
  • Indexed documents: Chat messages, tasks, meetings, governance docs

Configuration:

  • Model: BAAI/bge-m3
  • Device: cpu (can use GPU if available)
  • HuggingFace cache: /root/.cache/huggingface

4. Audio Models

Status: Not installed yet

Planned:

  • Whisper (speech-to-text)
  • TTS models (text-to-speech)
  • Audio classification

🗄️ Vector Databases

1. Qdrant (Image Embeddings)

Service: qdrant
Port: 6333 (HTTP), 6334 (gRPC)
Status: Active

Collections:

| Collection | Vectors | Dimension | Distance | Purpose |
|---|---|---|---|---|
| daarion_images | Variable | 768 | Cosine | Image search (text→image, image→image) |

Storage: Docker volume qdrant-data

API:

```bash
# Health check
curl http://localhost:6333/healthz

# List collections
curl http://localhost:6333/collections

# Collection info
curl http://localhost:6333/collections/daarion_images
```
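A similarity query against `daarion_images` uses Qdrant's points-search endpoint. In practice the query vector is a full 768-dim embedding from the Vision Encoder; the three-element vector below is only a placeholder for readability:

```shell
# Nearest-neighbor search (vector truncated for illustration — must be 768-dim in practice)
curl -s -X POST http://localhost:6333/collections/daarion_images/points/search \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.12, -0.03, 0.88],
    "limit": 5,
    "with_payload": true
  }'
```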

2. PostgreSQL + pgvector (Text Embeddings)

Service: dagi-postgres
Port: 5432
Status: Active

Databases:

| Database | Extension | Purpose |
|---|---|---|
| daarion_memory | - | Agent memory, context |
| daarion_city | pgvector | RAG document storage (1024-dim) |

Storage: Docker volume postgres-data
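A minimal pgvector retrieval sketch, assuming a hypothetical `documents(id, content, embedding vector(1024))` table and `postgres` user — the actual schema in `daarion_city` may differ. The query vector would be a full 1024-dim bge-m3 embedding; it is truncated here for readability:

```shell
docker exec -it dagi-postgres psql -U postgres -d daarion_city -c "
  SELECT id, content
  FROM documents
  ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector  -- <=> is pgvector's cosine-distance operator
  LIMIT 5;"
```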


3. Neo4j (Graph Memory)

Service: neo4j
Port: 7687 (Bolt), 7474 (HTTP)
Status: Active (optional)

Purpose:

  • Knowledge graph for entities
  • Agent relationships
  • DAO structure mapping

Storage: Docker volume (if configured)


🛠️ AI Services

1. DAGI Router (9102)

Purpose: Main routing engine for AI requests
LLM Integration:

  • Ollama (qwen3:8b)
  • DeepSeek (optional, API key required)
  • OpenAI (optional, API key required)

Providers:

  • LLM Provider (Ollama, DeepSeek, OpenAI)
  • Vision Encoder Provider (OpenCLIP)
  • DevTools Provider
  • CrewAI Provider
  • Vision RAG Provider (image search)

2. RAG Service (9500)

Purpose: Document retrieval and Q&A
Models:

  • Embeddings: BAAI/bge-m3 (1024-dim)
  • LLM: via DAGI Router (qwen3:8b)

Capabilities:

  • Document ingestion (chat, tasks, meetings, governance, RWA, oracle)
  • Vector search (pgvector)
  • Q&A generation
  • Context ranking

3. Vision Encoder (8001)

Purpose: Text/Image embeddings for multimodal RAG
Models:

  • OpenCLIP ViT-L/14 (768-dim)

Capabilities:

  • Text embeddings
  • Image embeddings
  • Image search (text-to-image, image-to-image)

4. Parser Service (9400)

Purpose: Document parsing and processing
Capabilities:

  • PDF parsing
  • Image extraction
  • OCR (via Crawl4AI)
  • Q&A generation

Integration:

  • Crawl4AI for web content
  • Vision Encoder for image analysis (planned)

5. Memory Service (8000)

Purpose: Agent memory and context management
Storage:

  • PostgreSQL (daarion_memory)
  • Redis (short-term cache, optional)
  • Neo4j (graph memory, optional)

6. CrewAI Orchestrator (9010)

Purpose: Multi-agent workflow execution
LLM: via DAGI Router (qwen3:8b)

Workflows:

  • microDAO onboarding
  • Code review
  • Proposal review
  • Task decomposition

7. DevTools Backend (8008)

Purpose: Development tool execution
Tools:

  • File operations (read/write)
  • Test execution
  • Notebook execution
  • Git operations (planned)

8. Bot Gateway (9300)

Purpose: Telegram/Discord bot integration
Bots:

  • DAARWIZZ (Telegram)
  • Helion (Telegram, Energy Union)

9. RBAC Service (9200)

Purpose: Role-based access control
Storage: SQLite (rbac.db)


📊 GPU Memory Allocation (Estimated)

Total VRAM: 20 GB

| Service | Model | VRAM Usage | Status |
|---|---|---|---|
| Vision Encoder | OpenCLIP ViT-L/14 | ~4 GB | Always loaded |
| Ollama | qwen3:8b | ~6 GB | Loaded on demand |
| Available | - | ~10 GB | For other models |

Note:

  • Ollama and Vision Encoder can run simultaneously (~10 GB total)
  • Remaining ~10 GB available for additional models (audio, larger LLMs, etc.)

🔄 Model Loading Strategy

Vision Encoder (Always-On)

  • Preloaded: Yes (on service startup)
  • Reason: Fast inference for image search
  • Unload: Never (unless service restart)

Ollama qwen3:8b (On-Demand)

  • Preloaded: No
  • Load Time: 2-3 seconds (first request)
  • Keep Alive: 5 minutes (default)
  • Unload: After idle timeout
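The keep-alive window can be overridden per request with Ollama's `keep_alive` parameter:

```shell
# Keep qwen3:8b resident for 30 minutes after this request
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:8b",
  "prompt": "Hello",
  "keep_alive": "30m"
}'
```

Setting `keep_alive` to `0` unloads the model immediately after the response; `-1` keeps it loaded indefinitely.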

Future Models (Planned)

  • Whisper: Load on-demand for audio transcription
  • TTS: Load on-demand for speech synthesis
  • Larger LLMs: Load on-demand (if VRAM available)

📈 Performance Benchmarks

LLM Inference (qwen3:8b)

  • Tokens/sec: ~50-80 tokens/sec (GPU)
  • Latency: 100-200ms (first token)
  • Context: 32K tokens
  • Batch size: 1 (default)

Vision Inference (ViT-L/14)

  • Text embedding: 10-20ms (GPU)
  • Image embedding: 30-50ms (GPU)
  • Throughput: 50-100 images/sec (batch)

RAG Search (BAAI/bge-m3)

  • Query embedding: 50-100ms (CPU)
  • Vector search: 5-10ms (pgvector)
  • Total latency: 60-120ms

🔧 Model Management

Ollama Models

List installed models:

```bash
curl http://localhost:11434/api/tags
```

Pull new model:

```bash
ollama pull llama2:7b
ollama pull mistral:7b
```

Remove model:

```bash
ollama rm qwen3:8b
```

Check model info:

```bash
ollama show qwen3:8b
```

Vision Encoder Models

Change model (in docker-compose.yml):

```yaml
environment:
  - MODEL_NAME=ViT-B-32  # Smaller, faster
  - MODEL_PRETRAINED=openai
```

Available models:

  • ViT-B-32 (512-dim, 2 GB VRAM)
  • ViT-L-14 (768-dim, 4 GB VRAM) ← Current
  • ViT-L-14@336 (768-dim, 6 GB VRAM, higher resolution)
  • ViT-H-14 (1024-dim, 8 GB VRAM, highest quality)
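After changing `MODEL_NAME`, the container must be recreated so the service downloads and loads the new weights on startup:

```shell
# Recreate the service with the updated environment and watch the model load
docker-compose up -d --force-recreate vision-encoder
docker-compose logs -f vision-encoder
```

Verify the active model afterwards via `curl http://localhost:8001/info`.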

📋 Complete Service List (17 Services)

| # | Service | Port | GPU | Models/Tools |
|---|---|---|---|---|
| 1 | DAGI Router | 9102 | - | Routing engine |
| 2 | Bot Gateway | 9300 | - | Telegram bots |
| 3 | DevTools | 8008 | - | File ops, tests |
| 4 | CrewAI | 9010 | - | Multi-agent |
| 5 | RBAC | 9200 | - | Access control |
| 6 | RAG Service | 9500 | - | BAAI/bge-m3 |
| 7 | Memory Service | 8000 | - | Context mgmt |
| 8 | Parser Service | 9400 | - | PDF, OCR |
| 9 | Vision Encoder | 8001 | ✅ | OpenCLIP ViT-L/14 |
| 10 | PostgreSQL | 5432 | - | pgvector |
| 11 | Redis | 6379 | - | Cache |
| 12 | Neo4j | 7687 | - | Graph DB |
| 13 | Qdrant | 6333 | - | Vector DB |
| 14 | Grafana | 3000 | - | Dashboards |
| 15 | Prometheus | 9090 | - | Metrics |
| 16 | Neo4j Exporter | 9091 | - | Metrics |
| 17 | Ollama | 11434 | ✅ | qwen3:8b |

GPU Services: 2 (Vision Encoder, Ollama)
Total VRAM Usage: ~10 GB (concurrent)


🚀 Deployment Checklist

Pre-Deployment (Local)

  • Code reviewed and tested
  • Documentation updated (WARP.md, INFRASTRUCTURE.md)
  • Jupyter Notebook updated
  • All tests passing
  • Git committed and pushed

Deployment (Server)

```bash
# 1. SSH to server
ssh root@144.76.224.179

# 2. Pull latest code
cd /opt/microdao-daarion
git pull origin main

# 3. Check GPU
nvidia-smi

# 4. Build new services
docker-compose build vision-encoder

# 5. Start all services
docker-compose up -d

# 6. Verify health
docker-compose ps
curl http://localhost:8001/health  # Vision Encoder
curl http://localhost:6333/healthz # Qdrant
curl http://localhost:9102/health  # Router

# 7. Run smoke tests
./smoke.sh
./test-vision-encoder.sh

# 8. Check logs
docker-compose logs -f vision-encoder
docker-compose logs -f router

# 9. Monitor GPU
watch -n 1 nvidia-smi
```

📖 Documentation Index


🎯 Next Steps

Phase 1: Audio Integration

  • Install Whisper (speech-to-text)
  • Install TTS model (text-to-speech)
  • Integrate with Telegram voice messages
  • Audio RAG (transcription + search)

Phase 2: Larger LLMs

  • Install Mistral 7B (better reasoning)
  • Install Llama 2 70B (if enough VRAM via quantization)
  • Multi-model routing (task-specific models)

Phase 3: Advanced Vision

  • Image captioning (BLIP-2)
  • Zero-shot classification
  • Video understanding (frame extraction + CLIP)

Phase 4: Optimization

  • Model quantization (reduce VRAM)
  • Batch inference (increase throughput)
  • Model caching (Redis)
  • GPU sharing (multiple models concurrently)

Last Updated: 2025-01-17
Maintained by: Ivan Tytar & DAARION Team
Status: Production Ready (17 services, 3 AI models)