# 🖥️ System Inventory — DAARION & MicroDAO

**Version:** 1.0.0
**Last Updated:** 2025-01-17
**Server:** GEX44 #2844465 (Hetzner)

---

## 🖥️ Hardware Specifications

### Production Server (144.76.224.179)

**Provider:** Hetzner Dedicated Server GEX44
**Server ID:** #2844465

#### GPU Configuration

**GPU Model:** NVIDIA RTX 4000 SFF Ada Generation
**VRAM:** 20 GB GDDR6
**Architecture:** Ada Lovelace
**CUDA Version:** 12.2
**Driver Version:** 535.274.02

**Current VRAM Usage:**

- Ollama (qwen3:8b): ~5.6 GB
- Vision Encoder (ViT-L/14): ~1.9 GB
- **Total:** ~7.5 GB / 20 GB (37.5% usage)

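These figures can be re-checked on the host at any time; `nvidia-smi` reports total and used VRAM directly:

```bash
# Overall VRAM usage for the RTX 4000 SFF Ada
nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv
```
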
#### CPU & RAM (Typical GEX44)

- **CPU:** Intel Core i5-13500 (14 cores, 20 threads)
- **RAM:** 64 GB DDR4
- **Storage:** 2x NVMe SSD (RAID configuration)

---

## 🤖 Installed AI Models

### 1. LLM Models (Language Models)

#### Ollama (Local)

**Service:** Ollama
**Port:** 11434
**Status:** ✅ Active

**Installed Models:**

| Model | Size | Parameters | Context | VRAM Usage | Purpose |
|-------|------|------------|---------|------------|---------|
| **qwen3:8b** | ~4.7 GB | 8B | 32K | ~6 GB | Primary LLM for Router, fast inference |

**API:**

```bash
# List models
curl http://localhost:11434/api/tags

# Generate
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:8b",
  "prompt": "Hello"
}'
```

**Configuration:**

- Base URL: `http://172.17.0.1:11434` (from Docker containers)
- Used by: DAGI Router, DevTools, CrewAI, Gateway

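A quick way to confirm that containers can reach Ollama over the Docker bridge is to call the same endpoint with the bridge address; `"stream": false` returns the whole response as one JSON object, which is easier to inspect in scripts:

```bash
# Run from inside any container on the default bridge network
curl -s http://172.17.0.1:11434/api/generate -d '{
  "model": "qwen3:8b",
  "prompt": "Reply with the single word: pong",
  "stream": false
}'
```
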
---

### 2. Vision Models (Multimodal)

#### OpenCLIP (Vision Encoder Service)

**Service:** vision-encoder
**Port:** 8001
**Status:** ✅ Active (GPU-accelerated)

**Model Details:**

| Model | Architecture | Parameters | Embedding Dim | VRAM Usage | Purpose |
|-------|--------------|------------|---------------|------------|---------|
| **ViT-L/14** | Vision Transformer Large | ~428M | 768 | ~4 GB | Text/Image embeddings for RAG |
| **OpenAI CLIP** | CLIP (Contrastive Language-Image Pre-training) | - | 768 | - | Pretrained weights |

**Capabilities:**

- ✅ Text → 768-dim embedding (0.1-0.5s on GPU, ~10-15s on CPU)
- ✅ Image → 768-dim embedding (0.3-1s on GPU, ~15-20s on CPU)
- ✅ Text-to-image search (via Qdrant)
- ✅ Image-to-image similarity search (via Qdrant)
- ✅ GPU acceleration: **~20-30x speedup** vs CPU
- ⏳ Zero-shot image classification (planned)
- ⏳ CLIP score calculation (planned)

**API Endpoints:**

```bash
# Text embedding
POST http://localhost:8001/embed/text

# Image embedding (URL)
POST http://localhost:8001/embed/image

# Image embedding (file upload)
POST http://localhost:8001/embed/image/upload

# Health check
GET http://localhost:8001/health

# Model info
GET http://localhost:8001/info
```

**Configuration:**

- Model: `ViT-L-14`
- Pretrained: `openai`
- Device: `cuda` (GPU)
- Normalize: `true`
- Integration: DAGI Router (mode: `vision_embed`, `image_search`)

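A minimal smoke test against the text-embedding endpoint might look like the sketch below. The JSON field names (`text` in the request, `embedding` in the response) are assumptions about the service schema, not taken from this document — check `GET /info` or the service source if the call fails:

```bash
# Assumed request/response schema — adjust field names to the actual API
curl -s -X POST http://localhost:8001/embed/text \
  -H 'Content-Type: application/json' \
  -d '{"text": "solar panel on a rooftop"}' | jq '.embedding | length'   # expect 768
```
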
---

### 3. Embedding Models (Text)

#### BAAI/bge-m3 (RAG Service)

**Service:** rag-service
**Port:** 9500
**Status:** ✅ Active

**Model Details:**

| Model | Type | Embedding Dim | Context Length | Device | Purpose |
|-------|------|---------------|----------------|--------|---------|
| **BAAI/bge-m3** | Dense Retrieval | 1024 | 8192 | CPU/GPU | Text embeddings for RAG |

**Capabilities:**

- ✅ Document embedding for retrieval
- ✅ Query embedding
- ✅ Multilingual support
- ✅ Long context (8192 tokens)

**Storage:**

- Vector database: PostgreSQL with pgvector extension
- Indexed documents: Chat messages, tasks, meetings, governance docs

**Configuration:**

- Model: `BAAI/bge-m3`
- Device: `cpu` (can use GPU if available)
- HuggingFace cache: `/root/.cache/huggingface`

---

### 4. Audio Models

**Status:** ❌ Not installed yet

**Planned:**

- Whisper (speech-to-text)
- TTS models (text-to-speech)
- Audio classification

---

## 🗄️ Vector Databases

### 1. Qdrant (Image Embeddings)

**Service:** qdrant
**Port:** 6333 (HTTP), 6334 (gRPC)
**Status:** ✅ Active

**Collections:**

| Collection | Vectors | Dimension | Distance | Purpose |
|------------|---------|-----------|----------|---------|
| **daarion_images** | Variable | 768 | Cosine | Image search (text→image, image→image) |

**Storage:** Docker volume `qdrant-data`

**API:**

```bash
# Health check
curl http://localhost:6333/healthz

# List collections
curl http://localhost:6333/collections

# Collection info
curl http://localhost:6333/collections/daarion_images
```

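End to end, a text→image search combines the Vision Encoder with Qdrant's standard search endpoint (`POST /collections/<name>/points/search`). The sketch below assumes the encoder returns its vector in an `embedding` field (an assumption); the Qdrant call itself is the stock REST API:

```bash
# 1. Embed the query text (response field name "embedding" is assumed)
VEC=$(curl -s -X POST http://localhost:8001/embed/text \
  -H 'Content-Type: application/json' \
  -d '{"text": "wind turbines at sunset"}' | jq -c '.embedding')

# 2. Search the 768-dim image collection by cosine similarity
curl -s -X POST http://localhost:6333/collections/daarion_images/points/search \
  -H 'Content-Type: application/json' \
  -d "{\"vector\": ${VEC}, \"limit\": 5, \"with_payload\": true}" | jq '.result'
```
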
---

### 2. PostgreSQL + pgvector (Text Embeddings)

**Service:** dagi-postgres
**Port:** 5432
**Status:** ✅ Active

**Databases:**

| Database | Extension | Purpose |
|----------|-----------|---------|
| **daarion_memory** | - | Agent memory, context |
| **daarion_city** | pgvector | RAG document storage (1024-dim) |

**Storage:** Docker volume `postgres-data`

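To confirm the extension is actually installed in `daarion_city`, the check below can run through the compose service; the `postgres` role is an assumption — substitute the configured user. RAG similarity queries then use pgvector's cosine-distance operator `<=>` (table/column names in the comment are purely illustrative):

```bash
# Verify pgvector is present in the RAG database (role name is an assumption)
docker-compose exec dagi-postgres \
  psql -U postgres -d daarion_city \
  -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"

# Illustrative shape of a similarity query over 1024-dim bge-m3 embeddings:
#   SELECT id, content
#   FROM rag_documents                      -- hypothetical table
#   ORDER BY embedding <=> $1::vector       -- $1 = 1024-dim query embedding
#   LIMIT 5;
```
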
---

### 3. Neo4j (Graph Memory)

**Service:** neo4j
**Port:** 7687 (Bolt), 7474 (HTTP)
**Status:** ✅ Active (optional)

**Purpose:**

- Knowledge graph for entities
- Agent relationships
- DAO structure mapping

**Storage:** Docker volume (if configured)

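A quick connectivity check can go through `cypher-shell` inside the container; the credentials (and whether auth is enabled at all) are not documented here, so treat them as placeholders:

```bash
# Credentials are placeholders — use the values from the compose environment
docker-compose exec neo4j cypher-shell -u neo4j -p '<password>' \
  "MATCH (n) RETURN count(n) AS nodes;"
```
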
---

## 🛠️ AI Services

### 1. DAGI Router (9102)

**Purpose:** Main routing engine for AI requests

**LLM Integration:**

- Ollama (qwen3:8b)
- DeepSeek (optional, API key required)
- OpenAI (optional, API key required)

**Providers:**

- LLM Provider (Ollama, DeepSeek, OpenAI)
- Vision Encoder Provider (OpenCLIP)
- DevTools Provider
- CrewAI Provider
- Vision RAG Provider (image search)

---

### 2. RAG Service (9500)

**Purpose:** Document retrieval and Q&A

**Models:**

- Embeddings: BAAI/bge-m3 (1024-dim)
- LLM: via DAGI Router (qwen3:8b)

**Capabilities:**

- Document ingestion (chat, tasks, meetings, governance, RWA, oracle)
- Vector search (pgvector)
- Q&A generation
- Context ranking

---

### 3. Vision Encoder (8001)

**Purpose:** Text/Image embeddings for multimodal RAG

**Models:**

- OpenCLIP ViT-L/14 (768-dim)

**Capabilities:**

- Text embeddings
- Image embeddings
- Image search (text-to-image, image-to-image)

---

### 4. Parser Service (9400)

**Purpose:** Document parsing and processing

**Capabilities:**

- PDF parsing
- Image extraction
- OCR (via Crawl4AI)
- Q&A generation

**Integration:**

- Crawl4AI for web content
- Vision Encoder for image analysis (planned)

---

### 5. Memory Service (8000)

**Purpose:** Agent memory and context management

**Storage:**

- PostgreSQL (daarion_memory)
- Redis (short-term cache, optional)
- Neo4j (graph memory, optional)

---

### 6. CrewAI Orchestrator (9010)

**Purpose:** Multi-agent workflow execution

**LLM:** via DAGI Router (qwen3:8b)

**Workflows:**

- microDAO onboarding
- Code review
- Proposal review
- Task decomposition

---

### 7. DevTools Backend (8008)

**Purpose:** Development tool execution

**Tools:**

- File operations (read/write)
- Test execution
- Notebook execution
- Git operations (planned)

---

### 8. Bot Gateway (9300)

**Purpose:** Telegram/Discord bot integration

**Bots:**

- DAARWIZZ (Telegram)
- Helion (Telegram, Energy Union)

---

### 9. RBAC Service (9200)

**Purpose:** Role-based access control

**Storage:** SQLite (`rbac.db`)

---

## 📊 GPU Memory Allocation (Estimated)

**Total VRAM:** 20 GB

| Service | Model | VRAM Usage | Status |
|---------|-------|------------|--------|
| **Vision Encoder** | OpenCLIP ViT-L/14 | ~4 GB | Always loaded |
| **Ollama** | qwen3:8b | ~6 GB | Loaded on demand |
| **Available** | - | ~10 GB | For other models |

**Note:**

- Ollama and the Vision Encoder can run simultaneously (~10 GB total)
- The remaining ~10 GB is available for additional models (audio, larger LLMs, etc.)

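Actual allocation per service can be read straight from the driver; each model server shows up as a separate compute process:

```bash
# Per-process VRAM usage (Ollama runner, Vision Encoder, etc.)
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```
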
---

## 🔄 Model Loading Strategy

### Vision Encoder (Always-On)

- **Preloaded:** Yes (on service startup)
- **Reason:** Fast inference for image search
- **Unload:** Never (unless service restart)

### Ollama qwen3:8b (On-Demand)

- **Preloaded:** No
- **Load Time:** 2-3 seconds (first request)
- **Keep Alive:** 5 minutes (default)
- **Unload:** After idle timeout

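The five-minute keep-alive above is Ollama's default; it can be overridden per request with the API's `keep_alive` parameter, which helps when a burst of Router traffic is expected and reload latency should be avoided:

```bash
# Keep qwen3:8b resident in VRAM for 30 minutes after this request
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3:8b",
  "prompt": "warm-up",
  "stream": false,
  "keep_alive": "30m"
}'
```
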
### Future Models (Planned)

- **Whisper:** Load on-demand for audio transcription
- **TTS:** Load on-demand for speech synthesis
- **Larger LLMs:** Load on-demand (if VRAM available)

---

## 📈 Performance Benchmarks

### LLM Inference (qwen3:8b)

- **Tokens/sec:** ~50-80 tokens/sec (GPU)
- **Latency:** 100-200ms (first token)
- **Context:** 32K tokens
- **Batch size:** 1 (default)

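These throughput numbers can be re-measured from the final JSON object of a non-streaming `/api/generate` call, which reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds):

```bash
# Tokens per second for a single qwen3:8b generation
curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen3:8b", "prompt": "Explain RAG in two sentences.", "stream": false}' \
  | jq '.eval_count / (.eval_duration / 1e9)'
```
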
### Vision Inference (ViT-L/14)

- **Text embedding:** 10-20ms (GPU)
- **Image embedding:** 30-50ms (GPU)
- **Throughput:** 50-100 images/sec (batch)

### RAG Search (BAAI/bge-m3)

- **Query embedding:** 50-100ms (CPU)
- **Vector search:** 5-10ms (pgvector)
- **Total latency:** 60-120ms

---

## 🔧 Model Management

### Ollama Models

**List installed models:**

```bash
curl http://localhost:11434/api/tags
```

**Pull new model:**

```bash
ollama pull llama2:7b
ollama pull mistral:7b
```

**Remove model:**

```bash
ollama rm qwen3:8b
```

**Check model info:**

```bash
ollama show qwen3:8b
```

---

### Vision Encoder Models

**Change model (in docker-compose.yml):**

```yaml
environment:
  - MODEL_NAME=ViT-B-32        # Smaller, faster
  - MODEL_PRETRAINED=openai
```

**Available models:**

- `ViT-B-32` (512-dim, 2 GB VRAM)
- `ViT-L-14` (768-dim, 4 GB VRAM) ← Current
- `ViT-L-14@336` (768-dim, 6 GB VRAM, higher resolution)
- `ViT-H-14` (1024-dim, 8 GB VRAM, highest quality)

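After changing `MODEL_NAME`, the service has to be rebuilt and recreated so the new weights are downloaded; the `/info` endpoint then confirms which model is loaded:

```bash
# Apply a model change and verify it
docker-compose up -d --build vision-encoder
curl -s http://localhost:8001/info
```
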
---

## 📋 Complete Service List (17 Services)

| # | Service | Port | GPU | Models/Tools | Status |
|---|---------|------|-----|--------------|--------|
| 1 | DAGI Router | 9102 | ❌ | Routing engine | ✅ |
| 2 | Bot Gateway | 9300 | ❌ | Telegram bots | ✅ |
| 3 | DevTools | 8008 | ❌ | File ops, tests | ✅ |
| 4 | CrewAI | 9010 | ❌ | Multi-agent | ✅ |
| 5 | RBAC | 9200 | ❌ | Access control | ✅ |
| 6 | RAG Service | 9500 | ❌ | BAAI/bge-m3 | ✅ |
| 7 | Memory Service | 8000 | ❌ | Context mgmt | ✅ |
| 8 | Parser Service | 9400 | ❌ | PDF, OCR | ✅ |
| 9 | **Vision Encoder** | 8001 | ✅ | **OpenCLIP ViT-L/14** | ✅ |
| 10 | PostgreSQL | 5432 | ❌ | pgvector | ✅ |
| 11 | Redis | 6379 | ❌ | Cache | ✅ |
| 12 | Neo4j | 7687 | ❌ | Graph DB | ✅ |
| 13 | **Qdrant** | 6333 | ❌ | Vector DB | ✅ |
| 14 | Grafana | 3000 | ❌ | Dashboards | ✅ |
| 15 | Prometheus | 9090 | ❌ | Metrics | ✅ |
| 16 | Neo4j Exporter | 9091 | ❌ | Metrics | ✅ |
| 17 | **Ollama** | 11434 | ✅ | **qwen3:8b** | ✅ |

**GPU Services:** 2 (Vision Encoder, Ollama)
**Total VRAM Usage:** ~10 GB (concurrent)

---

## 🚀 Deployment Checklist

### Pre-Deployment (Local)

- [x] Code reviewed and tested
- [x] Documentation updated (WARP.md, INFRASTRUCTURE.md)
- [x] Jupyter Notebook updated
- [x] All tests passing
- [x] Git committed and pushed

### Deployment (Server)

```bash
# 1. SSH to server
ssh root@144.76.224.179

# 2. Pull latest code
cd /opt/microdao-daarion
git pull origin main

# 3. Check GPU
nvidia-smi

# 4. Build new services
docker-compose build vision-encoder

# 5. Start all services
docker-compose up -d

# 6. Verify health
docker-compose ps
curl http://localhost:8001/health    # Vision Encoder
curl http://localhost:6333/healthz   # Qdrant
curl http://localhost:9102/health    # Router

# 7. Run smoke tests
./smoke.sh
./test-vision-encoder.sh

# 8. Check logs
docker-compose logs -f vision-encoder
docker-compose logs -f router

# 9. Monitor GPU
watch -n 1 nvidia-smi
```

---

## 📖 Documentation Index

- **[WARP.md](./WARP.md)** — Developer guide (quick start for Warp AI)
- **[INFRASTRUCTURE.md](./INFRASTRUCTURE.md)** — Server, services, deployment
- **[VISION-ENCODER-STATUS.md](./VISION-ENCODER-STATUS.md)** — Vision Encoder status
- **[VISION-RAG-IMPLEMENTATION.md](./VISION-RAG-IMPLEMENTATION.md)** — Complete Vision RAG implementation
- **[docs/cursor/vision_encoder_deployment_task.md](./docs/cursor/vision_encoder_deployment_task.md)** — Deployment task
- **[docs/infrastructure_quick_ref.ipynb](./docs/infrastructure_quick_ref.ipynb)** — Jupyter quick reference

---

## 🎯 Next Steps

### Phase 1: Audio Integration

- [ ] Install Whisper (speech-to-text)
- [ ] Install TTS model (text-to-speech)
- [ ] Integrate with Telegram voice messages
- [ ] Audio RAG (transcription + search)

### Phase 2: Larger LLMs

- [ ] Install Mistral 7B (better reasoning)
- [ ] Install Llama 2 70B (if enough VRAM via quantization)
- [ ] Multi-model routing (task-specific models)

### Phase 3: Advanced Vision

- [ ] Image captioning (BLIP-2)
- [ ] Zero-shot classification
- [ ] Video understanding (frame extraction + CLIP)

### Phase 4: Optimization

- [ ] Model quantization (reduce VRAM)
- [ ] Batch inference (increase throughput)
- [ ] Model caching (Redis)
- [ ] GPU sharing (multiple models concurrently)

---

**Last Updated:** 2025-01-17
**Maintained by:** Ivan Tytar & DAARION Team
**Status:** ✅ Production Ready (17 services, 3 AI models)