## Node #2 DAGI Integration Summary

- Created `sync-node2-dagi-agents.py` script to sync agents from `agents_city_mapping.yaml`
- Synced 50 DAGI agents across 10 districts:
  - Leadership Hall (4): Solarius, Sofia, PrimeSynth, Nexor
  - System Control (6): Monitor, Strategic Sentinels, Vindex, Helix, Aurora, Arbitron
  - Engineering Lab (5): ByteForge, Vector, ChainWeaver, Cypher, Canvas
  - Marketing Hub (6): Roxy, Mira, Tempo, Harmony, Faye, Storytelling
  - Finance Office (4): Financial Analyst, Accountant, Budget Planner, Tax Advisor
  - Web3 District (5): Smart Contract Dev, DeFi Analyst, Tokenomics Expert, NFT Specialist, DAO Governance
  - Security Bunker (7): Shadelock, Exor, Penetration Tester, Security Monitor, Incident Responder, Shadelock Forensics, Exor Forensics
  - Vision Studio (4): Iris, Lumen, Spectra, Video Analyzer
  - R&D Lab (6): ProtoMind, LabForge, TestPilot, ModelScout, BreakPoint, GrowCell
  - Memory Vault (3): Somnia, Memory Manager, Knowledge Indexer
- Fixed Swapper config to use `swapper_config_node2.yaml` with 8 models
- Created `TASK_PHASE_NODE2_FULL_DAGI_INTEGRATION_v1.md`

NODE2 now shows:
- 50 agents in DAGI Router Card
- 8 models in Swapper Service (gpt-oss, phi3, starcoder2, mistral-nemo, gemma2, deepseek-coder, qwen2.5-coder, deepseek-r1)
- Full isolation from NODE1
# Swapper Service

**Version:** 1.0.0
**Status:** ✅ Ready for Node #2
**Port:** 8890
Dynamic model loading service that manages LLM models on-demand to optimize memory usage. Supports single-active mode (one model loaded at a time).
## Overview
Swapper Service provides:
- Dynamic Model Loading — Load/unload models on-demand
- Single-Active Mode — Only one model loaded at a time (memory optimization)
- Model Metrics — Track uptime, request count, load/unload times
- Ollama Integration — Works with Ollama models
- REST API — Full API for model management
## Features

### Model Management
- Load models on-demand
- Unload models to free memory
- Track which model is currently active
- Monitor model uptime and usage
### Metrics
- Current active model
- Model uptime (hours)
- Request count per model
- Load/unload timestamps
- Total uptime per model
### Single-Active Mode
- Only one model loaded at a time
- Automatic unloading of previous model when loading new one
- Optimizes memory usage on resource-constrained systems
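The single-active policy can be sketched as a small state machine. This is a minimal, in-memory illustration of the behavior described above, not the service's actual implementation (the real service additionally talks to Ollama to load and unload weights); the class and attribute names are hypothetical.

```python
from datetime import datetime, timezone

class SingleActiveSwapper:
    """Sketch of single-active model management: at most one model loaded."""

    def __init__(self):
        self.active_model = None  # name of the loaded model, or None
        self.loaded_at = None     # when the active model was loaded

    def load(self, model_name):
        # Single-active policy: unload the previous model before loading a new one.
        if self.active_model is not None and self.active_model != model_name:
            self.unload()
        self.active_model = model_name
        self.loaded_at = datetime.now(timezone.utc)
        return {"status": "success", "model": model_name}

    def unload(self):
        unloaded = self.active_model
        self.active_model = None
        self.loaded_at = None
        return {"status": "success", "model": unloaded}
```

Loading a second model implicitly unloads the first, which is what keeps memory usage bounded on constrained hardware.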
## Quick Start

### Docker (Recommended)
```bash
# Build and start
docker-compose up -d swapper-service

# Check health
curl http://localhost:8890/health

# Get status
curl http://localhost:8890/status

# List models
curl http://localhost:8890/models
```
### Local Development
```bash
cd services/swapper-service

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export OLLAMA_BASE_URL=http://localhost:11434
export SWAPPER_CONFIG_PATH=./config/swapper_config.yaml

# Run service
python -m app.main
```
## API Endpoints

### Health & Status
#### GET /health

Health check endpoint.

Response:

```json
{
  "status": "healthy",
  "service": "swapper-service",
  "active_model": "deepseek-r1-70b",
  "mode": "single-active"
}
```
#### GET /status

Get full Swapper service status.

Response:

```json
{
  "status": "healthy",
  "active_model": "deepseek-r1-70b",
  "available_models": ["deepseek-r1-70b", "qwen2.5-coder-32b", ...],
  "loaded_models": ["deepseek-r1-70b"],
  "mode": "single-active",
  "total_models": 8
}
```
### Model Management
#### GET /models

List all available models.

Response:

```json
{
  "models": [
    {
      "name": "deepseek-r1-70b",
      "ollama_name": "deepseek-r1:70b",
      "type": "llm",
      "size_gb": 42,
      "priority": "high",
      "status": "loaded"
    }
  ]
}
```
#### GET /models/{model_name}

Get information about a specific model.

Response:

```json
{
  "name": "deepseek-r1-70b",
  "ollama_name": "deepseek-r1:70b",
  "type": "llm",
  "size_gb": 42,
  "priority": "high",
  "status": "loaded",
  "loaded_at": "2025-11-22T10:30:00",
  "unloaded_at": null,
  "total_uptime_seconds": 3600.5
}
```
#### POST /models/{model_name}/load

Load a model.

Response:

```json
{
  "status": "success",
  "model": "deepseek-r1-70b",
  "message": "Model deepseek-r1-70b loaded"
}
```
#### POST /models/{model_name}/unload

Unload a model.

Response:

```json
{
  "status": "success",
  "model": "deepseek-r1-70b",
  "message": "Model deepseek-r1-70b unloaded"
}
```
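From a client, loading and unloading reduce to a POST against these endpoints. The sketch below builds the endpoint URLs and performs the load call with the standard library; `BASE_URL` and the helper names are assumptions for illustration, not part of the service's API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8890"  # default port from this README

def model_action_url(base_url, model_name, action):
    """Build the endpoint URL for a 'load' or 'unload' action."""
    return f"{base_url}/models/{model_name}/{action}"

def swap_model(model_name, base_url=BASE_URL):
    """POST /models/{model_name}/load and return the parsed JSON response."""
    req = urllib.request.Request(
        model_action_url(base_url, model_name, "load"), method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

In single-active mode the service takes care of unloading the previous model, so a client only ever needs the load call.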
### Metrics
#### GET /metrics

Get metrics for all models.

Response:

```json
{
  "metrics": [
    {
      "model_name": "deepseek-r1-70b",
      "status": "loaded",
      "loaded_at": "2025-11-22T10:30:00",
      "uptime_hours": 1.5,
      "request_count": 42,
      "total_uptime_seconds": 5400.0
    }
  ]
}
```
#### GET /metrics/{model_name}

Get metrics for a specific model.

Response:

```json
{
  "model_name": "deepseek-r1-70b",
  "status": "loaded",
  "loaded_at": "2025-11-22T10:30:00",
  "uptime_hours": 1.5,
  "request_count": 42,
  "total_uptime_seconds": 5400.0
}
```
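The `uptime_hours` field follows directly from `loaded_at` and the current time, as the example metrics show (5400 seconds of uptime is 1.5 hours). A small helper illustrates the arithmetic; the function name is hypothetical, and the README does not show how the service computes it internally.

```python
from datetime import datetime

def uptime_hours(loaded_at_iso, now_iso):
    """Derive uptime in hours from a loaded_at ISO-8601 timestamp."""
    loaded_at = datetime.fromisoformat(loaded_at_iso)
    now = datetime.fromisoformat(now_iso)
    return round((now - loaded_at).total_seconds() / 3600, 2)
```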
## Configuration

### Environment Variables
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API URL |
| `SWAPPER_CONFIG_PATH` | `./config/swapper_config.yaml` | Path to config file |
| `SWAPPER_MODE` | `single-active` | Mode: `single-active` or `multi-active` |
| `MAX_CONCURRENT_MODELS` | `1` | Max concurrent models (for multi-active mode) |
| `MODEL_SWAP_TIMEOUT` | `30` | Timeout for model swap (seconds) |
### Config File (swapper_config.yaml)

```yaml
swapper:
  mode: single-active
  max_concurrent_models: 1
  model_swap_timeout: 30
  gpu_enabled: true
  metal_acceleration: true

models:
  deepseek-r1-70b:
    path: ollama:deepseek-r1:70b
    type: llm
    size_gb: 42
    priority: high
```
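After parsing, a config like this can be sanity-checked before the service starts. The checks below are hypothetical (the README does not document validation), but they encode the constraint implied by single-active mode and the `ollama:` path convention shown above; the config is given as an already-parsed dict to keep the sketch dependency-free.

```python
def validate_swapper_config(cfg):
    """Sanity-check a parsed swapper_config.yaml (illustrative checks only)."""
    swapper = cfg["swapper"]
    # Single-active mode implies exactly one concurrent model.
    if swapper["mode"] == "single-active" and swapper["max_concurrent_models"] != 1:
        raise ValueError("single-active mode requires max_concurrent_models == 1")
    for name, model in cfg.get("models", {}).items():
        # Models are referenced through Ollama, e.g. "ollama:deepseek-r1:70b".
        if not model.get("path", "").startswith("ollama:"):
            raise ValueError(f"model {name}: expected an ollama: path")
    return True
```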
## Integration with Router
Swapper Service integrates with DAGI Router through metadata:
```python
router_request = {
    "message": "Your request",
    "mode": "chat",
    "metadata": {
        "use_llm": "specialist_vision_8b",  # Swapper will load this model
        "swapper_service": "http://swapper-service:8890"
    }
}
```
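On the router side, the metadata-driven selection reduces to reading `use_llm` and falling back to some default. This is a hypothetical sketch of that rule; the fallback model name and function are assumptions, not the router's documented behavior.

```python
DEFAULT_MODEL = "deepseek-r1-70b"  # assumed fallback when no use_llm is given

def pick_model(router_request):
    """Choose which model the Swapper should load for a router request."""
    metadata = router_request.get("metadata", {})
    return metadata.get("use_llm", DEFAULT_MODEL)
```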
## Monitoring

### Health Check

```bash
curl http://localhost:8890/health
```

### Prometheus Metrics (Future)
- `swapper_active_model` — Currently active model
- `swapper_model_uptime_seconds` — Uptime per model
- `swapper_model_requests_total` — Total requests per model
## Troubleshooting

### Model won't load
```bash
# Check Ollama is running
curl http://localhost:11434/api/tags

# Check model exists in Ollama
curl http://localhost:11434/api/tags | grep "model_name"

# Check Swapper logs
docker logs swapper-service
```
### Service not responding
```bash
# Check if service is running
docker ps | grep swapper-service

# Check health
curl http://localhost:8890/health

# Check logs
docker logs -f swapper-service
```
## Differences: Swapper Service vs vLLM

**Swapper Service:**
- Model loading/unloading manager
- Single-active mode (one model at a time)
- Memory optimization
- Works with Ollama
- Lightweight, simple API
**vLLM:**
- High-performance inference engine
- Continuous serving (models stay loaded)
- Optimized for throughput
- Direct GPU acceleration
- More complex, production-grade
**Use Swapper when:**
- Memory is limited
- Need to switch between models frequently
- Running on resource-constrained systems (like Node #2 MacBook)
**Use vLLM when:**
- Need maximum throughput
- Models stay loaded for long periods
- Have dedicated GPU resources
- Production serving at scale
## Next Steps

1. **Add to Node #2 Admin Console**
   - Display active model
   - Show model metrics (uptime, requests)
   - Allow manual model loading/unloading
2. **Integration with Router**
   - Auto-load models based on request type
   - Route requests to appropriate models
3. **Metrics Dashboard**
   - Grafana dashboard for Swapper metrics
   - Model usage analytics
---

**Last Updated:** 2025-11-22
**Maintained by:** Ivan Tytar & DAARION Team
**Status:** ✅ Ready for Node #2