feat: Add presence heartbeat for Matrix online status

- matrix-gateway: POST /internal/matrix/presence/online endpoint
- usePresenceHeartbeat hook with activity tracking
- Auto away after 5 min inactivity
- Offline on page close/visibility change
- Integrated in MatrixChatRoom component
Commit: 3de3c8cb36 (parent 5bed515852)
Author: Apple
Date: 2025-11-27 00:19:40 -08:00
6371 changed files with 1,317,450 additions and 932 deletions


@@ -0,0 +1,13 @@
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ ./app/
EXPOSE 8890
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8890"]


@@ -0,0 +1,353 @@
# Swapper Service

**Version:** 1.0.0
**Status:** ✅ Ready for Node #2
**Port:** 8890

Dynamic model loading service that manages LLM models on demand to optimize memory usage. Supports single-active mode (one model loaded at a time).

---

## Overview

Swapper Service provides:
- **Dynamic Model Loading** — Load/unload models on-demand
- **Single-Active Mode** — Only one model loaded at a time (memory optimization)
- **Model Metrics** — Track uptime, request count, load/unload times
- **Ollama Integration** — Works with Ollama models
- **REST API** — Full API for model management
---
## Features
### Model Management
- Load models on-demand
- Unload models to free memory
- Track which model is currently active
- Monitor model uptime and usage
### Metrics
- Current active model
- Model uptime (hours)
- Request count per model
- Load/unload timestamps
- Total uptime per model
### Single-Active Mode
- Only one model loaded at a time
- Automatic unloading of previous model when loading new one
- Optimizes memory usage on resource-constrained systems
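The swap rule above can be sketched as a toy state machine. This is a minimal illustration, not the service's actual API; the class and method names are ours:

```python
class SingleActiveSwapper:
    """Toy sketch of single-active mode: loading a new model unloads the current one first."""

    def __init__(self):
        self.active = None   # name of the currently loaded model, if any
        self.events = []     # audit trail of load/unload actions

    def load(self, name: str) -> None:
        if self.active == name:
            return  # already active; nothing to do
        if self.active is not None:
            # Free memory before loading the next model
            self.events.append(("unload", self.active))
        self.events.append(("load", name))
        self.active = name

sw = SingleActiveSwapper()
sw.load("deepseek-r1-70b")
sw.load("qwen2.5-coder-32b")  # implicitly unloads deepseek-r1-70b
```

The real service does the same dance in `load_model`, guarded by an asyncio lock so concurrent load requests cannot interleave.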
---
## Quick Start
### Docker (Recommended)
```bash
# Build and start
docker-compose up -d swapper-service
# Check health
curl http://localhost:8890/health
# Get status
curl http://localhost:8890/status
# List models
curl http://localhost:8890/models
```
### Local Development
```bash
cd services/swapper-service
# Install dependencies
pip install -r requirements.txt
# Set environment variables
export OLLAMA_BASE_URL=http://localhost:11434
export SWAPPER_CONFIG_PATH=./config/swapper_config.yaml
# Run service
python -m app.main
```
---
## API Endpoints
### Health & Status
#### GET /health
Health check endpoint
**Response:**
```json
{
"status": "healthy",
"service": "swapper-service",
"active_model": "deepseek-r1-70b",
"mode": "single-active"
}
```
#### GET /status
Get full Swapper service status
**Response:**
```json
{
"status": "healthy",
"active_model": "deepseek-r1-70b",
"available_models": ["deepseek-r1-70b", "qwen2.5-coder-32b", ...],
"loaded_models": ["deepseek-r1-70b"],
"mode": "single-active",
"total_models": 8
}
```
### Model Management
#### GET /models
List all available models
**Response:**
```json
{
"models": [
{
"name": "deepseek-r1-70b",
"ollama_name": "deepseek-r1:70b",
"type": "llm",
"size_gb": 42,
"priority": "high",
"status": "loaded"
}
]
}
```
#### GET /models/{model_name}
Get information about a specific model
**Response:**
```json
{
"name": "deepseek-r1-70b",
"ollama_name": "deepseek-r1:70b",
"type": "llm",
"size_gb": 42,
"priority": "high",
"status": "loaded",
"loaded_at": "2025-11-22T10:30:00",
"unloaded_at": null,
"total_uptime_seconds": 3600.5
}
```
#### POST /models/{model_name}/load
Load a model
**Response:**
```json
{
"status": "success",
"model": "deepseek-r1-70b",
"message": "Model deepseek-r1-70b loaded"
}
```
#### POST /models/{model_name}/unload
Unload a model
**Response:**
```json
{
"status": "success",
"model": "deepseek-r1-70b",
"message": "Model deepseek-r1-70b unloaded"
}
```
### Metrics
#### GET /metrics
Get metrics for all models
**Response:**
```json
{
"metrics": [
{
"model_name": "deepseek-r1-70b",
"status": "loaded",
"loaded_at": "2025-11-22T10:30:00",
"uptime_hours": 1.5,
"request_count": 42,
"total_uptime_seconds": 5400.0
}
]
}
```
#### GET /metrics/{model_name}
Get metrics for a specific model
**Response:**
```json
{
"model_name": "deepseek-r1-70b",
"status": "loaded",
"loaded_at": "2025-11-22T10:30:00",
"uptime_hours": 1.5,
"request_count": 42,
"total_uptime_seconds": 5400.0
}
```
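A dashboard client can derive summary totals directly from the per-model metrics above. A pure-Python sketch over the response fields (the field names match the payload; the helper name is illustrative):

```python
def summarize_metrics(metrics: list) -> dict:
    """Aggregate per-model metrics (as returned by GET /metrics) into dashboard totals."""
    total_uptime_hours = sum(m["uptime_hours"] for m in metrics)
    total_requests = sum(m["request_count"] for m in metrics)
    # "Most used" is ranked by cumulative uptime, not request count
    most_used = max(metrics, key=lambda m: m["total_uptime_seconds"]) if metrics else None
    return {
        "total_uptime_hours": round(total_uptime_hours, 2),
        "total_requests": total_requests,
        "most_used_model": most_used["model_name"] if most_used else None,
    }

sample = [
    {"model_name": "deepseek-r1-70b", "uptime_hours": 1.5,
     "request_count": 42, "total_uptime_seconds": 5400.0},
    {"model_name": "qwen2.5-coder-32b", "uptime_hours": 0.5,
     "request_count": 7, "total_uptime_seconds": 1800.0},
]
print(summarize_metrics(sample))
# → {'total_uptime_hours': 2.0, 'total_requests': 49, 'most_used_model': 'deepseek-r1-70b'}
```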
---
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API URL |
| `SWAPPER_CONFIG_PATH` | `./config/swapper_config.yaml` | Path to config file |
| `SWAPPER_MODE` | `single-active` | Mode: `single-active` or `multi-active` |
| `MAX_CONCURRENT_MODELS` | `1` | Max concurrent models (for multi-active mode) |
| `MODEL_SWAP_TIMEOUT` | `30` | Timeout for model swap (seconds) |
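The service resolves these at startup with the defaults from the table; the snippet below mirrors the top of `app/main.py`:

```python
import os

# Each variable falls back to its documented default when unset.
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
SWAPPER_CONFIG_PATH = os.getenv("SWAPPER_CONFIG_PATH", "./config/swapper_config.yaml")
SWAPPER_MODE = os.getenv("SWAPPER_MODE", "single-active")
MAX_CONCURRENT_MODELS = int(os.getenv("MAX_CONCURRENT_MODELS", "1"))
MODEL_SWAP_TIMEOUT = int(os.getenv("MODEL_SWAP_TIMEOUT", "30"))  # seconds
```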
### Config File (swapper_config.yaml)
```yaml
swapper:
mode: single-active
max_concurrent_models: 1
model_swap_timeout: 30
gpu_enabled: true
metal_acceleration: true
models:
deepseek-r1-70b:
path: ollama:deepseek-r1:70b
type: llm
size_gb: 42
priority: high
```
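The `ollama:` prefix in each model's `path` is stripped to obtain the Ollama model name. A sketch of that mapping — `str.removeprefix` (Python 3.9+) only touches a leading prefix, so the `:70b` tag is preserved:

```python
def to_ollama_name(path: str) -> str:
    """Map a config path like 'ollama:deepseek-r1:70b' to the Ollama model name."""
    return path.removeprefix("ollama:")

print(to_ollama_name("ollama:deepseek-r1:70b"))  # deepseek-r1:70b
```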
---
## Integration with Router
Swapper Service integrates with DAGI Router through metadata:
```python
router_request = {
"message": "Your request",
"mode": "chat",
"metadata": {
"use_llm": "specialist_vision_8b", # Swapper will load this model
"swapper_service": "http://swapper-service:8890"
}
}
```
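A small helper on the router side can build this payload consistently. A sketch — the helper name is ours, and the default URL assumes the Docker service name from this README:

```python
def with_swapper_metadata(message: str, model: str,
                          swapper_url: str = "http://swapper-service:8890") -> dict:
    """Build a router request whose metadata tells Swapper which model to load (sketch)."""
    return {
        "message": message,
        "mode": "chat",
        "metadata": {
            "use_llm": model,  # Swapper will load this model
            "swapper_service": swapper_url,
        },
    }

request = with_swapper_metadata("Your request", "specialist_vision_8b")
```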
---
## Monitoring
### Health Check
```bash
curl http://localhost:8890/health
```
### Prometheus Metrics (Future)
- `swapper_active_model` — Currently active model
- `swapper_model_uptime_seconds` — Uptime per model
- `swapper_model_requests_total` — Total requests per model
---
## Troubleshooting
### Model won't load
```bash
# Check Ollama is running
curl http://localhost:11434/api/tags
# Check model exists in Ollama
curl http://localhost:11434/api/tags | grep "model_name"
# Check Swapper logs
docker logs swapper-service
```
### Service not responding
```bash
# Check if service is running
docker ps | grep swapper-service
# Check health
curl http://localhost:8890/health
# Check logs
docker logs -f swapper-service
```
---
## Differences: Swapper Service vs vLLM
**Swapper Service:**
- Model loading/unloading manager
- Single-active mode (one model at a time)
- Memory optimization
- Works with Ollama
- Lightweight, simple API
**vLLM:**
- High-performance inference engine
- Continuous serving (models stay loaded)
- Optimized for throughput
- Direct GPU acceleration
- More complex, production-grade
**Use Swapper when:**
- Memory is limited
- Need to switch between models frequently
- Running on resource-constrained systems (like Node #2 MacBook)
**Use vLLM when:**
- Need maximum throughput
- Models stay loaded for long periods
- Have dedicated GPU resources
- Production serving at scale
---
## Next Steps
1. **Add to Node #2 Admin Console**
- Display active model
- Show model metrics (uptime, requests)
- Allow manual model loading/unloading
2. **Integration with Router**
- Auto-load models based on request type
- Route requests to appropriate models
3. **Metrics Dashboard**
- Grafana dashboard for Swapper metrics
- Model usage analytics
---
**Last Updated:** 2025-11-22
**Maintained by:** Ivan Tytar & DAARION Team
**Status:** ✅ Ready for Node #2


@@ -0,0 +1,2 @@
# Swapper Service App Package


@@ -0,0 +1,168 @@
"""
Cabinet API endpoints for Swapper Service
Provides data for Node #1 and Node #2 admin consoles
"""
from fastapi import APIRouter, HTTPException
from typing import Dict, Any, List
from datetime import datetime
# Import will be done after swapper is initialized
router = APIRouter(prefix="/api/cabinet", tags=["cabinet"])
def get_swapper():
"""Get swapper instance (lazy import to avoid circular dependency)"""
from app.main import swapper
return swapper
@router.get("/swapper/status")
async def get_swapper_status_for_cabinet() -> Dict[str, Any]:
"""
Get Swapper Service status for admin console display
Returns data formatted for Node #1 and Node #2 cabinets
"""
try:
swapper = get_swapper()
status = await swapper.get_status()
metrics = await swapper.get_model_metrics()
# Format active model info
active_model_info = None
if status.active_model:
active_metrics = next(
(m for m in metrics if m.model_name == status.active_model),
None
)
if active_metrics:
active_model_info = {
"name": status.active_model,
"uptime_hours": round(active_metrics.uptime_hours, 2),
"request_count": active_metrics.request_count,
"loaded_at": active_metrics.loaded_at.isoformat() if active_metrics.loaded_at else None
}
# Format all models with their status
models_info = []
for model_name in status.available_models:
model_metrics = next(
(m for m in metrics if m.model_name == model_name),
None
)
model_data = swapper.models.get(model_name)
if model_data:
models_info.append({
"name": model_name,
"ollama_name": model_data.ollama_name,
"type": model_data.type,
"size_gb": model_data.size_gb,
"priority": model_data.priority,
"status": model_data.status.value,
"is_active": model_name == status.active_model,
"uptime_hours": round(model_metrics.uptime_hours, 2) if model_metrics else 0.0,
"request_count": model_metrics.request_count if model_metrics else 0,
"total_uptime_seconds": model_metrics.total_uptime_seconds if model_metrics else 0.0
})
return {
"service": "swapper-service",
"status": status.status,
"mode": status.mode,
"active_model": active_model_info,
"total_models": status.total_models,
"available_models": status.available_models,
"loaded_models": status.loaded_models,
"models": models_info,
"timestamp": datetime.now().isoformat()
}
except Exception as e:
raise HTTPException(status_code=500, detail=f"Error getting Swapper status: {str(e)}")
@router.get("/swapper/models")
async def get_swapper_models_for_cabinet() -> Dict[str, Any]:
"""
Get all models with detailed information for cabinet display
"""
try:
swapper = get_swapper()
status = await swapper.get_status()
metrics = await swapper.get_model_metrics()
models_detail = []
for model_name in status.available_models:
model_data = swapper.models.get(model_name)
model_metrics = next(
(m for m in metrics if m.model_name == model_name),
None
)
if model_data:
models_detail.append({
"name": model_name,
"ollama_name": model_data.ollama_name,
"type": model_data.type,
"size_gb": model_data.size_gb,
"priority": model_data.priority,
"status": model_data.status.value,
"is_active": model_name == status.active_model,
"can_load": model_data.status.value in ["unloaded", "error"],
"can_unload": model_data.status.value == "loaded",
"uptime_hours": round(model_metrics.uptime_hours, 2) if model_metrics else 0.0,
"request_count": model_metrics.request_count if model_metrics else 0,
"total_uptime_seconds": model_metrics.total_uptime_seconds if model_metrics else 0.0,
"loaded_at": model_metrics.loaded_at.isoformat() if model_metrics and model_metrics.loaded_at else None
})
return {
"models": models_detail,
"total": len(models_detail),
"active_count": len(status.loaded_models),
"timestamp": datetime.now().isoformat()
}
except Exception as e:
raise HTTPException(status_code=500, detail=f"Error getting models: {str(e)}")
@router.get("/swapper/metrics/summary")
async def get_swapper_metrics_summary() -> Dict[str, Any]:
"""
Get summary metrics for cabinet dashboard
"""
try:
swapper = get_swapper()
status = await swapper.get_status()
metrics = await swapper.get_model_metrics()
# Calculate totals
total_uptime_hours = sum(m.uptime_hours for m in metrics)
total_requests = sum(m.request_count for m in metrics)
# Most used model
most_used = max(metrics, key=lambda m: m.total_uptime_seconds) if metrics else None
return {
"summary": {
"total_models": status.total_models,
"active_models": len(status.loaded_models),
"available_models": len(status.available_models),
"total_uptime_hours": round(total_uptime_hours, 2),
"total_requests": total_requests
},
"most_used_model": {
"name": most_used.model_name,
"uptime_hours": round(most_used.uptime_hours, 2),
"request_count": most_used.request_count
} if most_used else None,
"active_model": {
"name": status.active_model,
"uptime_hours": round(
next((m.uptime_hours for m in metrics if m.model_name == status.active_model), 0.0),
2
) if status.active_model else None
} if status.active_model else None,
"timestamp": datetime.now().isoformat()
}
except Exception as e:
raise HTTPException(status_code=500, detail=f"Error getting metrics summary: {str(e)}")


@@ -0,0 +1,437 @@
"""
Swapper Service - Dynamic Model Loading Service
Manages loading/unloading LLM models on-demand to optimize memory usage.
Supports single-active model mode (one model loaded at a time).
"""
import os
import asyncio
import logging
from typing import Optional, Dict, List, Any
from datetime import datetime, timedelta
from enum import Enum
from fastapi import FastAPI, HTTPException, BackgroundTasks
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import httpx
import yaml
logger = logging.getLogger(__name__)
# ========== Configuration ==========
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
SWAPPER_CONFIG_PATH = os.getenv("SWAPPER_CONFIG_PATH", "./config/swapper_config.yaml")
SWAPPER_MODE = os.getenv("SWAPPER_MODE", "single-active") # single-active or multi-active
MAX_CONCURRENT_MODELS = int(os.getenv("MAX_CONCURRENT_MODELS", "1"))
MODEL_SWAP_TIMEOUT = int(os.getenv("MODEL_SWAP_TIMEOUT", "30"))
# ========== Models ==========
class ModelStatus(str, Enum):
"""Model status"""
LOADED = "loaded"
LOADING = "loading"
UNLOADED = "unloaded"
UNLOADING = "unloading"
ERROR = "error"
class ModelInfo(BaseModel):
"""Model information"""
name: str
ollama_name: str
type: str # llm, code, vision, math
size_gb: float
priority: str # high, medium, low
status: ModelStatus
loaded_at: Optional[datetime] = None
unloaded_at: Optional[datetime] = None
total_uptime_seconds: float = 0.0
request_count: int = 0
class SwapperStatus(BaseModel):
"""Swapper service status"""
status: str
active_model: Optional[str] = None
available_models: List[str]
loaded_models: List[str]
mode: str
total_models: int
class ModelMetrics(BaseModel):
"""Model usage metrics"""
model_name: str
status: str
loaded_at: Optional[datetime] = None
uptime_hours: float
request_count: int
total_uptime_seconds: float
# ========== Swapper Service ==========
class SwapperService:
"""Swapper Service - manages model loading/unloading"""
def __init__(self):
self.models: Dict[str, ModelInfo] = {}
self.active_model: Optional[str] = None
self.loading_lock = asyncio.Lock()
self.http_client = httpx.AsyncClient(timeout=300.0)
self.model_uptime: Dict[str, float] = {} # Track uptime per model
self.model_load_times: Dict[str, datetime] = {} # Track when model was loaded
async def initialize(self):
"""Initialize Swapper Service - load configuration"""
config = None
try:
logger.info(f"🔧 Initializing Swapper Service...")
logger.info(f"🔧 Config path: {SWAPPER_CONFIG_PATH}")
logger.info(f"🔧 Config exists: {os.path.exists(SWAPPER_CONFIG_PATH)}")
if os.path.exists(SWAPPER_CONFIG_PATH):
with open(SWAPPER_CONFIG_PATH, 'r') as f:
config = yaml.safe_load(f)
models_config = config.get('models', {})
logger.info(f"🔧 Found {len(models_config)} models in config")
for model_key, model_config in models_config.items():
ollama_name = model_config.get('path', '').replace('ollama:', '')
logger.info(f"🔧 Adding model: {model_key} -> {ollama_name}")
self.models[model_key] = ModelInfo(
name=model_key,
ollama_name=ollama_name,
type=model_config.get('type', 'llm'),
size_gb=model_config.get('size_gb', 0),
priority=model_config.get('priority', 'medium'),
status=ModelStatus.UNLOADED
)
self.model_uptime[model_key] = 0.0
logger.info(f"✅ Loaded {len(self.models)} models into Swapper")
else:
logger.warning(f"⚠️ Config file not found: {SWAPPER_CONFIG_PATH}, using defaults")
# Load default models from Ollama
await self._load_models_from_ollama()
logger.info(f"✅ Swapper Service initialized with {len(self.models)} models")
logger.info(f"✅ Model names: {list(self.models.keys())}")
# Load the default model if one is specified in the configuration
if config:
swapper_config = config.get('swapper', {})
default_model = swapper_config.get('default_model')
if default_model and default_model in self.models:
logger.info(f"🔄 Loading default model: {default_model}")
success = await self.load_model(default_model)
if success:
logger.info(f"✅ Default model loaded: {default_model}")
else:
logger.warning(f"⚠️ Failed to load default model: {default_model}")
elif default_model:
logger.warning(f"⚠️ Default model '{default_model}' not found in models list")
except Exception as e:
logger.error(f"❌ Error initializing Swapper Service: {e}", exc_info=True)
async def _load_models_from_ollama(self):
"""Load available models from Ollama"""
try:
response = await self.http_client.get(f"{OLLAMA_BASE_URL}/api/tags")
if response.status_code == 200:
data = response.json()
for model in data.get('models', []):
model_name = model.get('name', '')
# Extract base name (remove :latest, :7b, etc.)
base_name = model_name.split(':')[0]
if base_name not in self.models:
size_gb = model.get('size', 0) / (1024**3) # Convert bytes to GB
self.models[base_name] = ModelInfo(
name=base_name,
ollama_name=model_name,
type='llm', # Default type
size_gb=size_gb,
priority='medium',
status=ModelStatus.UNLOADED
)
self.model_uptime[base_name] = 0.0
logger.info(f"✅ Loaded {len(self.models)} models from Ollama")
except Exception as e:
logger.error(f"❌ Error loading models from Ollama: {e}")
async def load_model(self, model_name: str) -> bool:
"""Load a model (unload current if in single-active mode)"""
async with self.loading_lock:
try:
# Check if model exists
if model_name not in self.models:
logger.error(f"❌ Model not found: {model_name}")
return False
model_info = self.models[model_name]
# If single-active mode and another model is loaded, unload it first
if SWAPPER_MODE == "single-active" and self.active_model and self.active_model != model_name:
await self._unload_model_internal(self.active_model)
# Load the model
logger.info(f"🔄 Loading model: {model_name}")
model_info.status = ModelStatus.LOADING
# Send a minimal generate request so Ollama loads the model into memory
response = await self.http_client.post(
f"{OLLAMA_BASE_URL}/api/generate",
json={
"model": model_info.ollama_name,
"prompt": "test",
"stream": False
},
timeout=MODEL_SWAP_TIMEOUT
)
if response.status_code == 200:
model_info.status = ModelStatus.LOADED
model_info.loaded_at = datetime.now()
model_info.unloaded_at = None
self.active_model = model_name
self.model_load_times[model_name] = datetime.now()
logger.info(f"✅ Model loaded: {model_name}")
return True
else:
model_info.status = ModelStatus.ERROR
logger.error(f"❌ Failed to load model: {model_name}")
return False
except Exception as e:
logger.error(f"❌ Error loading model {model_name}: {e}", exc_info=True)
if model_name in self.models:
self.models[model_name].status = ModelStatus.ERROR
return False
async def _unload_model_internal(self, model_name: str) -> bool:
"""Internal method to unload a model"""
try:
if model_name not in self.models:
return False
model_info = self.models[model_name]
if model_info.status == ModelStatus.LOADED:
logger.info(f"🔄 Unloading model: {model_name}")
model_info.status = ModelStatus.UNLOADING
# Calculate uptime
if model_name in self.model_load_times:
load_time = self.model_load_times[model_name]
uptime_seconds = (datetime.now() - load_time).total_seconds()
self.model_uptime[model_name] = self.model_uptime.get(model_name, 0.0) + uptime_seconds
model_info.total_uptime_seconds = self.model_uptime[model_name]
del self.model_load_times[model_name]
model_info.status = ModelStatus.UNLOADED
model_info.unloaded_at = datetime.now()
if self.active_model == model_name:
self.active_model = None
logger.info(f"✅ Model unloaded: {model_name}")
return True
except Exception as e:
logger.error(f"❌ Error unloading model {model_name}: {e}")
return False
async def unload_model(self, model_name: str) -> bool:
"""Unload a model"""
async with self.loading_lock:
return await self._unload_model_internal(model_name)
async def get_status(self) -> SwapperStatus:
"""Get Swapper service status"""
# Update uptime for currently loaded model
if self.active_model and self.active_model in self.model_load_times:
load_time = self.model_load_times[self.active_model]
current_uptime = (datetime.now() - load_time).total_seconds()
self.model_uptime[self.active_model] = self.model_uptime.get(self.active_model, 0.0) + current_uptime
self.model_load_times[self.active_model] = datetime.now() # Reset timer
loaded_models = [
name for name, model in self.models.items()
if model.status == ModelStatus.LOADED
]
return SwapperStatus(
status="healthy",
active_model=self.active_model,
available_models=list(self.models.keys()),
loaded_models=loaded_models,
mode=SWAPPER_MODE,
total_models=len(self.models)
)
async def get_model_metrics(self, model_name: Optional[str] = None) -> List[ModelMetrics]:
"""Get metrics for model(s)"""
metrics = []
models_to_check = [model_name] if model_name else list(self.models.keys())
for name in models_to_check:
if name not in self.models:
continue
model_info = self.models[name]
# Calculate current uptime
uptime_seconds = self.model_uptime.get(name, 0.0)
if name in self.model_load_times:
load_time = self.model_load_times[name]
current_uptime = (datetime.now() - load_time).total_seconds()
uptime_seconds += current_uptime
uptime_hours = uptime_seconds / 3600.0
metrics.append(ModelMetrics(
model_name=name,
status=model_info.status.value,
loaded_at=model_info.loaded_at,
uptime_hours=uptime_hours,
request_count=model_info.request_count,
total_uptime_seconds=uptime_seconds
))
return metrics
async def close(self):
"""Close HTTP client"""
await self.http_client.aclose()
# ========== FastAPI App ==========
app = FastAPI(
title="Swapper Service",
description="Dynamic model loading service for Node #2",
version="1.0.0"
)
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Include cabinet API router (cabinet_api imports swapper lazily, avoiding a circular import)
try:
from app.cabinet_api import router as cabinet_router
app.include_router(cabinet_router)
logger.info("✅ Cabinet API router included")
except ImportError:
logger.warning("⚠️ cabinet_api module not found, skipping cabinet router")
# Global Swapper instance
swapper = SwapperService()
@app.on_event("startup")
async def startup():
"""Initialize Swapper on startup"""
await swapper.initialize()
@app.on_event("shutdown")
async def shutdown():
"""Close Swapper on shutdown"""
await swapper.close()
# ========== API Endpoints ==========
@app.get("/health")
async def health():
"""Health check endpoint"""
status = await swapper.get_status()
return {
"status": "healthy",
"service": "swapper-service",
"active_model": status.active_model,
"mode": status.mode
}
@app.get("/status", response_model=SwapperStatus)
async def get_status():
"""Get Swapper service status"""
return await swapper.get_status()
@app.get("/models")
async def list_models():
"""List all available models"""
return {
"models": [
{
"name": model.name,
"ollama_name": model.ollama_name,
"type": model.type,
"size_gb": model.size_gb,
"priority": model.priority,
"status": model.status.value
}
for model in swapper.models.values()
]
}
@app.get("/models/{model_name}")
async def get_model_info(model_name: str):
"""Get information about a specific model"""
if model_name not in swapper.models:
raise HTTPException(status_code=404, detail=f"Model not found: {model_name}")
model_info = swapper.models[model_name]
return {
"name": model_info.name,
"ollama_name": model_info.ollama_name,
"type": model_info.type,
"size_gb": model_info.size_gb,
"priority": model_info.priority,
"status": model_info.status.value,
"loaded_at": model_info.loaded_at.isoformat() if model_info.loaded_at else None,
"unloaded_at": model_info.unloaded_at.isoformat() if model_info.unloaded_at else None,
"total_uptime_seconds": swapper.model_uptime.get(model_name, 0.0)
}
@app.post("/models/{model_name}/load")
async def load_model_endpoint(model_name: str):
"""Load a model"""
success = await swapper.load_model(model_name)
if success:
return {"status": "success", "model": model_name, "message": f"Model {model_name} loaded"}
raise HTTPException(status_code=500, detail=f"Failed to load model: {model_name}")
@app.post("/models/{model_name}/unload")
async def unload_model_endpoint(model_name: str):
"""Unload a model"""
success = await swapper.unload_model(model_name)
if success:
return {"status": "success", "model": model_name, "message": f"Model {model_name} unloaded"}
raise HTTPException(status_code=500, detail=f"Failed to unload model: {model_name}")
@app.get("/metrics")
async def get_metrics(model_name: Optional[str] = None):
"""Get metrics for model(s)"""
metrics = await swapper.get_model_metrics(model_name)
return {
"metrics": [metric.dict() for metric in metrics]
}
@app.get("/metrics/{model_name}")
async def get_model_metrics(model_name: str):
"""Get metrics for a specific model"""
metrics = await swapper.get_model_metrics(model_name)
if not metrics:
raise HTTPException(status_code=404, detail=f"Model not found: {model_name}")
return metrics[0].dict()
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8890)


@@ -0,0 +1,393 @@
/* Swapper Service Cabinet Integration Styles */
.swapper-status-card {
background: white;
border-radius: 8px;
padding: 24px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
margin-bottom: 24px;
}
.swapper-header {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 20px;
padding-bottom: 16px;
border-bottom: 2px solid #f0f0f0;
}
.swapper-header h3 {
margin: 0;
font-size: 24px;
font-weight: 600;
color: #1a1a1a;
}
.status-badge {
padding: 6px 12px;
border-radius: 6px;
font-size: 12px;
font-weight: 600;
text-transform: uppercase;
}
.status-healthy {
background: #4caf50;
color: white;
}
.status-degraded {
background: #ff9800;
color: white;
}
.status-unhealthy {
background: #f44336;
color: white;
}
.swapper-info {
display: grid;
grid-template-columns: repeat(3, 1fr);
gap: 16px;
margin-bottom: 24px;
}
.info-row {
display: flex;
flex-direction: column;
gap: 4px;
}
.info-row span:first-child {
font-size: 12px;
color: #666;
text-transform: uppercase;
letter-spacing: 0.5px;
}
.info-row span:last-child {
font-size: 18px;
font-weight: 600;
color: #1a1a1a;
}
.active-model-card {
background: linear-gradient(135deg, #e8f5e9 0%, #c8e6c9 100%);
border-radius: 8px;
padding: 20px;
margin-bottom: 24px;
border-left: 4px solid #4caf50;
}
.active-model-card h4 {
margin: 0 0 12px 0;
font-size: 16px;
color: #2e7d32;
}
.model-details {
display: flex;
flex-direction: column;
gap: 12px;
}
.model-name {
font-size: 20px;
font-weight: 600;
color: #1a1a1a;
}
.model-stats {
display: flex;
gap: 24px;
flex-wrap: wrap;
}
.stat {
display: flex;
flex-direction: column;
gap: 4px;
}
.stat-label {
font-size: 12px;
color: #666;
text-transform: uppercase;
}
.stat-value {
font-size: 16px;
font-weight: 600;
color: #1a1a1a;
}
.models-list {
margin-top: 24px;
}
.models-list h4 {
margin: 0 0 16px 0;
font-size: 18px;
color: #1a1a1a;
}
.models-table {
width: 100%;
border-collapse: collapse;
font-size: 14px;
}
.models-table thead {
background: #f5f5f5;
border-bottom: 2px solid #e0e0e0;
}
.models-table th {
padding: 12px;
text-align: left;
font-weight: 600;
color: #666;
text-transform: uppercase;
font-size: 11px;
letter-spacing: 0.5px;
}
.models-table td {
padding: 12px;
border-bottom: 1px solid #f0f0f0;
}
.models-table tr:hover {
background: #fafafa;
}
.models-table tr.active {
background: #fff3e0;
border-left: 3px solid #ff9800;
}
.model-type {
padding: 4px 8px;
border-radius: 4px;
font-size: 11px;
font-weight: 600;
text-transform: uppercase;
}
.type-llm {
background: #e3f2fd;
color: #1976d2;
}
.type-code {
background: #f3e5f5;
color: #7b1fa2;
}
.type-vision {
background: #e8f5e9;
color: #388e3c;
}
.type-math {
background: #fff3e0;
color: #f57c00;
}
.btn-load,
.btn-unload {
padding: 6px 12px;
border: none;
border-radius: 4px;
font-size: 12px;
font-weight: 600;
cursor: pointer;
transition: all 0.2s;
}
.btn-load {
background: #4caf50;
color: white;
}
.btn-load:hover {
background: #45a049;
}
.btn-unload {
background: #f44336;
color: white;
}
.btn-unload:hover {
background: #da190b;
}
.active-indicator {
color: #4caf50;
font-weight: 600;
font-size: 12px;
}
.swapper-footer {
margin-top: 20px;
padding-top: 16px;
border-top: 1px solid #f0f0f0;
text-align: center;
}
.swapper-footer small {
color: #999;
font-size: 12px;
}
/* Metrics Summary */
.swapper-metrics {
background: white;
border-radius: 8px;
padding: 24px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}
.swapper-metrics h4 {
margin: 0 0 20px 0;
font-size: 18px;
color: #1a1a1a;
}
.metrics-grid {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 16px;
margin-bottom: 24px;
}
.metric-card {
background: #f5f5f5;
border-radius: 6px;
padding: 16px;
text-align: center;
}
.metric-label {
font-size: 12px;
color: #666;
text-transform: uppercase;
margin-bottom: 8px;
}
.metric-value {
font-size: 24px;
font-weight: 600;
color: #1a1a1a;
}
.most-used-model {
background: #f5f5f5;
border-radius: 6px;
padding: 16px;
margin-top: 16px;
}
.most-used-model h5 {
margin: 0 0 12px 0;
font-size: 14px;
color: #666;
text-transform: uppercase;
}
.model-info {
display: flex;
justify-content: space-between;
align-items: center;
}
.model-name {
font-weight: 600;
color: #1a1a1a;
}
.model-uptime {
color: #666;
font-size: 14px;
}
/* Page Layout */
.swapper-page {
max-width: 1400px;
margin: 0 auto;
padding: 24px;
}
.page-header {
margin-bottom: 32px;
}
.page-header h2 {
margin: 0 0 8px 0;
font-size: 32px;
color: #1a1a1a;
}
.page-header p {
margin: 0;
color: #666;
font-size: 16px;
}
.swapper-grid {
display: grid;
grid-template-columns: 2fr 1fr;
gap: 24px;
}
.swapper-main {
min-width: 0;
}
.swapper-sidebar {
min-width: 0;
}
/* Loading and Error States */
.swapper-loading,
.swapper-error {
padding: 24px;
text-align: center;
background: white;
border-radius: 8px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}
.swapper-error {
color: #f44336;
}
/* Responsive */
@media (max-width: 1024px) {
.swapper-grid {
grid-template-columns: 1fr;
}
}
@media (max-width: 768px) {
.swapper-info {
grid-template-columns: 1fr;
}
.metrics-grid {
grid-template-columns: 1fr;
}
.models-table {
font-size: 12px;
}
.models-table th,
.models-table td {
padding: 8px;
}
}


@@ -0,0 +1,311 @@
/**
* Swapper Service Integration for Node #1 and Node #2 Admin Consoles
* React/TypeScript component example
*/
import React, { useEffect, useState } from 'react';
// Types
interface SwapperStatus {
service: string;
status: string;
mode: string;
active_model: {
name: string;
uptime_hours: number;
request_count: number;
loaded_at: string | null;
} | null;
total_models: number;
available_models: string[];
loaded_models: string[];
models: Array<{
name: string;
ollama_name: string;
type: string;
size_gb: number;
priority: string;
status: string;
is_active: boolean;
uptime_hours: number;
request_count: number;
total_uptime_seconds: number;
}>;
timestamp: string;
}
interface SwapperMetrics {
summary: {
total_models: number;
active_models: number;
available_models: number;
total_uptime_hours: number;
total_requests: number;
};
most_used_model: {
name: string;
uptime_hours: number;
request_count: number;
} | null;
active_model: {
name: string;
uptime_hours: number | null;
} | null;
timestamp: string;
}
// API Service
const SWAPPER_API_BASE = process.env.NEXT_PUBLIC_SWAPPER_URL || 'http://localhost:8890';
export const swapperService = {
async getStatus(): Promise<SwapperStatus> {
const response = await fetch(`${SWAPPER_API_BASE}/api/cabinet/swapper/status`);
if (!response.ok) throw new Error('Failed to fetch Swapper status');
return response.json();
},
async getMetrics(): Promise<SwapperMetrics> {
const response = await fetch(`${SWAPPER_API_BASE}/api/cabinet/swapper/metrics/summary`);
if (!response.ok) throw new Error('Failed to fetch Swapper metrics');
return response.json();
},
async loadModel(modelName: string): Promise<void> {
const response = await fetch(`${SWAPPER_API_BASE}/models/${modelName}/load`, {
method: 'POST',
});
if (!response.ok) throw new Error(`Failed to load model: ${modelName}`);
},
async unloadModel(modelName: string): Promise<void> {
const response = await fetch(`${SWAPPER_API_BASE}/models/${modelName}/unload`, {
method: 'POST',
});
if (!response.ok) throw new Error(`Failed to unload model: ${modelName}`);
},
};
// Main Swapper Status Component
export const SwapperStatusCard: React.FC = () => {
const [status, setStatus] = useState<SwapperStatus | null>(null);
const [loading, setLoading] = useState(true);
const [error, setError] = useState<string | null>(null);
const fetchStatus = async () => {
try {
const data = await swapperService.getStatus();
setStatus(data);
setError(null);
} catch (err) {
setError(err instanceof Error ? err.message : 'Unknown error');
} finally {
setLoading(false);
}
};
useEffect(() => {
fetchStatus();
const interval = setInterval(fetchStatus, 30000); // Update every 30 seconds
return () => clearInterval(interval);
}, []);
if (loading) return <div className="swapper-loading">Loading Swapper status...</div>;
if (error) return <div className="swapper-error">Error: {error}</div>;
if (!status) return <div className="swapper-error">No status data</div>;
return (
<div className="swapper-status-card">
<div className="swapper-header">
<h3>🔄 Swapper Service</h3>
<span className={`status-badge status-${status.status}`}>
{status.status}
</span>
</div>
<div className="swapper-info">
<div className="info-row">
<span>Mode:</span>
<span>{status.mode}</span>
</div>
<div className="info-row">
<span>Total Models:</span>
<span>{status.total_models}</span>
</div>
<div className="info-row">
<span>Loaded Models:</span>
<span>{status.loaded_models.length}</span>
</div>
</div>
{status.active_model && (
<div className="active-model-card">
<h4>Active Model</h4>
<div className="model-details">
<div className="model-name">{status.active_model.name}</div>
<div className="model-stats">
<div className="stat">
<span className="stat-label">Uptime:</span>
<span className="stat-value">{status.active_model.uptime_hours.toFixed(2)}h</span>
</div>
<div className="stat">
<span className="stat-label">Requests:</span>
<span className="stat-value">{status.active_model.request_count}</span>
</div>
{status.active_model.loaded_at && (
<div className="stat">
<span className="stat-label">Loaded:</span>
<span className="stat-value">
{new Date(status.active_model.loaded_at).toLocaleString()}
</span>
</div>
)}
</div>
</div>
</div>
)}
<div className="models-list">
<h4>Available Models</h4>
<table className="models-table">
<thead>
<tr>
<th>Name</th>
<th>Type</th>
<th>Size (GB)</th>
<th>Status</th>
<th>Uptime (h)</th>
<th>Actions</th>
</tr>
</thead>
<tbody>
{status.models.map((model) => (
<tr key={model.name} className={model.is_active ? 'active' : ''}>
<td>{model.name}</td>
<td>
<span className={`model-type type-${model.type}`}>{model.type}</span>
</td>
<td>{model.size_gb.toFixed(1)}</td>
<td>
<span className={`status-badge status-${model.status}`}>
{model.status}
</span>
</td>
<td>{model.uptime_hours.toFixed(2)}</td>
<td>
{model.status === 'unloaded' && (
<button
className="btn-load"
onClick={() => swapperService.loadModel(model.name).then(fetchStatus).catch((err) => setError(err instanceof Error ? err.message : 'Unknown error'))}
>
Load
</button>
)}
{model.status === 'loaded' && !model.is_active && (
<button
className="btn-unload"
onClick={() => swapperService.unloadModel(model.name).then(fetchStatus).catch((err) => setError(err instanceof Error ? err.message : 'Unknown error'))}
>
Unload
</button>
)}
{model.is_active && (
<span className="active-indicator">Active</span>
)}
</td>
</tr>
))}
</tbody>
</table>
</div>
<div className="swapper-footer">
<small>Last updated: {new Date(status.timestamp).toLocaleString()}</small>
</div>
</div>
);
};
// Metrics Summary Component
export const SwapperMetricsSummary: React.FC = () => {
const [metrics, setMetrics] = useState<SwapperMetrics | null>(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
const fetchMetrics = async () => {
try {
const data = await swapperService.getMetrics();
setMetrics(data);
} catch (err) {
console.error('Error fetching metrics:', err);
} finally {
setLoading(false);
}
};
fetchMetrics();
const interval = setInterval(fetchMetrics, 60000); // Update every minute
return () => clearInterval(interval);
}, []);
if (loading || !metrics) return <div>Loading metrics...</div>;
return (
<div className="swapper-metrics">
<h4>📊 Metrics Summary</h4>
<div className="metrics-grid">
<div className="metric-card">
<div className="metric-label">Total Models</div>
<div className="metric-value">{metrics.summary.total_models}</div>
</div>
<div className="metric-card">
<div className="metric-label">Active Models</div>
<div className="metric-value">{metrics.summary.active_models}</div>
</div>
<div className="metric-card">
<div className="metric-label">Total Uptime</div>
<div className="metric-value">{metrics.summary.total_uptime_hours.toFixed(2)}h</div>
</div>
<div className="metric-card">
<div className="metric-label">Total Requests</div>
<div className="metric-value">{metrics.summary.total_requests}</div>
</div>
</div>
{metrics.most_used_model && (
<div className="most-used-model">
<h5>Most Used Model</h5>
<div className="model-info">
<span className="model-name">{metrics.most_used_model.name}</span>
<span className="model-uptime">
{metrics.most_used_model.uptime_hours.toFixed(2)}h
</span>
</div>
</div>
)}
</div>
);
};
// Main Swapper Page Component
export const SwapperPage: React.FC = () => {
return (
<div className="swapper-page">
<div className="page-header">
<h2>Swapper Service</h2>
<p>Dynamic model loading and management</p>
</div>
<div className="swapper-grid">
<div className="swapper-main">
<SwapperStatusCard />
</div>
<div className="swapper-sidebar">
<SwapperMetricsSummary />
</div>
</div>
</div>
);
};
export default SwapperPage;
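The `swapperService` wrapper above can also be driven from Python (the service's own implementation language). A minimal sketch using only the standard library, assuming the same base URL and endpoint paths the component uses; these paths are taken from the TypeScript code and not re-verified against `app/main.py`:

```python
# Stdlib-only counterpart of the TypeScript swapperService wrapper.
import json
import urllib.request


class SwapperClient:
    def __init__(self, base_url: str = "http://localhost:8890"):
        # Normalize so path joining never produces a double slash.
        self.base_url = base_url.rstrip("/")

    def _url(self, path: str) -> str:
        return f"{self.base_url}{path}"

    def get_status(self) -> dict:
        # GET /api/cabinet/swapper/status — same endpoint the status card polls.
        with urllib.request.urlopen(self._url("/api/cabinet/swapper/status")) as resp:
            return json.load(resp)

    def load_model(self, name: str) -> None:
        # POST /models/{name}/load — mirrors swapperService.loadModel.
        req = urllib.request.Request(self._url(f"/models/{name}/load"), method="POST")
        urllib.request.urlopen(req).close()

    def unload_model(self, name: str) -> None:
        # POST /models/{name}/unload — mirrors swapperService.unloadModel.
        req = urllib.request.Request(self._url(f"/models/{name}/unload"), method="POST")
        urllib.request.urlopen(req).close()
```

Useful for cron jobs or CI checks that need the same status/load/unload calls without a browser.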

View File

@@ -0,0 +1,81 @@
# Swapper Configuration for Node #1 (Production Server)
# Single-active LLM scheduler
# Hetzner GEX44 - NVIDIA RTX 4000 SFF Ada (20GB VRAM)
# Auto-generated configuration with all available Ollama models
swapper:
mode: single-active
max_concurrent_models: 1
model_swap_timeout: 300
gpu_enabled: true
metal_acceleration: false # NVIDIA GPU, not Apple Silicon
# Model to load automatically at startup (optional)
# If not set, models are loaded only on demand
# Recommended: qwen3-8b (primary model) or qwen2.5-3b-instruct (lightweight model)
default_model: qwen3-8b # Activated automatically at startup
models:
# Primary LLM - Qwen3 8B (High Priority) - Main model from INFRASTRUCTURE.md
qwen3-8b:
path: ollama:qwen3:8b
type: llm
size_gb: 4.87
priority: high
description: "Primary LLM for general tasks and conversations"
# Vision Model - Qwen3-VL 8B (High Priority) - For image processing
qwen3-vl-8b:
path: ollama:qwen3-vl:8b
type: vision
size_gb: 5.72
priority: high
description: "Vision model for image understanding and processing"
# Qwen2.5 7B Instruct (High Priority)
qwen2.5-7b-instruct:
path: ollama:qwen2.5:7b-instruct-q4_K_M
type: llm
size_gb: 4.36
priority: high
description: "Qwen2.5 7B Instruct model"
# Lightweight LLM - Qwen2.5 3B Instruct (Medium Priority)
qwen2.5-3b-instruct:
path: ollama:qwen2.5:3b-instruct-q4_K_M
type: llm
size_gb: 1.80
priority: medium
description: "Lightweight LLM for faster responses"
# Math Specialist - Qwen2 Math 7B (High Priority)
qwen2-math-7b:
path: ollama:qwen2-math:7b
type: math
size_gb: 4.13
priority: high
description: "Specialized model for mathematical tasks"
# Lightweight conversational LLM - Mistral Nemo 2.3B (Medium Priority)
mistral-nemo-2_3b:
path: ollama:mistral-nemo:2.3b-instruct
type: llm
size_gb: 1.60
priority: medium
description: "Fast low-cost replies for monitor/service agents"
# Compact Math Specialist - Qwen2.5 Math 1.5B (Medium Priority)
qwen2_5-math-1_5b:
path: ollama:qwen2.5-math:1.5b
type: math
size_gb: 1.20
priority: medium
description: "Lightweight math model for DRUID/Nutra micro-calculations"
storage:
models_dir: /app/models
cache_dir: /app/cache
swap_dir: /app/swap
ollama:
url: http://ollama:11434 # From Docker container to Ollama service
timeout: 300
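A config like the one above is easy to sanity-check before deployment: `default_model` must name a key under `models:`, and no single model should exceed the card's VRAM. A hypothetical validation sketch, with model names and sizes transcribed from the YAML and the 20 GB limit taken from the header comment (the real service may enforce different rules):

```python
# Transcribed from the Node #1 config: model key -> size_gb.
config = {
    "default_model": "qwen3-8b",
    "models": {
        "qwen3-8b": 4.87,
        "qwen3-vl-8b": 5.72,
        "qwen2.5-7b-instruct": 4.36,
        "qwen2.5-3b-instruct": 1.80,
        "qwen2-math-7b": 4.13,
        "mistral-nemo-2_3b": 1.60,
        "qwen2_5-math-1_5b": 1.20,
    },
}


def validate(cfg: dict, vram_gb: float = 20.0) -> None:
    """Raise ValueError if default_model is undefined or a model cannot fit."""
    default = cfg.get("default_model")
    if default is not None and default not in cfg["models"]:
        raise ValueError(f"default_model {default!r} is not a defined model key")
    for name, size in cfg["models"].items():
        if size > vram_gb:
            raise ValueError(f"{name} ({size} GB) exceeds {vram_gb} GB VRAM")


validate(config)  # passes for the Node #1 config
```

Catching a `default_model` that points at an Ollama tag instead of a config key is exactly the kind of mismatch this check exists for.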

View File

@@ -0,0 +1,64 @@
# Swapper Configuration for Node #1 (Production Server)
# Single-active LLM scheduler
# Hetzner GEX44 - NVIDIA RTX 4000 SFF Ada (20GB VRAM)
swapper:
mode: single-active
max_concurrent_models: 1
model_swap_timeout: 300
gpu_enabled: true
metal_acceleration: false # NVIDIA GPU, not Apple Silicon
# Model to load automatically at startup
# qwen3-8b is the primary model (4.87 GB), giving a fast response to the first request
default_model: qwen3-8b
models:
# Primary LLM - Qwen3 8B (High Priority) - Main model from INFRASTRUCTURE.md
qwen3-8b:
path: ollama:qwen3:8b
type: llm
size_gb: 4.87
priority: high
description: "Primary LLM for general tasks and conversations"
# Vision Model - Qwen3-VL 8B (High Priority) - For image processing
qwen3-vl-8b:
path: ollama:qwen3-vl:8b
type: vision
size_gb: 5.72
priority: high
description: "Vision model for image understanding and processing"
# Qwen2.5 7B Instruct (High Priority)
qwen2.5-7b-instruct:
path: ollama:qwen2.5:7b-instruct-q4_K_M
type: llm
size_gb: 4.36
priority: high
description: "Qwen2.5 7B Instruct model"
# Lightweight LLM - Qwen2.5 3B Instruct (Medium Priority)
qwen2.5-3b-instruct:
path: ollama:qwen2.5:3b-instruct-q4_K_M
type: llm
size_gb: 1.80
priority: medium
description: "Lightweight LLM for faster responses"
# Math Specialist - Qwen2 Math 7B (High Priority)
qwen2-math-7b:
path: ollama:qwen2-math:7b
type: math
size_gb: 4.13
priority: high
description: "Specialized model for mathematical tasks"
storage:
models_dir: /app/models
cache_dir: /app/cache
swap_dir: /app/swap
ollama:
url: http://ollama:11434 # From Docker container to Ollama service
timeout: 300

View File

@@ -0,0 +1,90 @@
# Swapper Configuration for Node #2 (Development Node)
# Single-active LLM scheduler
# MacBook Pro M4 Max - Apple Silicon (40-core GPU, 64GB RAM)
# Auto-generated configuration with available Ollama models
swapper:
mode: single-active
max_concurrent_models: 1
model_swap_timeout: 300
gpu_enabled: true
metal_acceleration: true # Apple Silicon GPU acceleration
# Model to load automatically at startup (optional)
# If not set, models are loaded only on demand
# Recommended: gpt-oss-latest (fast model) or phi3-latest (lightweight model)
default_model: gpt-oss-latest # Activated automatically at startup; must be a model key defined below
models:
# Fast LLM - GPT-OSS 20B (High Priority) - Main model for general tasks
gpt-oss-latest:
path: ollama:gpt-oss:latest
type: llm
size_gb: 13.0
priority: high
description: "Fast LLM for general tasks and conversations (20.9B params)"
# Lightweight LLM - Phi3 3.8B (High Priority) - Fast responses
phi3-latest:
path: ollama:phi3:latest
type: llm
size_gb: 2.2
priority: high
description: "Lightweight LLM for fast responses (3.8B params)"
# Code Specialist - StarCoder2 3B (Medium Priority) - Code engineering
starcoder2-3b:
path: ollama:starcoder2:3b
type: code
size_gb: 1.7
priority: medium
description: "Code specialist model for code engineering (3B params)"
# Reasoning Model - Mistral Nemo 12.2B (High Priority) - Advanced reasoning
mistral-nemo-12b:
path: ollama:mistral-nemo:12b
type: llm
size_gb: 7.1
priority: high
description: "Advanced reasoning model for complex tasks (12.2B params)"
# Reasoning Model - Gemma2 27B (Medium Priority) - Strategic reasoning
gemma2-27b:
path: ollama:gemma2:27b
type: llm
size_gb: 15.0
priority: medium
description: "Reasoning model for strategic tasks (27.2B params)"
# Code Specialist - DeepSeek Coder 33B (High Priority) - Advanced code tasks
deepseek-coder-33b:
path: ollama:deepseek-coder:33b
type: code
size_gb: 18.0
priority: high
description: "Advanced code specialist model (33B params)"
# Code Specialist - Qwen2.5 Coder 32B (High Priority) - Advanced code tasks
qwen2.5-coder-32b:
path: ollama:qwen2.5-coder:32b
type: code
size_gb: 19.0
priority: high
description: "Advanced code specialist model (32.8B params)"
# Reasoning Model - DeepSeek R1 70B (High Priority) - Strategic reasoning (large model)
deepseek-r1-70b:
path: ollama:deepseek-r1:70b
type: llm
size_gb: 42.0
priority: high
description: "Strategic reasoning model (70.6B params, quantized)"
storage:
models_dir: /app/models
cache_dir: /app/cache
swap_dir: /app/swap
ollama:
url: http://localhost:11434 # Native Ollama on MacBook (via Pieces OS or brew)
timeout: 300

View File

@@ -0,0 +1,7 @@
fastapi==0.104.1
uvicorn[standard]==0.24.0
httpx==0.25.2
pydantic==2.5.0
pyyaml==6.0.1
python-multipart==0.0.6

View File

@@ -0,0 +1,38 @@
#!/bin/bash
# Start Swapper Service locally
set -e
echo "🚀 Starting Swapper Service..."
# Check if virtual environment exists
if [ ! -d "venv" ]; then
echo "📦 Creating virtual environment..."
python3 -m venv venv
fi
# Activate virtual environment
source venv/bin/activate
# Install dependencies
echo "📥 Installing dependencies..."
pip install -q --upgrade pip
pip install -q -r requirements.txt
# Set environment variables
export OLLAMA_BASE_URL=${OLLAMA_BASE_URL:-http://localhost:11434}
export SWAPPER_CONFIG_PATH=${SWAPPER_CONFIG_PATH:-./config/swapper_config.yaml}
export SWAPPER_MODE=${SWAPPER_MODE:-single-active}
export MAX_CONCURRENT_MODELS=${MAX_CONCURRENT_MODELS:-1}
export MODEL_SWAP_TIMEOUT=${MODEL_SWAP_TIMEOUT:-300}
# Start service
echo "✅ Starting Swapper Service on port 8890..."
echo " Health: http://localhost:8890/health"
echo " Status: http://localhost:8890/status"
echo " Cabinet API: http://localhost:8890/api/cabinet/swapper/status"
echo ""
echo "Press Ctrl+C to stop"
python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8890