feat: Add presence heartbeat for Matrix online status
- matrix-gateway: POST /internal/matrix/presence/online endpoint
- usePresenceHeartbeat hook with activity tracking
- Auto away after 5 min inactivity
- Offline on page close/visibility change
- Integrated in MatrixChatRoom component
13
services/swapper-service/Dockerfile
Normal file
@@ -0,0 +1,13 @@
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app/ ./app/

EXPOSE 8890

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8890"]
353
services/swapper-service/README.md
Normal file
@@ -0,0 +1,353 @@
# Swapper Service

**Version:** 1.0.0
**Status:** ✅ Ready for Node #2
**Port:** 8890

Dynamic model loading service that manages LLM models on-demand to optimize memory usage. Supports single-active mode (one model loaded at a time).

---

## Overview

Swapper Service provides:
- **Dynamic Model Loading** — Load/unload models on-demand
- **Single-Active Mode** — Only one model loaded at a time (memory optimization)
- **Model Metrics** — Track uptime, request count, load/unload times
- **Ollama Integration** — Works with Ollama models
- **REST API** — Full API for model management

---

## Features

### Model Management
- Load models on-demand
- Unload models to free memory
- Track which model is currently active
- Monitor model uptime and usage

### Metrics
- Current active model
- Model uptime (hours)
- Request count per model
- Load/unload timestamps
- Total uptime per model

### Single-Active Mode
- Only one model loaded at a time
- Automatic unloading of the previous model when loading a new one
- Optimizes memory usage on resource-constrained systems
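The swap rule above can be stated in a few lines of pure Python. This is a minimal sketch of the single-active semantics, not the service's actual code; the model names are only placeholders:

```python
def swap(active, loaded, requested):
    """Return (new_active, new_loaded) after a load request in single-active mode."""
    if active and active != requested:
        loaded = [m for m in loaded if m != active]  # previous model is unloaded first
    if requested not in loaded:
        loaded = loaded + [requested]
    return requested, loaded

active, loaded = swap(None, [], "deepseek-r1-70b")
active, loaded = swap(active, loaded, "qwen2.5-coder-32b")
print(active, loaded)  # qwen2.5-coder-32b ['qwen2.5-coder-32b']
```

Loading a second model always displaces the first, so `loaded` never holds more than one entry.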
---

## Quick Start

### Docker (Recommended)

```bash
# Build and start
docker-compose up -d swapper-service

# Check health
curl http://localhost:8890/health

# Get status
curl http://localhost:8890/status

# List models
curl http://localhost:8890/models
```

### Local Development

```bash
cd services/swapper-service

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export OLLAMA_BASE_URL=http://localhost:11434
export SWAPPER_CONFIG_PATH=./config/swapper_config.yaml

# Run service
python -m app.main
```
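After starting the service, a smoke check can be scripted with just the standard library. A sketch, assuming the default port 8890:

```python
import json
from urllib import request

def is_healthy(base_url: str = "http://localhost:8890") -> bool:
    """Return True if GET /health reports status == "healthy"."""
    try:
        with request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return json.load(resp).get("status") == "healthy"
    except (OSError, ValueError):
        # Connection refused, timeout, or malformed JSON
        return False

if __name__ == "__main__":
    print("healthy" if is_healthy() else "not reachable")
```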
---

## API Endpoints

### Health & Status

#### GET /health
Health check endpoint.

**Response:**
```json
{
  "status": "healthy",
  "service": "swapper-service",
  "active_model": "deepseek-r1-70b",
  "mode": "single-active"
}
```

#### GET /status
Get the full Swapper service status.

**Response:**
```json
{
  "status": "healthy",
  "active_model": "deepseek-r1-70b",
  "available_models": ["deepseek-r1-70b", "qwen2.5-coder-32b", ...],
  "loaded_models": ["deepseek-r1-70b"],
  "mode": "single-active",
  "total_models": 8
}
```

### Model Management

#### GET /models
List all available models.

**Response:**
```json
{
  "models": [
    {
      "name": "deepseek-r1-70b",
      "ollama_name": "deepseek-r1:70b",
      "type": "llm",
      "size_gb": 42,
      "priority": "high",
      "status": "loaded"
    }
  ]
}
```

#### GET /models/{model_name}
Get information about a specific model.

**Response:**
```json
{
  "name": "deepseek-r1-70b",
  "ollama_name": "deepseek-r1:70b",
  "type": "llm",
  "size_gb": 42,
  "priority": "high",
  "status": "loaded",
  "loaded_at": "2025-11-22T10:30:00",
  "unloaded_at": null,
  "total_uptime_seconds": 3600.5
}
```

#### POST /models/{model_name}/load
Load a model. In single-active mode, the currently active model is unloaded first.

**Response:**
```json
{
  "status": "success",
  "model": "deepseek-r1-70b",
  "message": "Model deepseek-r1-70b loaded"
}
```

#### POST /models/{model_name}/unload
Unload a model.

**Response:**
```json
{
  "status": "success",
  "model": "deepseek-r1-70b",
  "message": "Model deepseek-r1-70b unloaded"
}
```

### Metrics

#### GET /metrics
Get metrics for all models.

**Response:**
```json
{
  "metrics": [
    {
      "model_name": "deepseek-r1-70b",
      "status": "loaded",
      "loaded_at": "2025-11-22T10:30:00",
      "uptime_hours": 1.5,
      "request_count": 42,
      "total_uptime_seconds": 5400.0
    }
  ]
}
```

#### GET /metrics/{model_name}
Get metrics for a specific model.

**Response:**
```json
{
  "model_name": "deepseek-r1-70b",
  "status": "loaded",
  "loaded_at": "2025-11-22T10:30:00",
  "uptime_hours": 1.5,
  "request_count": 42,
  "total_uptime_seconds": 5400.0
}
```
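The endpoints above can be wrapped in a small client. A sketch using only the standard library; `BASE_URL` and the timeouts are assumptions, not part of the service:

```python
import json
from urllib import request

BASE_URL = "http://localhost:8890"  # assumed local deployment

def model_url(name: str, action: str = "") -> str:
    """Build a model endpoint URL, e.g. /models/{name}/load."""
    url = f"{BASE_URL}/models/{name}"
    return f"{url}/{action}" if action else url

def load_model(name: str) -> dict:
    """POST /models/{name}/load and return the parsed JSON response."""
    req = request.Request(model_url(name, "load"), method="POST")
    with request.urlopen(req, timeout=60) as resp:
        return json.load(resp)

def get_metrics(name: str) -> dict:
    """GET /metrics/{name} for a single model."""
    with request.urlopen(f"{BASE_URL}/metrics/{name}", timeout=10) as resp:
        return json.load(resp)
```

In single-active mode, `load_model` may take a while when it has to displace a large model, hence the generous timeout.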
---

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API URL |
| `SWAPPER_CONFIG_PATH` | `./config/swapper_config.yaml` | Path to config file |
| `SWAPPER_MODE` | `single-active` | Mode: `single-active` or `multi-active` |
| `MAX_CONCURRENT_MODELS` | `1` | Max concurrent models (for multi-active mode) |
| `MODEL_SWAP_TIMEOUT` | `30` | Timeout for model swap (seconds) |

### Config File (swapper_config.yaml)

```yaml
swapper:
  mode: single-active
  max_concurrent_models: 1
  model_swap_timeout: 30
  gpu_enabled: true
  metal_acceleration: true

models:
  deepseek-r1-70b:
    path: ollama:deepseek-r1:70b
    type: llm
    size_gb: 42
    priority: high
```
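The config file parses into plain dictionaries. This sketch mirrors how the service reads the `models` section, including stripping the `ollama:` prefix from `path`; the inline YAML is a trimmed copy of the example above:

```python
import yaml  # PyYAML, already a service dependency

raw = """
swapper:
  mode: single-active
models:
  deepseek-r1-70b:
    path: ollama:deepseek-r1:70b
    type: llm
    size_gb: 42
    priority: high
"""

config = yaml.safe_load(raw)
for name, spec in config["models"].items():
    # The service strips the "ollama:" prefix to get the Ollama model tag
    ollama_name = spec["path"].replace("ollama:", "")
    print(name, "->", ollama_name)  # deepseek-r1-70b -> deepseek-r1:70b
```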
---

## Integration with Router

Swapper Service integrates with the DAGI Router through request metadata:

```python
router_request = {
    "message": "Your request",
    "mode": "chat",
    "metadata": {
        "use_llm": "specialist_vision_8b",  # Swapper will load this model
        "swapper_service": "http://swapper-service:8890"
    }
}
```

---

## Monitoring

### Health Check
```bash
curl http://localhost:8890/health
```

### Prometheus Metrics (Future)
- `swapper_active_model` — Currently active model
- `swapper_model_uptime_seconds` — Uptime per model
- `swapper_model_requests_total` — Total requests per model

---

## Troubleshooting

### Model won't load
```bash
# Check Ollama is running
curl http://localhost:11434/api/tags

# Check the model exists in Ollama
curl http://localhost:11434/api/tags | grep "model_name"

# Check Swapper logs
docker logs swapper-service
```

### Service not responding
```bash
# Check if the service is running
docker ps | grep swapper-service

# Check health
curl http://localhost:8890/health

# Check logs
docker logs -f swapper-service
```
---

## Differences: Swapper Service vs vLLM

**Swapper Service:**
- Model loading/unloading manager
- Single-active mode (one model at a time)
- Memory optimization
- Works with Ollama
- Lightweight, simple API

**vLLM:**
- High-performance inference engine
- Continuous serving (models stay loaded)
- Optimized for throughput
- Direct GPU acceleration
- More complex, production-grade

**Use Swapper when:**
- Memory is limited
- You need to switch between models frequently
- Running on resource-constrained systems (like the Node #2 MacBook)

**Use vLLM when:**
- You need maximum throughput
- Models stay loaded for long periods
- You have dedicated GPU resources
- Serving in production at scale

---

## Next Steps

1. **Add to Node #2 Admin Console**
   - Display the active model
   - Show model metrics (uptime, requests)
   - Allow manual model loading/unloading

2. **Integration with Router**
   - Auto-load models based on request type
   - Route requests to appropriate models

3. **Metrics Dashboard**
   - Grafana dashboard for Swapper metrics
   - Model usage analytics

---

**Last Updated:** 2025-11-22
**Maintained by:** Ivan Tytar & DAARION Team
**Status:** ✅ Ready for Node #2
2
services/swapper-service/app/__init__.py
Normal file
@@ -0,0 +1,2 @@
# Swapper Service App Package
168
services/swapper-service/app/cabinet_api.py
Normal file
@@ -0,0 +1,168 @@
"""
Cabinet API endpoints for Swapper Service
Provides data for Node #1 and Node #2 admin consoles
"""

from fastapi import APIRouter, HTTPException
from typing import Dict, Any, List
from datetime import datetime

# Import will be done after swapper is initialized

router = APIRouter(prefix="/api/cabinet", tags=["cabinet"])

def get_swapper():
    """Get swapper instance (lazy import to avoid circular dependency)"""
    from app.main import swapper
    return swapper

@router.get("/swapper/status")
async def get_swapper_status_for_cabinet() -> Dict[str, Any]:
    """
    Get Swapper Service status for admin console display
    Returns data formatted for Node #1 and Node #2 cabinets
    """
    try:
        swapper = get_swapper()
        status = await swapper.get_status()
        metrics = await swapper.get_model_metrics()

        # Format active model info
        active_model_info = None
        if status.active_model:
            active_metrics = next(
                (m for m in metrics if m.model_name == status.active_model),
                None
            )
            if active_metrics:
                active_model_info = {
                    "name": status.active_model,
                    "uptime_hours": round(active_metrics.uptime_hours, 2),
                    "request_count": active_metrics.request_count,
                    "loaded_at": active_metrics.loaded_at.isoformat() if active_metrics.loaded_at else None
                }

        # Format all models with their status
        models_info = []
        for model_name in status.available_models:
            model_metrics = next(
                (m for m in metrics if m.model_name == model_name),
                None
            )
            model_data = swapper.models.get(model_name)

            if model_data:
                models_info.append({
                    "name": model_name,
                    "ollama_name": model_data.ollama_name,
                    "type": model_data.type,
                    "size_gb": model_data.size_gb,
                    "priority": model_data.priority,
                    "status": model_data.status.value,
                    "is_active": model_name == status.active_model,
                    "uptime_hours": round(model_metrics.uptime_hours, 2) if model_metrics else 0.0,
                    "request_count": model_metrics.request_count if model_metrics else 0,
                    "total_uptime_seconds": model_metrics.total_uptime_seconds if model_metrics else 0.0
                })

        return {
            "service": "swapper-service",
            "status": status.status,
            "mode": status.mode,
            "active_model": active_model_info,
            "total_models": status.total_models,
            "available_models": status.available_models,
            "loaded_models": status.loaded_models,
            "models": models_info,
            "timestamp": datetime.now().isoformat()
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error getting Swapper status: {str(e)}")

@router.get("/swapper/models")
async def get_swapper_models_for_cabinet() -> Dict[str, Any]:
    """
    Get all models with detailed information for cabinet display
    """
    try:
        swapper = get_swapper()
        status = await swapper.get_status()
        metrics = await swapper.get_model_metrics()

        models_detail = []
        for model_name in status.available_models:
            model_data = swapper.models.get(model_name)
            model_metrics = next(
                (m for m in metrics if m.model_name == model_name),
                None
            )

            if model_data:
                models_detail.append({
                    "name": model_name,
                    "ollama_name": model_data.ollama_name,
                    "type": model_data.type,
                    "size_gb": model_data.size_gb,
                    "priority": model_data.priority,
                    "status": model_data.status.value,
                    "is_active": model_name == status.active_model,
                    "can_load": model_data.status.value in ["unloaded", "error"],
                    "can_unload": model_data.status.value == "loaded",
                    "uptime_hours": round(model_metrics.uptime_hours, 2) if model_metrics else 0.0,
                    "request_count": model_metrics.request_count if model_metrics else 0,
                    "total_uptime_seconds": model_metrics.total_uptime_seconds if model_metrics else 0.0,
                    "loaded_at": model_metrics.loaded_at.isoformat() if model_metrics and model_metrics.loaded_at else None
                })

        return {
            "models": models_detail,
            "total": len(models_detail),
            "active_count": len(status.loaded_models),
            "timestamp": datetime.now().isoformat()
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error getting models: {str(e)}")

@router.get("/swapper/metrics/summary")
async def get_swapper_metrics_summary() -> Dict[str, Any]:
    """
    Get summary metrics for cabinet dashboard
    """
    try:
        swapper = get_swapper()
        status = await swapper.get_status()
        metrics = await swapper.get_model_metrics()

        # Calculate totals
        total_uptime_hours = sum(m.uptime_hours for m in metrics)
        total_requests = sum(m.request_count for m in metrics)

        # Most used model
        most_used = max(metrics, key=lambda m: m.total_uptime_seconds) if metrics else None

        return {
            "summary": {
                "total_models": status.total_models,
                "active_models": len(status.loaded_models),
                "available_models": len(status.available_models),
                "total_uptime_hours": round(total_uptime_hours, 2),
                "total_requests": total_requests
            },
            "most_used_model": {
                "name": most_used.model_name,
                "uptime_hours": round(most_used.uptime_hours, 2),
                "request_count": most_used.request_count
            } if most_used else None,
            "active_model": {
                "name": status.active_model,
                "uptime_hours": round(
                    next((m.uptime_hours for m in metrics if m.model_name == status.active_model), 0.0),
                    2
                )
            } if status.active_model else None,
            "timestamp": datetime.now().isoformat()
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error getting metrics summary: {str(e)}")
437
services/swapper-service/app/main.py
Normal file
@@ -0,0 +1,437 @@
"""
Swapper Service - Dynamic Model Loading Service
Manages loading/unloading LLM models on-demand to optimize memory usage.
Supports single-active model mode (one model loaded at a time).
"""

import os
import asyncio
import logging
from typing import Optional, Dict, List, Any
from datetime import datetime
from enum import Enum

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import httpx
import yaml

logger = logging.getLogger(__name__)

# ========== Configuration ==========

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
SWAPPER_CONFIG_PATH = os.getenv("SWAPPER_CONFIG_PATH", "./config/swapper_config.yaml")
SWAPPER_MODE = os.getenv("SWAPPER_MODE", "single-active")  # single-active or multi-active
MAX_CONCURRENT_MODELS = int(os.getenv("MAX_CONCURRENT_MODELS", "1"))
MODEL_SWAP_TIMEOUT = int(os.getenv("MODEL_SWAP_TIMEOUT", "30"))

# ========== Models ==========

class ModelStatus(str, Enum):
    """Model status"""
    LOADED = "loaded"
    LOADING = "loading"
    UNLOADED = "unloaded"
    UNLOADING = "unloading"
    ERROR = "error"

class ModelInfo(BaseModel):
    """Model information"""
    name: str
    ollama_name: str
    type: str  # llm, code, vision, math
    size_gb: float
    priority: str  # high, medium, low
    status: ModelStatus
    loaded_at: Optional[datetime] = None
    unloaded_at: Optional[datetime] = None
    total_uptime_seconds: float = 0.0
    request_count: int = 0

class SwapperStatus(BaseModel):
    """Swapper service status"""
    status: str
    active_model: Optional[str] = None
    available_models: List[str]
    loaded_models: List[str]
    mode: str
    total_models: int

class ModelMetrics(BaseModel):
    """Model usage metrics"""
    model_name: str
    status: str
    loaded_at: Optional[datetime] = None
    uptime_hours: float
    request_count: int
    total_uptime_seconds: float

# ========== Swapper Service ==========

class SwapperService:
    """Swapper Service - manages model loading/unloading"""

    def __init__(self):
        self.models: Dict[str, ModelInfo] = {}
        self.active_model: Optional[str] = None
        self.loading_lock = asyncio.Lock()
        self.http_client = httpx.AsyncClient(timeout=300.0)
        self.model_uptime: Dict[str, float] = {}  # Track uptime per model
        self.model_load_times: Dict[str, datetime] = {}  # Track when model was loaded

    async def initialize(self):
        """Initialize Swapper Service - load configuration"""
        config = None
        try:
            logger.info("🔧 Initializing Swapper Service...")
            logger.info(f"🔧 Config path: {SWAPPER_CONFIG_PATH}")
            logger.info(f"🔧 Config exists: {os.path.exists(SWAPPER_CONFIG_PATH)}")

            if os.path.exists(SWAPPER_CONFIG_PATH):
                with open(SWAPPER_CONFIG_PATH, 'r') as f:
                    config = yaml.safe_load(f)
                models_config = config.get('models', {})
                logger.info(f"🔧 Found {len(models_config)} models in config")

                for model_key, model_config in models_config.items():
                    ollama_name = model_config.get('path', '').replace('ollama:', '')
                    logger.info(f"🔧 Adding model: {model_key} -> {ollama_name}")
                    self.models[model_key] = ModelInfo(
                        name=model_key,
                        ollama_name=ollama_name,
                        type=model_config.get('type', 'llm'),
                        size_gb=model_config.get('size_gb', 0),
                        priority=model_config.get('priority', 'medium'),
                        status=ModelStatus.UNLOADED
                    )
                    self.model_uptime[model_key] = 0.0
                logger.info(f"✅ Loaded {len(self.models)} models into Swapper")
            else:
                logger.warning(f"⚠️ Config file not found: {SWAPPER_CONFIG_PATH}, using defaults")
                # Load default models from Ollama
                await self._load_models_from_ollama()

            logger.info(f"✅ Swapper Service initialized with {len(self.models)} models")
            logger.info(f"✅ Model names: {list(self.models.keys())}")

            # Load the default model if one is specified in the configuration
            if config:
                swapper_config = config.get('swapper', {})
                default_model = swapper_config.get('default_model')

                if default_model and default_model in self.models:
                    logger.info(f"🔄 Loading default model: {default_model}")
                    success = await self.load_model(default_model)
                    if success:
                        logger.info(f"✅ Default model loaded: {default_model}")
                    else:
                        logger.warning(f"⚠️ Failed to load default model: {default_model}")
                elif default_model:
                    logger.warning(f"⚠️ Default model '{default_model}' not found in models list")
        except Exception as e:
            logger.error(f"❌ Error initializing Swapper Service: {e}", exc_info=True)

    async def _load_models_from_ollama(self):
        """Load available models from Ollama"""
        try:
            response = await self.http_client.get(f"{OLLAMA_BASE_URL}/api/tags")
            if response.status_code == 200:
                data = response.json()
                for model in data.get('models', []):
                    model_name = model.get('name', '')
                    # Extract base name (remove :latest, :7b, etc.)
                    base_name = model_name.split(':')[0]

                    if base_name not in self.models:
                        size_gb = model.get('size', 0) / (1024**3)  # Convert bytes to GB
                        self.models[base_name] = ModelInfo(
                            name=base_name,
                            ollama_name=model_name,
                            type='llm',  # Default type
                            size_gb=size_gb,
                            priority='medium',
                            status=ModelStatus.UNLOADED
                        )
                        self.model_uptime[base_name] = 0.0

            logger.info(f"✅ Loaded {len(self.models)} models from Ollama")
        except Exception as e:
            logger.error(f"❌ Error loading models from Ollama: {e}")

    async def load_model(self, model_name: str) -> bool:
        """Load a model (unload current if in single-active mode)"""
        async with self.loading_lock:
            try:
                # Check if model exists
                if model_name not in self.models:
                    logger.error(f"❌ Model not found: {model_name}")
                    return False

                model_info = self.models[model_name]

                # If single-active mode and another model is loaded, unload it first
                if SWAPPER_MODE == "single-active" and self.active_model and self.active_model != model_name:
                    await self._unload_model_internal(self.active_model)

                # Load the model
                logger.info(f"🔄 Loading model: {model_name}")
                model_info.status = ModelStatus.LOADING

                # Warm the model in Ollama with a tiny generate request
                response = await self.http_client.post(
                    f"{OLLAMA_BASE_URL}/api/generate",
                    json={
                        "model": model_info.ollama_name,
                        "prompt": "test",
                        "stream": False
                    },
                    timeout=MODEL_SWAP_TIMEOUT
                )

                if response.status_code == 200:
                    model_info.status = ModelStatus.LOADED
                    model_info.loaded_at = datetime.now()
                    model_info.unloaded_at = None
                    self.active_model = model_name
                    self.model_load_times[model_name] = datetime.now()
                    logger.info(f"✅ Model loaded: {model_name}")
                    return True
                else:
                    model_info.status = ModelStatus.ERROR
                    logger.error(f"❌ Failed to load model: {model_name}")
                    return False

            except Exception as e:
                logger.error(f"❌ Error loading model {model_name}: {e}", exc_info=True)
                if model_name in self.models:
                    self.models[model_name].status = ModelStatus.ERROR
                return False

    async def _unload_model_internal(self, model_name: str) -> bool:
        """Internal method to unload a model"""
        try:
            if model_name not in self.models:
                return False

            model_info = self.models[model_name]

            if model_info.status == ModelStatus.LOADED:
                logger.info(f"🔄 Unloading model: {model_name}")
                model_info.status = ModelStatus.UNLOADING

                # Calculate uptime
                if model_name in self.model_load_times:
                    load_time = self.model_load_times[model_name]
                    uptime_seconds = (datetime.now() - load_time).total_seconds()
                    self.model_uptime[model_name] = self.model_uptime.get(model_name, 0.0) + uptime_seconds
                    model_info.total_uptime_seconds = self.model_uptime[model_name]
                    del self.model_load_times[model_name]

                model_info.status = ModelStatus.UNLOADED
                model_info.unloaded_at = datetime.now()

                if self.active_model == model_name:
                    self.active_model = None

                logger.info(f"✅ Model unloaded: {model_name}")
            return True

        except Exception as e:
            logger.error(f"❌ Error unloading model {model_name}: {e}")
            return False

    async def unload_model(self, model_name: str) -> bool:
        """Unload a model"""
        async with self.loading_lock:
            return await self._unload_model_internal(model_name)

    async def get_status(self) -> SwapperStatus:
        """Get Swapper service status"""
        # Fold elapsed uptime for the currently loaded model into the running total
        if self.active_model and self.active_model in self.model_load_times:
            load_time = self.model_load_times[self.active_model]
            current_uptime = (datetime.now() - load_time).total_seconds()
            self.model_uptime[self.active_model] = self.model_uptime.get(self.active_model, 0.0) + current_uptime
            self.model_load_times[self.active_model] = datetime.now()  # Reset timer

        loaded_models = [
            name for name, model in self.models.items()
            if model.status == ModelStatus.LOADED
        ]

        return SwapperStatus(
            status="healthy",
            active_model=self.active_model,
            available_models=list(self.models.keys()),
            loaded_models=loaded_models,
            mode=SWAPPER_MODE,
            total_models=len(self.models)
        )

    async def get_model_metrics(self, model_name: Optional[str] = None) -> List[ModelMetrics]:
        """Get metrics for model(s)"""
        metrics = []

        models_to_check = [model_name] if model_name else list(self.models.keys())

        for name in models_to_check:
            if name not in self.models:
                continue

            model_info = self.models[name]

            # Accumulated uptime plus time since the current load, if loaded
            uptime_seconds = self.model_uptime.get(name, 0.0)
            if name in self.model_load_times:
                load_time = self.model_load_times[name]
                uptime_seconds += (datetime.now() - load_time).total_seconds()

            metrics.append(ModelMetrics(
                model_name=name,
                status=model_info.status.value,
                loaded_at=model_info.loaded_at,
                uptime_hours=uptime_seconds / 3600.0,
                request_count=model_info.request_count,
                total_uptime_seconds=uptime_seconds
            ))

        return metrics

    async def close(self):
        """Close HTTP client"""
        await self.http_client.aclose()

# ========== FastAPI App ==========

app = FastAPI(
    title="Swapper Service",
    description="Dynamic model loading service for Node #2",
    version="1.0.0"
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Include the cabinet API router (it imports swapper lazily, avoiding a circular import)
try:
    from app.cabinet_api import router as cabinet_router
    app.include_router(cabinet_router)
    logger.info("✅ Cabinet API router included")
except ImportError:
    logger.warning("⚠️ cabinet_api module not found, skipping cabinet router")

# Global Swapper instance
swapper = SwapperService()

@app.on_event("startup")
async def startup():
    """Initialize Swapper on startup"""
    await swapper.initialize()

@app.on_event("shutdown")
async def shutdown():
    """Close Swapper on shutdown"""
    await swapper.close()

# ========== API Endpoints ==========

@app.get("/health")
async def health():
    """Health check endpoint"""
    status = await swapper.get_status()
    return {
        "status": "healthy",
        "service": "swapper-service",
        "active_model": status.active_model,
        "mode": status.mode
    }

@app.get("/status", response_model=SwapperStatus)
async def get_status():
    """Get Swapper service status"""
    return await swapper.get_status()

@app.get("/models")
async def list_models():
    """List all available models"""
    return {
        "models": [
            {
                "name": model.name,
                "ollama_name": model.ollama_name,
                "type": model.type,
                "size_gb": model.size_gb,
                "priority": model.priority,
                "status": model.status.value
            }
            for model in swapper.models.values()
        ]
    }

@app.get("/models/{model_name}")
async def get_model_info(model_name: str):
    """Get information about a specific model"""
    if model_name not in swapper.models:
        raise HTTPException(status_code=404, detail=f"Model not found: {model_name}")

    model_info = swapper.models[model_name]
    return {
        "name": model_info.name,
        "ollama_name": model_info.ollama_name,
        "type": model_info.type,
        "size_gb": model_info.size_gb,
        "priority": model_info.priority,
        "status": model_info.status.value,
        "loaded_at": model_info.loaded_at.isoformat() if model_info.loaded_at else None,
        "unloaded_at": model_info.unloaded_at.isoformat() if model_info.unloaded_at else None,
        "total_uptime_seconds": swapper.model_uptime.get(model_name, 0.0)
    }

@app.post("/models/{model_name}/load")
async def load_model_endpoint(model_name: str):
    """Load a model"""
    success = await swapper.load_model(model_name)
    if success:
        return {"status": "success", "model": model_name, "message": f"Model {model_name} loaded"}
    raise HTTPException(status_code=500, detail=f"Failed to load model: {model_name}")

@app.post("/models/{model_name}/unload")
async def unload_model_endpoint(model_name: str):
    """Unload a model"""
    success = await swapper.unload_model(model_name)
    if success:
        return {"status": "success", "model": model_name, "message": f"Model {model_name} unloaded"}
    raise HTTPException(status_code=500, detail=f"Failed to unload model: {model_name}")

@app.get("/metrics")
async def get_metrics(model_name: Optional[str] = None):
    """Get metrics for model(s)"""
    metrics = await swapper.get_model_metrics(model_name)
    return {
        "metrics": [metric.dict() for metric in metrics]
    }

@app.get("/metrics/{model_name}")
async def get_model_metrics_endpoint(model_name: str):
    """Get metrics for a specific model"""
    metrics = await swapper.get_model_metrics(model_name)
    if not metrics:
        raise HTTPException(status_code=404, detail=f"Model not found: {model_name}")
    return metrics[0].dict()

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8890)
393
services/swapper-service/cabinet-integration.css
Normal file
@@ -0,0 +1,393 @@
/* Swapper Service Cabinet Integration Styles */

.swapper-status-card {
  background: white;
  border-radius: 8px;
  padding: 24px;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
  margin-bottom: 24px;
}

.swapper-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  margin-bottom: 20px;
  padding-bottom: 16px;
  border-bottom: 2px solid #f0f0f0;
}

.swapper-header h3 {
  margin: 0;
  font-size: 24px;
  font-weight: 600;
  color: #1a1a1a;
}

.status-badge {
  padding: 6px 12px;
  border-radius: 6px;
  font-size: 12px;
  font-weight: 600;
  text-transform: uppercase;
}

.status-healthy {
  background: #4caf50;
  color: white;
}

.status-degraded {
  background: #ff9800;
  color: white;
}

.status-unhealthy {
  background: #f44336;
  color: white;
}

.swapper-info {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
  gap: 16px;
  margin-bottom: 24px;
}

.info-row {
  display: flex;
  flex-direction: column;
  gap: 4px;
}

.info-row span:first-child {
  font-size: 12px;
  color: #666;
  text-transform: uppercase;
  letter-spacing: 0.5px;
}

.info-row span:last-child {
  font-size: 18px;
  font-weight: 600;
  color: #1a1a1a;
}

.active-model-card {
  background: linear-gradient(135deg, #e8f5e9 0%, #c8e6c9 100%);
  border-radius: 8px;
  padding: 20px;
  margin-bottom: 24px;
  border-left: 4px solid #4caf50;
}

.active-model-card h4 {
  margin: 0 0 12px 0;
  font-size: 16px;
  color: #2e7d32;
}

.model-details {
  display: flex;
  flex-direction: column;
  gap: 12px;
}

.model-name {
  font-size: 20px;
  font-weight: 600;
  color: #1a1a1a;
}

.model-stats {
  display: flex;
  gap: 24px;
  flex-wrap: wrap;
}

.stat {
  display: flex;
  flex-direction: column;
  gap: 4px;
}

.stat-label {
  font-size: 12px;
  color: #666;
  text-transform: uppercase;
}

.stat-value {
  font-size: 16px;
  font-weight: 600;
  color: #1a1a1a;
}

.models-list {
  margin-top: 24px;
}

.models-list h4 {
  margin: 0 0 16px 0;
  font-size: 18px;
  color: #1a1a1a;
}

.models-table {
  width: 100%;
  border-collapse: collapse;
  font-size: 14px;
}

.models-table thead {
  background: #f5f5f5;
  border-bottom: 2px solid #e0e0e0;
}

.models-table th {
  padding: 12px;
  text-align: left;
  font-weight: 600;
  color: #666;
  text-transform: uppercase;
  font-size: 11px;
  letter-spacing: 0.5px;
}

.models-table td {
  padding: 12px;
  border-bottom: 1px solid #f0f0f0;
}

.models-table tr:hover {
  background: #fafafa;
}

.models-table tr.active {
  background: #fff3e0;
  border-left: 3px solid #ff9800;
}

.model-type {
  padding: 4px 8px;
  border-radius: 4px;
  font-size: 11px;
  font-weight: 600;
  text-transform: uppercase;
}

.type-llm {
  background: #e3f2fd;
  color: #1976d2;
}

.type-code {
  background: #f3e5f5;
  color: #7b1fa2;
}

.type-vision {
  background: #e8f5e9;
  color: #388e3c;
}

.type-math {
  background: #fff3e0;
  color: #f57c00;
}

.btn-load,
.btn-unload {
  padding: 6px 12px;
  border: none;
  border-radius: 4px;
  font-size: 12px;
  font-weight: 600;
  cursor: pointer;
  transition: all 0.2s;
}

.btn-load {
  background: #4caf50;
  color: white;
}

.btn-load:hover {
  background: #45a049;
}

.btn-unload {
  background: #f44336;
  color: white;
}

.btn-unload:hover {
  background: #da190b;
}

.active-indicator {
  color: #4caf50;
  font-weight: 600;
  font-size: 12px;
}

.swapper-footer {
  margin-top: 20px;
  padding-top: 16px;
  border-top: 1px solid #f0f0f0;
  text-align: center;
}

.swapper-footer small {
  color: #999;
  font-size: 12px;
}

/* Metrics Summary */
.swapper-metrics {
  background: white;
  border-radius: 8px;
  padding: 24px;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}

.swapper-metrics h4 {
  margin: 0 0 20px 0;
  font-size: 18px;
  color: #1a1a1a;
}

.metrics-grid {
  display: grid;
  grid-template-columns: repeat(2, 1fr);
  gap: 16px;
  margin-bottom: 24px;
}

.metric-card {
  background: #f5f5f5;
  border-radius: 6px;
  padding: 16px;
  text-align: center;
}

.metric-label {
  font-size: 12px;
  color: #666;
  text-transform: uppercase;
  margin-bottom: 8px;
}

.metric-value {
  font-size: 24px;
  font-weight: 600;
  color: #1a1a1a;
}

.most-used-model {
  background: #f5f5f5;
  border-radius: 6px;
  padding: 16px;
  margin-top: 16px;
}

.most-used-model h5 {
  margin: 0 0 12px 0;
  font-size: 14px;
  color: #666;
  text-transform: uppercase;
}

.model-info {
  display: flex;
  justify-content: space-between;
  align-items: center;
}

.model-name {
  font-weight: 600;
  color: #1a1a1a;
}

.model-uptime {
  color: #666;
  font-size: 14px;
}

/* Page Layout */
.swapper-page {
  max-width: 1400px;
  margin: 0 auto;
  padding: 24px;
}

.page-header {
  margin-bottom: 32px;
}

.page-header h2 {
  margin: 0 0 8px 0;
  font-size: 32px;
  color: #1a1a1a;
}

.page-header p {
  margin: 0;
  color: #666;
  font-size: 16px;
}

.swapper-grid {
  display: grid;
  grid-template-columns: 2fr 1fr;
  gap: 24px;
}

.swapper-main {
  min-width: 0;
}

.swapper-sidebar {
  min-width: 0;
}

/* Loading and Error States */
.swapper-loading,
.swapper-error {
  padding: 24px;
  text-align: center;
  background: white;
  border-radius: 8px;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}

.swapper-error {
  color: #f44336;
}

/* Responsive */
@media (max-width: 1024px) {
  .swapper-grid {
    grid-template-columns: 1fr;
  }
}

@media (max-width: 768px) {
  .swapper-info {
    grid-template-columns: 1fr;
  }

  .metrics-grid {
    grid-template-columns: 1fr;
  }

  .models-table {
    font-size: 12px;
  }

  .models-table th,
  .models-table td {
    padding: 8px;
  }
}
311
services/swapper-service/cabinet-integration.tsx
Normal file
@@ -0,0 +1,311 @@
/**
 * Swapper Service Integration for Node #1 and Node #2 Admin Consoles
 * React/TypeScript component example
 */

import React, { useEffect, useState } from 'react';

// Types
interface SwapperStatus {
  service: string;
  status: string;
  mode: string;
  active_model: {
    name: string;
    uptime_hours: number;
    request_count: number;
    loaded_at: string | null;
  } | null;
  total_models: number;
  available_models: string[];
  loaded_models: string[];
  models: Array<{
    name: string;
    ollama_name: string;
    type: string;
    size_gb: number;
    priority: string;
    status: string;
    is_active: boolean;
    uptime_hours: number;
    request_count: number;
    total_uptime_seconds: number;
  }>;
  timestamp: string;
}

interface SwapperMetrics {
  summary: {
    total_models: number;
    active_models: number;
    available_models: number;
    total_uptime_hours: number;
    total_requests: number;
  };
  most_used_model: {
    name: string;
    uptime_hours: number;
    request_count: number;
  } | null;
  active_model: {
    name: string;
    uptime_hours: number | null;
  } | null;
  timestamp: string;
}

// API Service
const SWAPPER_API_BASE = process.env.NEXT_PUBLIC_SWAPPER_URL || 'http://localhost:8890';

export const swapperService = {
  async getStatus(): Promise<SwapperStatus> {
    const response = await fetch(`${SWAPPER_API_BASE}/api/cabinet/swapper/status`);
    if (!response.ok) throw new Error('Failed to fetch Swapper status');
    return response.json();
  },

  async getMetrics(): Promise<SwapperMetrics> {
    const response = await fetch(`${SWAPPER_API_BASE}/api/cabinet/swapper/metrics/summary`);
    if (!response.ok) throw new Error('Failed to fetch Swapper metrics');
    return response.json();
  },

  async loadModel(modelName: string): Promise<void> {
    const response = await fetch(`${SWAPPER_API_BASE}/models/${modelName}/load`, {
      method: 'POST',
    });
    if (!response.ok) throw new Error(`Failed to load model: ${modelName}`);
  },

  async unloadModel(modelName: string): Promise<void> {
    const response = await fetch(`${SWAPPER_API_BASE}/models/${modelName}/unload`, {
      method: 'POST',
    });
    if (!response.ok) throw new Error(`Failed to unload model: ${modelName}`);
  },
};

// Main Swapper Status Component
export const SwapperStatusCard: React.FC = () => {
  const [status, setStatus] = useState<SwapperStatus | null>(null);
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState<string | null>(null);

  const fetchStatus = async () => {
    try {
      const data = await swapperService.getStatus();
      setStatus(data);
      setError(null);
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Unknown error');
    } finally {
      setLoading(false);
    }
  };

  useEffect(() => {
    fetchStatus();
    const interval = setInterval(fetchStatus, 30000); // Update every 30 seconds
    return () => clearInterval(interval);
  }, []);

  if (loading) return <div className="swapper-loading">Loading Swapper status...</div>;
  if (error) return <div className="swapper-error">Error: {error}</div>;
  if (!status) return <div className="swapper-error">No status data</div>;

  return (
    <div className="swapper-status-card">
      <div className="swapper-header">
        <h3>🔄 Swapper Service</h3>
        <span className={`status-badge status-${status.status}`}>
          {status.status}
        </span>
      </div>

      <div className="swapper-info">
        <div className="info-row">
          <span>Mode:</span>
          <span>{status.mode}</span>
        </div>
        <div className="info-row">
          <span>Total Models:</span>
          <span>{status.total_models}</span>
        </div>
        <div className="info-row">
          <span>Loaded Models:</span>
          <span>{status.loaded_models.length}</span>
        </div>
      </div>

      {status.active_model && (
        <div className="active-model-card">
          <h4>✨ Active Model</h4>
          <div className="model-details">
            <div className="model-name">{status.active_model.name}</div>
            <div className="model-stats">
              <div className="stat">
                <span className="stat-label">Uptime:</span>
                <span className="stat-value">{status.active_model.uptime_hours.toFixed(2)}h</span>
              </div>
              <div className="stat">
                <span className="stat-label">Requests:</span>
                <span className="stat-value">{status.active_model.request_count}</span>
              </div>
              {status.active_model.loaded_at && (
                <div className="stat">
                  <span className="stat-label">Loaded:</span>
                  <span className="stat-value">
                    {new Date(status.active_model.loaded_at).toLocaleString()}
                  </span>
                </div>
              )}
            </div>
          </div>
        </div>
      )}

      <div className="models-list">
        <h4>Available Models</h4>
        <table className="models-table">
          <thead>
            <tr>
              <th>Name</th>
              <th>Type</th>
              <th>Size (GB)</th>
              <th>Status</th>
              <th>Uptime (h)</th>
              <th>Actions</th>
            </tr>
          </thead>
          <tbody>
            {status.models.map((model) => (
              <tr key={model.name} className={model.is_active ? 'active' : ''}>
                <td>{model.name}</td>
                <td>
                  <span className={`model-type type-${model.type}`}>{model.type}</span>
                </td>
                <td>{model.size_gb.toFixed(1)}</td>
                <td>
                  <span className={`status-badge status-${model.status}`}>
                    {model.status}
                  </span>
                </td>
                <td>{model.uptime_hours.toFixed(2)}</td>
                <td>
                  {model.status === 'unloaded' && (
                    <button
                      className="btn-load"
                      onClick={() => swapperService.loadModel(model.name).then(fetchStatus)}
                    >
                      Load
                    </button>
                  )}
                  {model.status === 'loaded' && !model.is_active && (
                    <button
                      className="btn-unload"
                      onClick={() => swapperService.unloadModel(model.name).then(fetchStatus)}
                    >
                      Unload
                    </button>
                  )}
                  {model.is_active && (
                    <span className="active-indicator">● Active</span>
                  )}
                </td>
              </tr>
            ))}
          </tbody>
        </table>
      </div>

      <div className="swapper-footer">
        <small>Last updated: {new Date(status.timestamp).toLocaleString()}</small>
      </div>
    </div>
  );
};

// Metrics Summary Component
export const SwapperMetricsSummary: React.FC = () => {
  const [metrics, setMetrics] = useState<SwapperMetrics | null>(null);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    const fetchMetrics = async () => {
      try {
        const data = await swapperService.getMetrics();
        setMetrics(data);
      } catch (err) {
        console.error('Error fetching metrics:', err);
      } finally {
        setLoading(false);
      }
    };

    fetchMetrics();
    const interval = setInterval(fetchMetrics, 60000); // Update every minute
    return () => clearInterval(interval);
  }, []);

  if (loading || !metrics) return <div>Loading metrics...</div>;

  return (
    <div className="swapper-metrics">
      <h4>📊 Metrics Summary</h4>
      <div className="metrics-grid">
        <div className="metric-card">
          <div className="metric-label">Total Models</div>
          <div className="metric-value">{metrics.summary.total_models}</div>
        </div>
        <div className="metric-card">
          <div className="metric-label">Active Models</div>
          <div className="metric-value">{metrics.summary.active_models}</div>
        </div>
        <div className="metric-card">
          <div className="metric-label">Total Uptime</div>
          <div className="metric-value">{metrics.summary.total_uptime_hours.toFixed(2)}h</div>
        </div>
        <div className="metric-card">
          <div className="metric-label">Total Requests</div>
          <div className="metric-value">{metrics.summary.total_requests}</div>
        </div>
      </div>

      {metrics.most_used_model && (
        <div className="most-used-model">
          <h5>Most Used Model</h5>
          <div className="model-info">
            <span className="model-name">{metrics.most_used_model.name}</span>
            <span className="model-uptime">
              {metrics.most_used_model.uptime_hours.toFixed(2)}h
            </span>
          </div>
        </div>
      )}
    </div>
  );
};

// Main Swapper Page Component
export const SwapperPage: React.FC = () => {
  return (
    <div className="swapper-page">
      <div className="page-header">
        <h2>Swapper Service</h2>
        <p>Dynamic model loading and management</p>
      </div>

      <div className="swapper-grid">
        <div className="swapper-main">
          <SwapperStatusCard />
        </div>
        <div className="swapper-sidebar">
          <SwapperMetricsSummary />
        </div>
      </div>
    </div>
  );
};

export default SwapperPage;
81
services/swapper-service/config/swapper_config.yaml
Normal file
@@ -0,0 +1,81 @@
# Swapper Configuration for Node #1 (Production Server)
# Single-active LLM scheduler
# Hetzner GEX44 - NVIDIA RTX 4000 SFF Ada (20GB VRAM)
# Auto-generated configuration with all available Ollama models

swapper:
  mode: single-active
  max_concurrent_models: 1
  model_swap_timeout: 300
  gpu_enabled: true
  metal_acceleration: false  # NVIDIA GPU, not Apple Silicon
  # Model to load automatically at startup (optional)
  # If not set, models are loaded only on demand
  # Recommended: qwen3-8b (primary model) or qwen2.5-3b-instruct (lightweight model)
  default_model: qwen3-8b  # Activated automatically at startup

models:
  # Primary LLM - Qwen3 8B (High Priority) - Main model from INFRASTRUCTURE.md
  qwen3-8b:
    path: ollama:qwen3:8b
    type: llm
    size_gb: 4.87
    priority: high
    description: "Primary LLM for general tasks and conversations"

  # Vision Model - Qwen3-VL 8B (High Priority) - For image processing
  qwen3-vl-8b:
    path: ollama:qwen3-vl:8b
    type: vision
    size_gb: 5.72
    priority: high
    description: "Vision model for image understanding and processing"

  # Qwen2.5 7B Instruct (High Priority)
  qwen2.5-7b-instruct:
    path: ollama:qwen2.5:7b-instruct-q4_K_M
    type: llm
    size_gb: 4.36
    priority: high
    description: "Qwen2.5 7B Instruct model"

  # Lightweight LLM - Qwen2.5 3B Instruct (Medium Priority)
  qwen2.5-3b-instruct:
    path: ollama:qwen2.5:3b-instruct-q4_K_M
    type: llm
    size_gb: 1.80
    priority: medium
    description: "Lightweight LLM for faster responses"

  # Math Specialist - Qwen2 Math 7B (High Priority)
  qwen2-math-7b:
    path: ollama:qwen2-math:7b
    type: math
    size_gb: 4.13
    priority: high
    description: "Specialized model for mathematical tasks"

  # Lightweight conversational LLM - Mistral Nemo 2.3B (Medium Priority)
  mistral-nemo-2_3b:
    path: ollama:mistral-nemo:2.3b-instruct
    type: llm
    size_gb: 1.60
    priority: medium
    description: "Fast low-cost replies for monitor/service agents"

  # Compact Math Specialist - Qwen2.5 Math 1.5B (Medium Priority)
  qwen2_5-math-1_5b:
    path: ollama:qwen2.5-math:1.5b
    type: math
    size_gb: 1.20
    priority: medium
    description: "Lightweight math model for DRUID/Nutra micro-calculations"

storage:
  models_dir: /app/models
  cache_dir: /app/cache
  swap_dir: /app/swap

ollama:
  url: http://ollama:11434  # From Docker container to Ollama service
  timeout: 300
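The config above drives which models the swapper is willing to load on 20GB of VRAM. A minimal sketch of consuming it, assuming the structure shown (keys `swapper`, `models`, per-model `size_gb`); the budget-filter helper is illustrative and not part of the service:

```python
import yaml  # pyyaml, pinned in requirements.txt

# Trimmed inline copy of swapper_config.yaml for demonstration.
CONFIG = """
swapper:
  mode: single-active
  default_model: qwen3-8b
models:
  qwen3-8b:
    path: ollama:qwen3:8b
    type: llm
    size_gb: 4.87
  qwen3-vl-8b:
    path: ollama:qwen3-vl:8b
    type: vision
    size_gb: 5.72
"""


def models_fitting(config: dict, vram_gb: float) -> list[str]:
    """Return the configured model names whose size_gb fits the VRAM budget."""
    return [name for name, m in config["models"].items() if m["size_gb"] <= vram_gb]


config = yaml.safe_load(CONFIG)
# The default model should always reference a key defined under models:
assert config["swapper"]["default_model"] in config["models"]
print(models_fitting(config, 5.0))  # only qwen3-8b fits a 5 GB budget
```

The same check against `default_model` is a cheap startup-time guard against typos between the `default_model` value and the `models` keys.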
64
services/swapper-service/config/swapper_config_node1.yaml
Normal file
@@ -0,0 +1,64 @@
# Swapper Configuration for Node #1 (Production Server)
# Single-active LLM scheduler
# Hetzner GEX44 - NVIDIA RTX 4000 SFF Ada (20GB VRAM)

swapper:
  mode: single-active
  max_concurrent_models: 1
  model_swap_timeout: 300
  gpu_enabled: true
  metal_acceleration: false  # NVIDIA GPU, not Apple Silicon
  # Model to load automatically at startup
  # qwen3-8b is the primary model (4.87 GB); fast response to the first request
  default_model: qwen3-8b

models:
  # Primary LLM - Qwen3 8B (High Priority) - Main model from INFRASTRUCTURE.md
  qwen3-8b:
    path: ollama:qwen3:8b
    type: llm
    size_gb: 4.87
    priority: high
    description: "Primary LLM for general tasks and conversations"

  # Vision Model - Qwen3-VL 8B (High Priority) - For image processing
  qwen3-vl-8b:
    path: ollama:qwen3-vl:8b
    type: vision
    size_gb: 5.72
    priority: high
    description: "Vision model for image understanding and processing"

  # Qwen2.5 7B Instruct (High Priority)
  qwen2.5-7b-instruct:
    path: ollama:qwen2.5:7b-instruct-q4_K_M
    type: llm
    size_gb: 4.36
    priority: high
    description: "Qwen2.5 7B Instruct model"

  # Lightweight LLM - Qwen2.5 3B Instruct (Medium Priority)
  qwen2.5-3b-instruct:
    path: ollama:qwen2.5:3b-instruct-q4_K_M
    type: llm
    size_gb: 1.80
    priority: medium
    description: "Lightweight LLM for faster responses"

  # Math Specialist - Qwen2 Math 7B (High Priority)
  qwen2-math-7b:
    path: ollama:qwen2-math:7b
    type: math
    size_gb: 4.13
    priority: high
    description: "Specialized model for mathematical tasks"

storage:
  models_dir: /app/models
  cache_dir: /app/cache
  swap_dir: /app/swap

ollama:
  url: http://ollama:11434  # From Docker container to Ollama service
  timeout: 300
90
services/swapper-service/config/swapper_config_node2.yaml
Normal file
@@ -0,0 +1,90 @@
# Swapper Configuration for Node #2 (Development Node)
# Single-active LLM scheduler
# MacBook Pro M4 Max - Apple Silicon (40-core GPU, 64GB RAM)
# Auto-generated configuration with available Ollama models

swapper:
  mode: single-active
  max_concurrent_models: 1
  model_swap_timeout: 300
  gpu_enabled: true
  metal_acceleration: true  # Apple Silicon GPU acceleration
  # Model to load automatically at startup (optional)
  # If not set, models are loaded only on demand
  # Recommended: gpt-oss-latest (fast model) or phi3-latest (lightweight model)
  default_model: gpt-oss-latest  # Activated automatically at startup (key from models below)

models:
  # Fast LLM - GPT-OSS 20B (High Priority) - Main model for general tasks
  gpt-oss-latest:
    path: ollama:gpt-oss:latest
    type: llm
    size_gb: 13.0
    priority: high
    description: "Fast LLM for general tasks and conversations (20.9B params)"

  # Lightweight LLM - Phi3 3.8B (High Priority) - Fast responses
  phi3-latest:
    path: ollama:phi3:latest
    type: llm
    size_gb: 2.2
    priority: high
    description: "Lightweight LLM for fast responses (3.8B params)"

  # Code Specialist - StarCoder2 3B (Medium Priority) - Code engineering
  starcoder2-3b:
    path: ollama:starcoder2:3b
    type: code
    size_gb: 1.7
    priority: medium
    description: "Code specialist model for code engineering (3B params)"

  # Reasoning Model - Mistral Nemo 12.2B (High Priority) - Advanced reasoning
  mistral-nemo-12b:
    path: ollama:mistral-nemo:12b
    type: llm
    size_gb: 7.1
    priority: high
    description: "Advanced reasoning model for complex tasks (12.2B params)"

  # Reasoning Model - Gemma2 27B (Medium Priority) - Strategic reasoning
  gemma2-27b:
    path: ollama:gemma2:27b
    type: llm
    size_gb: 15.0
    priority: medium
    description: "Reasoning model for strategic tasks (27.2B params)"

  # Code Specialist - DeepSeek Coder 33B (High Priority) - Advanced code tasks
  deepseek-coder-33b:
    path: ollama:deepseek-coder:33b
    type: code
    size_gb: 18.0
    priority: high
    description: "Advanced code specialist model (33B params)"

  # Code Specialist - Qwen2.5 Coder 32B (High Priority) - Advanced code tasks
  qwen2.5-coder-32b:
    path: ollama:qwen2.5-coder:32b
    type: code
    size_gb: 19.0
    priority: high
    description: "Advanced code specialist model (32.8B params)"

  # Reasoning Model - DeepSeek R1 70B (High Priority) - Strategic reasoning (large model)
  deepseek-r1-70b:
    path: ollama:deepseek-r1:70b
    type: llm
    size_gb: 42.0
    priority: high
    description: "Strategic reasoning model (70.6B params, quantized)"

storage:
  models_dir: /app/models
  cache_dir: /app/cache
  swap_dir: /app/swap

ollama:
  url: http://localhost:11434  # Native Ollama on MacBook (via Pieces OS or brew)
  timeout: 300
7
services/swapper-service/requirements.txt
Normal file
@@ -0,0 +1,7 @@
fastapi==0.104.1
uvicorn[standard]==0.24.0
httpx==0.25.2
pydantic==2.5.0
pyyaml==6.0.1
python-multipart==0.0.6
38
services/swapper-service/start.sh
Executable file
@@ -0,0 +1,38 @@
#!/bin/bash
# Start Swapper Service locally

set -e

echo "🚀 Starting Swapper Service..."

# Check if virtual environment exists
if [ ! -d "venv" ]; then
    echo "📦 Creating virtual environment..."
    python3 -m venv venv
fi

# Activate virtual environment
source venv/bin/activate

# Install dependencies
echo "📥 Installing dependencies..."
pip install -q --upgrade pip
pip install -q -r requirements.txt

# Set environment variables
export OLLAMA_BASE_URL=${OLLAMA_BASE_URL:-http://localhost:11434}
export SWAPPER_CONFIG_PATH=${SWAPPER_CONFIG_PATH:-./config/swapper_config.yaml}
export SWAPPER_MODE=${SWAPPER_MODE:-single-active}
export MAX_CONCURRENT_MODELS=${MAX_CONCURRENT_MODELS:-1}
export MODEL_SWAP_TIMEOUT=${MODEL_SWAP_TIMEOUT:-30}

# Start service
echo "✅ Starting Swapper Service on port 8890..."
echo "   Health:      http://localhost:8890/health"
echo "   Status:      http://localhost:8890/status"
echo "   Cabinet API: http://localhost:8890/api/cabinet/swapper/status"
echo ""
echo "Press Ctrl+C to stop"

python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8890