feat: Add presence heartbeat for Matrix online status

- matrix-gateway: POST /internal/matrix/presence/online endpoint
- usePresenceHeartbeat hook with activity tracking
- Auto away after 5 min inactivity
- Offline on page close/visibility change
- Integrated in MatrixChatRoom component
Commit: 3de3c8cb36 (parent 5bed515852)
Author: Apple
Date: 2025-11-27 00:19:40 -08:00
6371 changed files with 1,317,450 additions and 932 deletions


@@ -0,0 +1,13 @@
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ ./app/
EXPOSE 8890
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8890"]


@@ -0,0 +1,353 @@
# Swapper Service

**Version:** 1.0.0
**Status:** ✅ Ready for Node #2
**Port:** 8890

Dynamic model loading service that manages LLM models on demand to optimize memory usage. Supports single-active mode (one model loaded at a time).

---

## Overview

Swapper Service provides:
- **Dynamic Model Loading** — Load/unload models on-demand
- **Single-Active Mode** — Only one model loaded at a time (memory optimization)
- **Model Metrics** — Track uptime, request count, load/unload times
- **Ollama Integration** — Works with Ollama models
- **REST API** — Full API for model management
---
## Features
### Model Management
- Load models on-demand
- Unload models to free memory
- Track which model is currently active
- Monitor model uptime and usage
### Metrics
- Current active model
- Model uptime (hours)
- Request count per model
- Load/unload timestamps
- Total uptime per model
### Single-Active Mode
- Only one model loaded at a time
- Automatic unloading of previous model when loading new one
- Optimizes memory usage on resource-constrained systems
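The swap rule above can be sketched as a toy state machine. This is a minimal illustration, not the service's actual API; the class and method names are ours:

```python
class SingleActiveSwapper:
    """Toy sketch of single-active mode: loading a new model unloads the current one first."""

    def __init__(self):
        self.active = None   # name of the currently loaded model, if any
        self.events = []     # audit trail of load/unload actions

    def load(self, name: str) -> None:
        if self.active == name:
            return  # already active; nothing to do
        if self.active is not None:
            # Free memory before loading the next model
            self.events.append(("unload", self.active))
        self.events.append(("load", name))
        self.active = name

sw = SingleActiveSwapper()
sw.load("deepseek-r1-70b")
sw.load("qwen2.5-coder-32b")  # implicitly unloads deepseek-r1-70b
```

The real service does the same dance in `load_model`, guarded by an asyncio lock so concurrent load requests cannot interleave.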
---
## Quick Start
### Docker (Recommended)
```bash
# Build and start
docker-compose up -d swapper-service
# Check health
curl http://localhost:8890/health
# Get status
curl http://localhost:8890/status
# List models
curl http://localhost:8890/models
```
### Local Development
```bash
cd services/swapper-service
# Install dependencies
pip install -r requirements.txt
# Set environment variables
export OLLAMA_BASE_URL=http://localhost:11434
export SWAPPER_CONFIG_PATH=./config/swapper_config.yaml
# Run service
python -m app.main
```
---
## API Endpoints
### Health & Status
#### GET /health
Health check endpoint
**Response:**
```json
{
"status": "healthy",
"service": "swapper-service",
"active_model": "deepseek-r1-70b",
"mode": "single-active"
}
```
#### GET /status
Get full Swapper service status
**Response:**
```json
{
"status": "healthy",
"active_model": "deepseek-r1-70b",
"available_models": ["deepseek-r1-70b", "qwen2.5-coder-32b", ...],
"loaded_models": ["deepseek-r1-70b"],
"mode": "single-active",
"total_models": 8
}
```
### Model Management
#### GET /models
List all available models
**Response:**
```json
{
"models": [
{
"name": "deepseek-r1-70b",
"ollama_name": "deepseek-r1:70b",
"type": "llm",
"size_gb": 42,
"priority": "high",
"status": "loaded"
}
]
}
```
#### GET /models/{model_name}
Get information about a specific model
**Response:**
```json
{
"name": "deepseek-r1-70b",
"ollama_name": "deepseek-r1:70b",
"type": "llm",
"size_gb": 42,
"priority": "high",
"status": "loaded",
"loaded_at": "2025-11-22T10:30:00",
"unloaded_at": null,
"total_uptime_seconds": 3600.5
}
```
#### POST /models/{model_name}/load
Load a model
**Response:**
```json
{
"status": "success",
"model": "deepseek-r1-70b",
"message": "Model deepseek-r1-70b loaded"
}
```
#### POST /models/{model_name}/unload
Unload a model
**Response:**
```json
{
"status": "success",
"model": "deepseek-r1-70b",
"message": "Model deepseek-r1-70b unloaded"
}
```
### Metrics
#### GET /metrics
Get metrics for all models
**Response:**
```json
{
"metrics": [
{
"model_name": "deepseek-r1-70b",
"status": "loaded",
"loaded_at": "2025-11-22T10:30:00",
"uptime_hours": 1.5,
"request_count": 42,
"total_uptime_seconds": 5400.0
}
]
}
```
#### GET /metrics/{model_name}
Get metrics for a specific model
**Response:**
```json
{
"model_name": "deepseek-r1-70b",
"status": "loaded",
"loaded_at": "2025-11-22T10:30:00",
"uptime_hours": 1.5,
"request_count": 42,
"total_uptime_seconds": 5400.0
}
```
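A dashboard client can derive summary totals directly from the per-model metrics above. A pure-Python sketch over the response fields (the field names match the payload; the helper name is illustrative):

```python
def summarize_metrics(metrics: list) -> dict:
    """Aggregate per-model metrics (as returned by GET /metrics) into dashboard totals."""
    total_uptime_hours = sum(m["uptime_hours"] for m in metrics)
    total_requests = sum(m["request_count"] for m in metrics)
    # "Most used" is ranked by cumulative uptime, not request count
    most_used = max(metrics, key=lambda m: m["total_uptime_seconds"]) if metrics else None
    return {
        "total_uptime_hours": round(total_uptime_hours, 2),
        "total_requests": total_requests,
        "most_used_model": most_used["model_name"] if most_used else None,
    }

sample = [
    {"model_name": "deepseek-r1-70b", "uptime_hours": 1.5,
     "request_count": 42, "total_uptime_seconds": 5400.0},
    {"model_name": "qwen2.5-coder-32b", "uptime_hours": 0.5,
     "request_count": 7, "total_uptime_seconds": 1800.0},
]
print(summarize_metrics(sample))
# → {'total_uptime_hours': 2.0, 'total_requests': 49, 'most_used_model': 'deepseek-r1-70b'}
```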
---
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API URL |
| `SWAPPER_CONFIG_PATH` | `./config/swapper_config.yaml` | Path to config file |
| `SWAPPER_MODE` | `single-active` | Mode: `single-active` or `multi-active` |
| `MAX_CONCURRENT_MODELS` | `1` | Max concurrent models (for multi-active mode) |
| `MODEL_SWAP_TIMEOUT` | `30` | Timeout for model swap (seconds) |
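The service resolves these at startup with the defaults from the table; the snippet below mirrors the top of `app/main.py`:

```python
import os

# Each variable falls back to its documented default when unset.
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
SWAPPER_CONFIG_PATH = os.getenv("SWAPPER_CONFIG_PATH", "./config/swapper_config.yaml")
SWAPPER_MODE = os.getenv("SWAPPER_MODE", "single-active")
MAX_CONCURRENT_MODELS = int(os.getenv("MAX_CONCURRENT_MODELS", "1"))
MODEL_SWAP_TIMEOUT = int(os.getenv("MODEL_SWAP_TIMEOUT", "30"))  # seconds
```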
### Config File (swapper_config.yaml)
```yaml
swapper:
mode: single-active
max_concurrent_models: 1
model_swap_timeout: 30
gpu_enabled: true
metal_acceleration: true
models:
deepseek-r1-70b:
path: ollama:deepseek-r1:70b
type: llm
size_gb: 42
priority: high
```
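The `ollama:` prefix in each model's `path` is stripped to obtain the Ollama model name. A sketch of that mapping — `str.removeprefix` (Python 3.9+) only touches a leading prefix, so the `:70b` tag is preserved:

```python
def to_ollama_name(path: str) -> str:
    """Map a config path like 'ollama:deepseek-r1:70b' to the Ollama model name."""
    return path.removeprefix("ollama:")

print(to_ollama_name("ollama:deepseek-r1:70b"))  # deepseek-r1:70b
```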
---
## Integration with Router
Swapper Service integrates with DAGI Router through metadata:
```python
router_request = {
"message": "Your request",
"mode": "chat",
"metadata": {
"use_llm": "specialist_vision_8b", # Swapper will load this model
"swapper_service": "http://swapper-service:8890"
}
}
```
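A small helper on the router side can build this payload consistently. A sketch — the helper name is ours, and the default URL assumes the Docker service name from this README:

```python
def with_swapper_metadata(message: str, model: str,
                          swapper_url: str = "http://swapper-service:8890") -> dict:
    """Build a router request whose metadata tells Swapper which model to load (sketch)."""
    return {
        "message": message,
        "mode": "chat",
        "metadata": {
            "use_llm": model,  # Swapper will load this model
            "swapper_service": swapper_url,
        },
    }

request = with_swapper_metadata("Your request", "specialist_vision_8b")
```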
---
## Monitoring
### Health Check
```bash
curl http://localhost:8890/health
```
### Prometheus Metrics (Future)
- `swapper_active_model` — Currently active model
- `swapper_model_uptime_seconds` — Uptime per model
- `swapper_model_requests_total` — Total requests per model
---
## Troubleshooting
### Model won't load
```bash
# Check Ollama is running
curl http://localhost:11434/api/tags
# Check model exists in Ollama
curl http://localhost:11434/api/tags | grep "model_name"
# Check Swapper logs
docker logs swapper-service
```
### Service not responding
```bash
# Check if service is running
docker ps | grep swapper-service
# Check health
curl http://localhost:8890/health
# Check logs
docker logs -f swapper-service
```
---
## Differences: Swapper Service vs vLLM
**Swapper Service:**
- Model loading/unloading manager
- Single-active mode (one model at a time)
- Memory optimization
- Works with Ollama
- Lightweight, simple API
**vLLM:**
- High-performance inference engine
- Continuous serving (models stay loaded)
- Optimized for throughput
- Direct GPU acceleration
- More complex, production-grade
**Use Swapper when:**
- Memory is limited
- Need to switch between models frequently
- Running on resource-constrained systems (like Node #2 MacBook)
**Use vLLM when:**
- Need maximum throughput
- Models stay loaded for long periods
- Have dedicated GPU resources
- Production serving at scale
---
## Next Steps
1. **Add to Node #2 Admin Console**
- Display active model
- Show model metrics (uptime, requests)
- Allow manual model loading/unloading
2. **Integration with Router**
- Auto-load models based on request type
- Route requests to appropriate models
3. **Metrics Dashboard**
- Grafana dashboard for Swapper metrics
- Model usage analytics
---
**Last Updated:** 2025-11-22
**Maintained by:** Ivan Tytar & DAARION Team
**Status:** ✅ Ready for Node #2


@@ -0,0 +1,2 @@
# Swapper Service App Package


@@ -0,0 +1,168 @@
"""
Cabinet API endpoints for Swapper Service
Provides data for Node #1 and Node #2 admin consoles
"""
from fastapi import APIRouter, HTTPException
from typing import Dict, Any, List
from datetime import datetime
# Import will be done after swapper is initialized
router = APIRouter(prefix="/api/cabinet", tags=["cabinet"])
def get_swapper():
"""Get swapper instance (lazy import to avoid circular dependency)"""
from app.main import swapper
return swapper
@router.get("/swapper/status")
async def get_swapper_status_for_cabinet() -> Dict[str, Any]:
"""
Get Swapper Service status for admin console display
Returns data formatted for Node #1 and Node #2 cabinets
"""
try:
swapper = get_swapper()
status = await swapper.get_status()
metrics = await swapper.get_model_metrics()
# Format active model info
active_model_info = None
if status.active_model:
active_metrics = next(
(m for m in metrics if m.model_name == status.active_model),
None
)
if active_metrics:
active_model_info = {
"name": status.active_model,
"uptime_hours": round(active_metrics.uptime_hours, 2),
"request_count": active_metrics.request_count,
"loaded_at": active_metrics.loaded_at.isoformat() if active_metrics.loaded_at else None
}
# Format all models with their status
models_info = []
for model_name in status.available_models:
model_metrics = next(
(m for m in metrics if m.model_name == model_name),
None
)
model_data = swapper.models.get(model_name)
if model_data:
models_info.append({
"name": model_name,
"ollama_name": model_data.ollama_name,
"type": model_data.type,
"size_gb": model_data.size_gb,
"priority": model_data.priority,
"status": model_data.status.value,
"is_active": model_name == status.active_model,
"uptime_hours": round(model_metrics.uptime_hours, 2) if model_metrics else 0.0,
"request_count": model_metrics.request_count if model_metrics else 0,
"total_uptime_seconds": model_metrics.total_uptime_seconds if model_metrics else 0.0
})
return {
"service": "swapper-service",
"status": status.status,
"mode": status.mode,
"active_model": active_model_info,
"total_models": status.total_models,
"available_models": status.available_models,
"loaded_models": status.loaded_models,
"models": models_info,
"timestamp": datetime.now().isoformat()
}
except Exception as e:
raise HTTPException(status_code=500, detail=f"Error getting Swapper status: {str(e)}")
@router.get("/swapper/models")
async def get_swapper_models_for_cabinet() -> Dict[str, Any]:
"""
Get all models with detailed information for cabinet display
"""
try:
swapper = get_swapper()
status = await swapper.get_status()
metrics = await swapper.get_model_metrics()
models_detail = []
for model_name in status.available_models:
model_data = swapper.models.get(model_name)
model_metrics = next(
(m for m in metrics if m.model_name == model_name),
None
)
if model_data:
models_detail.append({
"name": model_name,
"ollama_name": model_data.ollama_name,
"type": model_data.type,
"size_gb": model_data.size_gb,
"priority": model_data.priority,
"status": model_data.status.value,
"is_active": model_name == status.active_model,
"can_load": model_data.status.value in ["unloaded", "error"],
"can_unload": model_data.status.value == "loaded",
"uptime_hours": round(model_metrics.uptime_hours, 2) if model_metrics else 0.0,
"request_count": model_metrics.request_count if model_metrics else 0,
"total_uptime_seconds": model_metrics.total_uptime_seconds if model_metrics else 0.0,
"loaded_at": model_metrics.loaded_at.isoformat() if model_metrics and model_metrics.loaded_at else None
})
return {
"models": models_detail,
"total": len(models_detail),
"active_count": len(status.loaded_models),
"timestamp": datetime.now().isoformat()
}
except Exception as e:
raise HTTPException(status_code=500, detail=f"Error getting models: {str(e)}")
@router.get("/swapper/metrics/summary")
async def get_swapper_metrics_summary() -> Dict[str, Any]:
"""
Get summary metrics for cabinet dashboard
"""
try:
swapper = get_swapper()
status = await swapper.get_status()
metrics = await swapper.get_model_metrics()
# Calculate totals
total_uptime_hours = sum(m.uptime_hours for m in metrics)
total_requests = sum(m.request_count for m in metrics)
# Most used model
most_used = max(metrics, key=lambda m: m.total_uptime_seconds) if metrics else None
return {
"summary": {
"total_models": status.total_models,
"active_models": len(status.loaded_models),
"available_models": len(status.available_models),
"total_uptime_hours": round(total_uptime_hours, 2),
"total_requests": total_requests
},
"most_used_model": {
"name": most_used.model_name,
"uptime_hours": round(most_used.uptime_hours, 2),
"request_count": most_used.request_count
} if most_used else None,
"active_model": {
"name": status.active_model,
"uptime_hours": round(
next((m.uptime_hours for m in metrics if m.model_name == status.active_model), 0.0),
2
) if status.active_model else None
} if status.active_model else None,
"timestamp": datetime.now().isoformat()
}
except Exception as e:
raise HTTPException(status_code=500, detail=f"Error getting metrics summary: {str(e)}")


@@ -0,0 +1,437 @@
"""
Swapper Service - Dynamic Model Loading Service
Manages loading/unloading LLM models on-demand to optimize memory usage.
Supports single-active model mode (one model loaded at a time).
"""
import os
import asyncio
import logging
from typing import Optional, Dict, List, Any
from datetime import datetime, timedelta
from enum import Enum
from fastapi import FastAPI, HTTPException, BackgroundTasks
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import httpx
import yaml
logger = logging.getLogger(__name__)
# ========== Configuration ==========
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
SWAPPER_CONFIG_PATH = os.getenv("SWAPPER_CONFIG_PATH", "./config/swapper_config.yaml")
SWAPPER_MODE = os.getenv("SWAPPER_MODE", "single-active") # single-active or multi-active
MAX_CONCURRENT_MODELS = int(os.getenv("MAX_CONCURRENT_MODELS", "1"))
MODEL_SWAP_TIMEOUT = int(os.getenv("MODEL_SWAP_TIMEOUT", "30"))
# ========== Models ==========
class ModelStatus(str, Enum):
"""Model status"""
LOADED = "loaded"
LOADING = "loading"
UNLOADED = "unloaded"
UNLOADING = "unloading"
ERROR = "error"
class ModelInfo(BaseModel):
"""Model information"""
name: str
ollama_name: str
type: str # llm, code, vision, math
size_gb: float
priority: str # high, medium, low
status: ModelStatus
loaded_at: Optional[datetime] = None
unloaded_at: Optional[datetime] = None
total_uptime_seconds: float = 0.0
request_count: int = 0
class SwapperStatus(BaseModel):
"""Swapper service status"""
status: str
active_model: Optional[str] = None
available_models: List[str]
loaded_models: List[str]
mode: str
total_models: int
class ModelMetrics(BaseModel):
"""Model usage metrics"""
model_name: str
status: str
loaded_at: Optional[datetime] = None
uptime_hours: float
request_count: int
total_uptime_seconds: float
# ========== Swapper Service ==========
class SwapperService:
"""Swapper Service - manages model loading/unloading"""
def __init__(self):
self.models: Dict[str, ModelInfo] = {}
self.active_model: Optional[str] = None
self.loading_lock = asyncio.Lock()
self.http_client = httpx.AsyncClient(timeout=300.0)
self.model_uptime: Dict[str, float] = {} # Track uptime per model
self.model_load_times: Dict[str, datetime] = {} # Track when model was loaded
async def initialize(self):
"""Initialize Swapper Service - load configuration"""
config = None
try:
logger.info(f"🔧 Initializing Swapper Service...")
logger.info(f"🔧 Config path: {SWAPPER_CONFIG_PATH}")
logger.info(f"🔧 Config exists: {os.path.exists(SWAPPER_CONFIG_PATH)}")
if os.path.exists(SWAPPER_CONFIG_PATH):
with open(SWAPPER_CONFIG_PATH, 'r') as f:
config = yaml.safe_load(f)
models_config = config.get('models', {})
logger.info(f"🔧 Found {len(models_config)} models in config")
for model_key, model_config in models_config.items():
ollama_name = model_config.get('path', '').replace('ollama:', '')
logger.info(f"🔧 Adding model: {model_key} -> {ollama_name}")
self.models[model_key] = ModelInfo(
name=model_key,
ollama_name=ollama_name,
type=model_config.get('type', 'llm'),
size_gb=model_config.get('size_gb', 0),
priority=model_config.get('priority', 'medium'),
status=ModelStatus.UNLOADED
)
self.model_uptime[model_key] = 0.0
logger.info(f"✅ Loaded {len(self.models)} models into Swapper")
else:
logger.warning(f"⚠️ Config file not found: {SWAPPER_CONFIG_PATH}, using defaults")
# Load default models from Ollama
await self._load_models_from_ollama()
logger.info(f"✅ Swapper Service initialized with {len(self.models)} models")
logger.info(f"✅ Model names: {list(self.models.keys())}")
# Load the default model if one is specified in the configuration
if config:
swapper_config = config.get('swapper', {})
default_model = swapper_config.get('default_model')
if default_model and default_model in self.models:
logger.info(f"🔄 Loading default model: {default_model}")
success = await self.load_model(default_model)
if success:
logger.info(f"✅ Default model loaded: {default_model}")
else:
logger.warning(f"⚠️ Failed to load default model: {default_model}")
elif default_model:
logger.warning(f"⚠️ Default model '{default_model}' not found in models list")
except Exception as e:
logger.error(f"❌ Error initializing Swapper Service: {e}", exc_info=True)
async def _load_models_from_ollama(self):
"""Load available models from Ollama"""
try:
response = await self.http_client.get(f"{OLLAMA_BASE_URL}/api/tags")
if response.status_code == 200:
data = response.json()
for model in data.get('models', []):
model_name = model.get('name', '')
# Extract base name (remove :latest, :7b, etc.)
base_name = model_name.split(':')[0]
if base_name not in self.models:
size_gb = model.get('size', 0) / (1024**3) # Convert bytes to GB
self.models[base_name] = ModelInfo(
name=base_name,
ollama_name=model_name,
type='llm', # Default type
size_gb=size_gb,
priority='medium',
status=ModelStatus.UNLOADED
)
self.model_uptime[base_name] = 0.0
logger.info(f"✅ Loaded {len(self.models)} models from Ollama")
except Exception as e:
logger.error(f"❌ Error loading models from Ollama: {e}")
async def load_model(self, model_name: str) -> bool:
"""Load a model (unload current if in single-active mode)"""
async with self.loading_lock:
try:
# Check if model exists
if model_name not in self.models:
logger.error(f"❌ Model not found: {model_name}")
return False
model_info = self.models[model_name]
# If single-active mode and another model is loaded, unload it first
if SWAPPER_MODE == "single-active" and self.active_model and self.active_model != model_name:
await self._unload_model_internal(self.active_model)
# Load the model
logger.info(f"🔄 Loading model: {model_name}")
model_info.status = ModelStatus.LOADING
# Send a minimal generate request so Ollama loads the model into memory
response = await self.http_client.post(
f"{OLLAMA_BASE_URL}/api/generate",
json={
"model": model_info.ollama_name,
"prompt": "test",
"stream": False
},
timeout=MODEL_SWAP_TIMEOUT
)
if response.status_code == 200:
model_info.status = ModelStatus.LOADED
model_info.loaded_at = datetime.now()
model_info.unloaded_at = None
self.active_model = model_name
self.model_load_times[model_name] = datetime.now()
logger.info(f"✅ Model loaded: {model_name}")
return True
else:
model_info.status = ModelStatus.ERROR
logger.error(f"❌ Failed to load model: {model_name}")
return False
except Exception as e:
logger.error(f"❌ Error loading model {model_name}: {e}", exc_info=True)
if model_name in self.models:
self.models[model_name].status = ModelStatus.ERROR
return False
async def _unload_model_internal(self, model_name: str) -> bool:
"""Internal method to unload a model"""
try:
if model_name not in self.models:
return False
model_info = self.models[model_name]
if model_info.status == ModelStatus.LOADED:
logger.info(f"🔄 Unloading model: {model_name}")
model_info.status = ModelStatus.UNLOADING
# Calculate uptime
if model_name in self.model_load_times:
load_time = self.model_load_times[model_name]
uptime_seconds = (datetime.now() - load_time).total_seconds()
self.model_uptime[model_name] = self.model_uptime.get(model_name, 0.0) + uptime_seconds
model_info.total_uptime_seconds = self.model_uptime[model_name]
del self.model_load_times[model_name]
model_info.status = ModelStatus.UNLOADED
model_info.unloaded_at = datetime.now()
if self.active_model == model_name:
self.active_model = None
logger.info(f"✅ Model unloaded: {model_name}")
return True
except Exception as e:
logger.error(f"❌ Error unloading model {model_name}: {e}")
return False
async def unload_model(self, model_name: str) -> bool:
"""Unload a model"""
async with self.loading_lock:
return await self._unload_model_internal(model_name)
async def get_status(self) -> SwapperStatus:
"""Get Swapper service status"""
# Update uptime for currently loaded model
if self.active_model and self.active_model in self.model_load_times:
load_time = self.model_load_times[self.active_model]
current_uptime = (datetime.now() - load_time).total_seconds()
self.model_uptime[self.active_model] = self.model_uptime.get(self.active_model, 0.0) + current_uptime
self.model_load_times[self.active_model] = datetime.now() # Reset timer
loaded_models = [
name for name, model in self.models.items()
if model.status == ModelStatus.LOADED
]
return SwapperStatus(
status="healthy",
active_model=self.active_model,
available_models=list(self.models.keys()),
loaded_models=loaded_models,
mode=SWAPPER_MODE,
total_models=len(self.models)
)
async def get_model_metrics(self, model_name: Optional[str] = None) -> List[ModelMetrics]:
"""Get metrics for model(s)"""
metrics = []
models_to_check = [model_name] if model_name else list(self.models.keys())
for name in models_to_check:
if name not in self.models:
continue
model_info = self.models[name]
# Calculate current uptime
uptime_seconds = self.model_uptime.get(name, 0.0)
if name in self.model_load_times:
load_time = self.model_load_times[name]
current_uptime = (datetime.now() - load_time).total_seconds()
uptime_seconds += current_uptime
uptime_hours = uptime_seconds / 3600.0
metrics.append(ModelMetrics(
model_name=name,
status=model_info.status.value,
loaded_at=model_info.loaded_at,
uptime_hours=uptime_hours,
request_count=model_info.request_count,
total_uptime_seconds=uptime_seconds
))
return metrics
async def close(self):
"""Close HTTP client"""
await self.http_client.aclose()
# ========== FastAPI App ==========
app = FastAPI(
title="Swapper Service",
description="Dynamic model loading service for Node #2",
version="1.0.0"
)
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Include cabinet API router (cabinet_api imports swapper lazily, avoiding a circular import)
try:
from app.cabinet_api import router as cabinet_router
app.include_router(cabinet_router)
logger.info("✅ Cabinet API router included")
except ImportError:
logger.warning("⚠️ cabinet_api module not found, skipping cabinet router")
# Global Swapper instance
swapper = SwapperService()
@app.on_event("startup")
async def startup():
"""Initialize Swapper on startup"""
await swapper.initialize()
@app.on_event("shutdown")
async def shutdown():
"""Close Swapper on shutdown"""
await swapper.close()
# ========== API Endpoints ==========
@app.get("/health")
async def health():
"""Health check endpoint"""
status = await swapper.get_status()
return {
"status": "healthy",
"service": "swapper-service",
"active_model": status.active_model,
"mode": status.mode
}
@app.get("/status", response_model=SwapperStatus)
async def get_status():
"""Get Swapper service status"""
return await swapper.get_status()
@app.get("/models")
async def list_models():
"""List all available models"""
return {
"models": [
{
"name": model.name,
"ollama_name": model.ollama_name,
"type": model.type,
"size_gb": model.size_gb,
"priority": model.priority,
"status": model.status.value
}
for model in swapper.models.values()
]
}
@app.get("/models/{model_name}")
async def get_model_info(model_name: str):
"""Get information about a specific model"""
if model_name not in swapper.models:
raise HTTPException(status_code=404, detail=f"Model not found: {model_name}")
model_info = swapper.models[model_name]
return {
"name": model_info.name,
"ollama_name": model_info.ollama_name,
"type": model_info.type,
"size_gb": model_info.size_gb,
"priority": model_info.priority,
"status": model_info.status.value,
"loaded_at": model_info.loaded_at.isoformat() if model_info.loaded_at else None,
"unloaded_at": model_info.unloaded_at.isoformat() if model_info.unloaded_at else None,
"total_uptime_seconds": swapper.model_uptime.get(model_name, 0.0)
}
@app.post("/models/{model_name}/load")
async def load_model_endpoint(model_name: str):
"""Load a model"""
success = await swapper.load_model(model_name)
if success:
return {"status": "success", "model": model_name, "message": f"Model {model_name} loaded"}
raise HTTPException(status_code=500, detail=f"Failed to load model: {model_name}")
@app.post("/models/{model_name}/unload")
async def unload_model_endpoint(model_name: str):
"""Unload a model"""
success = await swapper.unload_model(model_name)
if success:
return {"status": "success", "model": model_name, "message": f"Model {model_name} unloaded"}
raise HTTPException(status_code=500, detail=f"Failed to unload model: {model_name}")
@app.get("/metrics")
async def get_metrics(model_name: Optional[str] = None):
"""Get metrics for model(s)"""
metrics = await swapper.get_model_metrics(model_name)
return {
"metrics": [metric.dict() for metric in metrics]
}
@app.get("/metrics/{model_name}")
async def get_model_metrics(model_name: str):
"""Get metrics for a specific model"""
metrics = await swapper.get_model_metrics(model_name)
if not metrics:
raise HTTPException(status_code=404, detail=f"Model not found: {model_name}")
return metrics[0].dict()
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8890)


@@ -0,0 +1,393 @@
/* Swapper Service Cabinet Integration Styles */
.swapper-status-card {
background: white;
border-radius: 8px;
padding: 24px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
margin-bottom: 24px;
}
.swapper-header {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 20px;
padding-bottom: 16px;
border-bottom: 2px solid #f0f0f0;
}
.swapper-header h3 {
margin: 0;
font-size: 24px;
font-weight: 600;
color: #1a1a1a;
}
.status-badge {
padding: 6px 12px;
border-radius: 6px;
font-size: 12px;
font-weight: 600;
text-transform: uppercase;
}
.status-healthy {
background: #4caf50;
color: white;
}
.status-degraded {
background: #ff9800;
color: white;
}
.status-unhealthy {
background: #f44336;
color: white;
}
.swapper-info {
display: grid;
grid-template-columns: repeat(3, 1fr);
gap: 16px;
margin-bottom: 24px;
}
.info-row {
display: flex;
flex-direction: column;
gap: 4px;
}
.info-row span:first-child {
font-size: 12px;
color: #666;
text-transform: uppercase;
letter-spacing: 0.5px;
}
.info-row span:last-child {
font-size: 18px;
font-weight: 600;
color: #1a1a1a;
}
.active-model-card {
background: linear-gradient(135deg, #e8f5e9 0%, #c8e6c9 100%);
border-radius: 8px;
padding: 20px;
margin-bottom: 24px;
border-left: 4px solid #4caf50;
}
.active-model-card h4 {
margin: 0 0 12px 0;
font-size: 16px;
color: #2e7d32;
}
.model-details {
display: flex;
flex-direction: column;
gap: 12px;
}
.model-name {
font-size: 20px;
font-weight: 600;
color: #1a1a1a;
}
.model-stats {
display: flex;
gap: 24px;
flex-wrap: wrap;
}
.stat {
display: flex;
flex-direction: column;
gap: 4px;
}
.stat-label {
font-size: 12px;
color: #666;
text-transform: uppercase;
}
.stat-value {
font-size: 16px;
font-weight: 600;
color: #1a1a1a;
}
.models-list {
margin-top: 24px;
}
.models-list h4 {
margin: 0 0 16px 0;
font-size: 18px;
color: #1a1a1a;
}
.models-table {
width: 100%;
border-collapse: collapse;
font-size: 14px;
}
.models-table thead {
background: #f5f5f5;
border-bottom: 2px solid #e0e0e0;
}
.models-table th {
padding: 12px;
text-align: left;
font-weight: 600;
color: #666;
text-transform: uppercase;
font-size: 11px;
letter-spacing: 0.5px;
}
.models-table td {
padding: 12px;
border-bottom: 1px solid #f0f0f0;
}
.models-table tr:hover {
background: #fafafa;
}
.models-table tr.active {
background: #fff3e0;
border-left: 3px solid #ff9800;
}
.model-type {
padding: 4px 8px;
border-radius: 4px;
font-size: 11px;
font-weight: 600;
text-transform: uppercase;
}
.type-llm {
background: #e3f2fd;
color: #1976d2;
}
.type-code {
background: #f3e5f5;
color: #7b1fa2;
}
.type-vision {
background: #e8f5e9;
color: #388e3c;
}
.type-math {
background: #fff3e0;
color: #f57c00;
}
.btn-load,
.btn-unload {
padding: 6px 12px;
border: none;
border-radius: 4px;
font-size: 12px;
font-weight: 600;
cursor: pointer;
transition: all 0.2s;
}
.btn-load {
background: #4caf50;
color: white;
}
.btn-load:hover {
background: #45a049;
}
.btn-unload {
background: #f44336;
color: white;
}
.btn-unload:hover {
background: #da190b;
}
.active-indicator {
color: #4caf50;
font-weight: 600;
font-size: 12px;
}
.swapper-footer {
margin-top: 20px;
padding-top: 16px;
border-top: 1px solid #f0f0f0;
text-align: center;
}
.swapper-footer small {
color: #999;
font-size: 12px;
}
/* Metrics Summary */
.swapper-metrics {
background: white;
border-radius: 8px;
padding: 24px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}
.swapper-metrics h4 {
margin: 0 0 20px 0;
font-size: 18px;
color: #1a1a1a;
}
.metrics-grid {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 16px;
margin-bottom: 24px;
}
.metric-card {
background: #f5f5f5;
border-radius: 6px;
padding: 16px;
text-align: center;
}
.metric-label {
font-size: 12px;
color: #666;
text-transform: uppercase;
margin-bottom: 8px;
}
.metric-value {
font-size: 24px;
font-weight: 600;
color: #1a1a1a;
}
.most-used-model {
background: #f5f5f5;
border-radius: 6px;
padding: 16px;
margin-top: 16px;
}
.most-used-model h5 {
margin: 0 0 12px 0;
font-size: 14px;
color: #666;
text-transform: uppercase;
}
.model-info {
display: flex;
justify-content: space-between;
align-items: center;
}
.model-name {
font-weight: 600;
color: #1a1a1a;
}
.model-uptime {
color: #666;
font-size: 14px;
}
/* Page Layout */
.swapper-page {
max-width: 1400px;
margin: 0 auto;
padding: 24px;
}
.page-header {
margin-bottom: 32px;
}
.page-header h2 {
margin: 0 0 8px 0;
font-size: 32px;
color: #1a1a1a;
}
.page-header p {
margin: 0;
color: #666;
font-size: 16px;
}
.swapper-grid {
display: grid;
grid-template-columns: 2fr 1fr;
gap: 24px;
}
.swapper-main {
min-width: 0;
}
.swapper-sidebar {
min-width: 0;
}
/* Loading and Error States */
.swapper-loading,
.swapper-error {
padding: 24px;
text-align: center;
background: white;
border-radius: 8px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}
.swapper-error {
color: #f44336;
}
/* Responsive */
@media (max-width: 1024px) {
.swapper-grid {
grid-template-columns: 1fr;
}
}
@media (max-width: 768px) {
.swapper-info {
grid-template-columns: 1fr;
}
.metrics-grid {
grid-template-columns: 1fr;
}
.models-table {
font-size: 12px;
}
.models-table th,
.models-table td {
padding: 8px;
}
}


@@ -0,0 +1,311 @@
/**
* Swapper Service Integration for Node #1 and Node #2 Admin Consoles
* React/TypeScript component example
*/
import React, { useEffect, useState } from 'react';
// Types
interface SwapperStatus {
service: string;
status: string;
mode: string;
active_model: {
name: string;
uptime_hours: number;
request_count: number;
loaded_at: string | null;
} | null;
total_models: number;
available_models: string[];
loaded_models: string[];
models: Array<{
name: string;
ollama_name: string;
type: string;
size_gb: number;
priority: string;
status: string;
is_active: boolean;
uptime_hours: number;
request_count: number;
total_uptime_seconds: number;
}>;
timestamp: string;
}
interface SwapperMetrics {
summary: {
total_models: number;
active_models: number;
available_models: number;
total_uptime_hours: number;
total_requests: number;
};
most_used_model: {
name: string;
uptime_hours: number;
request_count: number;
} | null;
active_model: {
name: string;
uptime_hours: number | null;
} | null;
timestamp: string;
}
// API Service
const SWAPPER_API_BASE = process.env.NEXT_PUBLIC_SWAPPER_URL || 'http://localhost:8890';
export const swapperService = {
async getStatus(): Promise<SwapperStatus> {
const response = await fetch(`${SWAPPER_API_BASE}/api/cabinet/swapper/status`);
if (!response.ok) throw new Error('Failed to fetch Swapper status');
return response.json();
},
async getMetrics(): Promise<SwapperMetrics> {
const response = await fetch(`${SWAPPER_API_BASE}/api/cabinet/swapper/metrics/summary`);
if (!response.ok) throw new Error('Failed to fetch Swapper metrics');
return response.json();
},
async loadModel(modelName: string): Promise<void> {
const response = await fetch(`${SWAPPER_API_BASE}/models/${modelName}/load`, {
method: 'POST',
});
if (!response.ok) throw new Error(`Failed to load model: ${modelName}`);
},
async unloadModel(modelName: string): Promise<void> {
const response = await fetch(`${SWAPPER_API_BASE}/models/${modelName}/unload`, {
method: 'POST',
});
if (!response.ok) throw new Error(`Failed to unload model: ${modelName}`);
},
};
// Main Swapper Status Component
export const SwapperStatusCard: React.FC = () => {
const [status, setStatus] = useState<SwapperStatus | null>(null);
const [loading, setLoading] = useState(true);
const [error, setError] = useState<string | null>(null);
const fetchStatus = async () => {
try {
const data = await swapperService.getStatus();
setStatus(data);
setError(null);
} catch (err) {
setError(err instanceof Error ? err.message : 'Unknown error');
} finally {
setLoading(false);
}
};
useEffect(() => {
fetchStatus();
const interval = setInterval(fetchStatus, 30000); // Update every 30 seconds
return () => clearInterval(interval);
}, []);
if (loading) return <div className="swapper-loading">Loading Swapper status...</div>;
if (error) return <div className="swapper-error">Error: {error}</div>;
if (!status) return <div className="swapper-error">No status data</div>;
return (
<div className="swapper-status-card">
<div className="swapper-header">
<h3>🔄 Swapper Service</h3>
<span className={`status-badge status-${status.status}`}>
{status.status}
</span>
</div>
<div className="swapper-info">
<div className="info-row">
<span>Mode:</span>
<span>{status.mode}</span>
</div>
<div className="info-row">
<span>Total Models:</span>
<span>{status.total_models}</span>
</div>
<div className="info-row">
<span>Loaded Models:</span>
<span>{status.loaded_models.length}</span>
</div>
</div>
{status.active_model && (
<div className="active-model-card">
<h4>Active Model</h4>
<div className="model-details">
<div className="model-name">{status.active_model.name}</div>
<div className="model-stats">
<div className="stat">
<span className="stat-label">Uptime:</span>
<span className="stat-value">{status.active_model.uptime_hours.toFixed(2)}h</span>
</div>
<div className="stat">
<span className="stat-label">Requests:</span>
<span className="stat-value">{status.active_model.request_count}</span>
</div>
{status.active_model.loaded_at && (
<div className="stat">
<span className="stat-label">Loaded:</span>
<span className="stat-value">
{new Date(status.active_model.loaded_at).toLocaleString()}
</span>
</div>
)}
</div>
</div>
</div>
)}
<div className="models-list">
<h4>Available Models</h4>
<table className="models-table">
<thead>
<tr>
<th>Name</th>
<th>Type</th>
<th>Size (GB)</th>
<th>Status</th>
<th>Uptime (h)</th>
<th>Actions</th>
</tr>
</thead>
<tbody>
{status.models.map((model) => (
<tr key={model.name} className={model.is_active ? 'active' : ''}>
<td>{model.name}</td>
<td>
<span className={`model-type type-${model.type}`}>{model.type}</span>
</td>
<td>{model.size_gb.toFixed(1)}</td>
<td>
<span className={`status-badge status-${model.status}`}>
{model.status}
</span>
</td>
<td>{model.uptime_hours.toFixed(2)}</td>
<td>
{model.status === 'unloaded' && (
<button
className="btn-load"
onClick={() => swapperService.loadModel(model.name).then(fetchStatus).catch((err) => setError(err instanceof Error ? err.message : 'Unknown error'))}
>
Load
</button>
)}
{model.status === 'loaded' && !model.is_active && (
<button
className="btn-unload"
onClick={() => swapperService.unloadModel(model.name).then(fetchStatus).catch((err) => setError(err instanceof Error ? err.message : 'Unknown error'))}
>
Unload
</button>
)}
{model.is_active && (
<span className="active-indicator">Active</span>
)}
</td>
</tr>
))}
</tbody>
</table>
</div>
<div className="swapper-footer">
<small>Last updated: {new Date(status.timestamp).toLocaleString()}</small>
</div>
</div>
);
};
// Metrics Summary Component
export const SwapperMetricsSummary: React.FC = () => {
const [metrics, setMetrics] = useState<SwapperMetrics | null>(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
const fetchMetrics = async () => {
try {
const data = await swapperService.getMetrics();
setMetrics(data);
} catch (err) {
console.error('Error fetching metrics:', err);
} finally {
setLoading(false);
}
};
fetchMetrics();
const interval = setInterval(fetchMetrics, 60000); // Update every minute
return () => clearInterval(interval);
}, []);
if (loading || !metrics) return <div>Loading metrics...</div>;
return (
<div className="swapper-metrics">
<h4>📊 Metrics Summary</h4>
<div className="metrics-grid">
<div className="metric-card">
<div className="metric-label">Total Models</div>
<div className="metric-value">{metrics.summary.total_models}</div>
</div>
<div className="metric-card">
<div className="metric-label">Active Models</div>
<div className="metric-value">{metrics.summary.active_models}</div>
</div>
<div className="metric-card">
<div className="metric-label">Total Uptime</div>
<div className="metric-value">{metrics.summary.total_uptime_hours.toFixed(2)}h</div>
</div>
<div className="metric-card">
<div className="metric-label">Total Requests</div>
<div className="metric-value">{metrics.summary.total_requests}</div>
</div>
</div>
{metrics.most_used_model && (
<div className="most-used-model">
<h5>Most Used Model</h5>
<div className="model-info">
<span className="model-name">{metrics.most_used_model.name}</span>
<span className="model-uptime">
{metrics.most_used_model.uptime_hours.toFixed(2)}h
</span>
</div>
</div>
)}
</div>
);
};
// Main Swapper Page Component
export const SwapperPage: React.FC = () => {
return (
<div className="swapper-page">
<div className="page-header">
<h2>Swapper Service</h2>
<p>Dynamic model loading and management</p>
</div>
<div className="swapper-grid">
<div className="swapper-main">
<SwapperStatusCard />
</div>
<div className="swapper-sidebar">
<SwapperMetricsSummary />
</div>
</div>
</div>
);
};
export default SwapperPage;
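The `swapperService` wrapper above can also be driven from Python (the service's own implementation language). A minimal sketch using only the standard library, assuming the same base URL and endpoint paths the component uses; these paths are taken from the TypeScript code and not re-verified against `app/main.py`:

```python
# Stdlib-only counterpart of the TypeScript swapperService wrapper.
import json
import urllib.request


class SwapperClient:
    def __init__(self, base_url: str = "http://localhost:8890"):
        # Normalize so path joining never produces a double slash.
        self.base_url = base_url.rstrip("/")

    def _url(self, path: str) -> str:
        return f"{self.base_url}{path}"

    def get_status(self) -> dict:
        # GET /api/cabinet/swapper/status — same endpoint the status card polls.
        with urllib.request.urlopen(self._url("/api/cabinet/swapper/status")) as resp:
            return json.load(resp)

    def load_model(self, name: str) -> None:
        # POST /models/{name}/load — mirrors swapperService.loadModel.
        req = urllib.request.Request(self._url(f"/models/{name}/load"), method="POST")
        urllib.request.urlopen(req).close()

    def unload_model(self, name: str) -> None:
        # POST /models/{name}/unload — mirrors swapperService.unloadModel.
        req = urllib.request.Request(self._url(f"/models/{name}/unload"), method="POST")
        urllib.request.urlopen(req).close()
```

Useful for cron jobs or CI checks that need the same status/load/unload calls without a browser.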

View File

@@ -0,0 +1,81 @@
# Swapper Configuration for Node #1 (Production Server)
# Single-active LLM scheduler
# Hetzner GEX44 - NVIDIA RTX 4000 SFF Ada (20GB VRAM)
# Auto-generated configuration with all available Ollama models
swapper:
mode: single-active
max_concurrent_models: 1
model_swap_timeout: 300
gpu_enabled: true
metal_acceleration: false # NVIDIA GPU, not Apple Silicon
# Model to load automatically at startup (optional)
# If not set, models are loaded only on demand
# Recommended: qwen3-8b (primary model) or qwen2.5-3b-instruct (lightweight model)
default_model: qwen3-8b # Activated automatically at startup
models:
# Primary LLM - Qwen3 8B (High Priority) - Main model from INFRASTRUCTURE.md
qwen3-8b:
path: ollama:qwen3:8b
type: llm
size_gb: 4.87
priority: high
description: "Primary LLM for general tasks and conversations"
# Vision Model - Qwen3-VL 8B (High Priority) - For image processing
qwen3-vl-8b:
path: ollama:qwen3-vl:8b
type: vision
size_gb: 5.72
priority: high
description: "Vision model for image understanding and processing"
# Qwen2.5 7B Instruct (High Priority)
qwen2.5-7b-instruct:
path: ollama:qwen2.5:7b-instruct-q4_K_M
type: llm
size_gb: 4.36
priority: high
description: "Qwen2.5 7B Instruct model"
# Lightweight LLM - Qwen2.5 3B Instruct (Medium Priority)
qwen2.5-3b-instruct:
path: ollama:qwen2.5:3b-instruct-q4_K_M
type: llm
size_gb: 1.80
priority: medium
description: "Lightweight LLM for faster responses"
# Math Specialist - Qwen2 Math 7B (High Priority)
qwen2-math-7b:
path: ollama:qwen2-math:7b
type: math
size_gb: 4.13
priority: high
description: "Specialized model for mathematical tasks"
# Lightweight conversational LLM - Mistral Nemo 2.3B (Medium Priority)
mistral-nemo-2_3b:
path: ollama:mistral-nemo:2.3b-instruct
type: llm
size_gb: 1.60
priority: medium
description: "Fast low-cost replies for monitor/service agents"
# Compact Math Specialist - Qwen2.5 Math 1.5B (Medium Priority)
qwen2_5-math-1_5b:
path: ollama:qwen2.5-math:1.5b
type: math
size_gb: 1.20
priority: medium
description: "Lightweight math model for DRUID/Nutra micro-calculations"
storage:
models_dir: /app/models
cache_dir: /app/cache
swap_dir: /app/swap
ollama:
url: http://ollama:11434 # From Docker container to Ollama service
timeout: 300
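A config like the one above is easy to sanity-check before deployment: `default_model` must name a key under `models:`, and no single model should exceed the card's VRAM. A hypothetical validation sketch, with model names and sizes transcribed from the YAML and the 20 GB limit taken from the header comment (the real service may enforce different rules):

```python
# Transcribed from the Node #1 config: model key -> size_gb.
config = {
    "default_model": "qwen3-8b",
    "models": {
        "qwen3-8b": 4.87,
        "qwen3-vl-8b": 5.72,
        "qwen2.5-7b-instruct": 4.36,
        "qwen2.5-3b-instruct": 1.80,
        "qwen2-math-7b": 4.13,
        "mistral-nemo-2_3b": 1.60,
        "qwen2_5-math-1_5b": 1.20,
    },
}


def validate(cfg: dict, vram_gb: float = 20.0) -> None:
    """Raise ValueError if default_model is undefined or a model cannot fit."""
    default = cfg.get("default_model")
    if default is not None and default not in cfg["models"]:
        raise ValueError(f"default_model {default!r} is not a defined model key")
    for name, size in cfg["models"].items():
        if size > vram_gb:
            raise ValueError(f"{name} ({size} GB) exceeds {vram_gb} GB VRAM")


validate(config)  # passes for the Node #1 config
```

Catching a `default_model` that points at an Ollama tag instead of a config key is exactly the kind of mismatch this check exists for.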

View File

@@ -0,0 +1,64 @@
# Swapper Configuration for Node #1 (Production Server)
# Single-active LLM scheduler
# Hetzner GEX44 - NVIDIA RTX 4000 SFF Ada (20GB VRAM)
swapper:
mode: single-active
max_concurrent_models: 1
model_swap_timeout: 300
gpu_enabled: true
metal_acceleration: false # NVIDIA GPU, not Apple Silicon
# Model to load automatically at startup
# qwen3-8b is the primary model (4.87 GB), giving a fast response to the first request
default_model: qwen3-8b
models:
# Primary LLM - Qwen3 8B (High Priority) - Main model from INFRASTRUCTURE.md
qwen3-8b:
path: ollama:qwen3:8b
type: llm
size_gb: 4.87
priority: high
description: "Primary LLM for general tasks and conversations"
# Vision Model - Qwen3-VL 8B (High Priority) - For image processing
qwen3-vl-8b:
path: ollama:qwen3-vl:8b
type: vision
size_gb: 5.72
priority: high
description: "Vision model for image understanding and processing"
# Qwen2.5 7B Instruct (High Priority)
qwen2.5-7b-instruct:
path: ollama:qwen2.5:7b-instruct-q4_K_M
type: llm
size_gb: 4.36
priority: high
description: "Qwen2.5 7B Instruct model"
# Lightweight LLM - Qwen2.5 3B Instruct (Medium Priority)
qwen2.5-3b-instruct:
path: ollama:qwen2.5:3b-instruct-q4_K_M
type: llm
size_gb: 1.80
priority: medium
description: "Lightweight LLM for faster responses"
# Math Specialist - Qwen2 Math 7B (High Priority)
qwen2-math-7b:
path: ollama:qwen2-math:7b
type: math
size_gb: 4.13
priority: high
description: "Specialized model for mathematical tasks"
storage:
models_dir: /app/models
cache_dir: /app/cache
swap_dir: /app/swap
ollama:
url: http://ollama:11434 # From Docker container to Ollama service
timeout: 300

View File

@@ -0,0 +1,90 @@
# Swapper Configuration for Node #2 (Development Node)
# Single-active LLM scheduler
# MacBook Pro M4 Max - Apple Silicon (40-core GPU, 64GB RAM)
# Auto-generated configuration with available Ollama models
swapper:
mode: single-active
max_concurrent_models: 1
model_swap_timeout: 300
gpu_enabled: true
metal_acceleration: true # Apple Silicon GPU acceleration
# Model to load automatically at startup (optional)
# If not set, models are loaded only on demand
# Recommended: gpt-oss-latest (fast model) or phi3-latest (lightweight model)
default_model: gpt-oss-latest # Activated automatically at startup; must be a model key defined below
models:
# Fast LLM - GPT-OSS 20B (High Priority) - Main model for general tasks
gpt-oss-latest:
path: ollama:gpt-oss:latest
type: llm
size_gb: 13.0
priority: high
description: "Fast LLM for general tasks and conversations (20.9B params)"
# Lightweight LLM - Phi3 3.8B (High Priority) - Fast responses
phi3-latest:
path: ollama:phi3:latest
type: llm
size_gb: 2.2
priority: high
description: "Lightweight LLM for fast responses (3.8B params)"
# Code Specialist - StarCoder2 3B (Medium Priority) - Code engineering
starcoder2-3b:
path: ollama:starcoder2:3b
type: code
size_gb: 1.7
priority: medium
description: "Code specialist model for code engineering (3B params)"
# Reasoning Model - Mistral Nemo 12.2B (High Priority) - Advanced reasoning
mistral-nemo-12b:
path: ollama:mistral-nemo:12b
type: llm
size_gb: 7.1
priority: high
description: "Advanced reasoning model for complex tasks (12.2B params)"
# Reasoning Model - Gemma2 27B (Medium Priority) - Strategic reasoning
gemma2-27b:
path: ollama:gemma2:27b
type: llm
size_gb: 15.0
priority: medium
description: "Reasoning model for strategic tasks (27.2B params)"
# Code Specialist - DeepSeek Coder 33B (High Priority) - Advanced code tasks
deepseek-coder-33b:
path: ollama:deepseek-coder:33b
type: code
size_gb: 18.0
priority: high
description: "Advanced code specialist model (33B params)"
# Code Specialist - Qwen2.5 Coder 32B (High Priority) - Advanced code tasks
qwen2.5-coder-32b:
path: ollama:qwen2.5-coder:32b
type: code
size_gb: 19.0
priority: high
description: "Advanced code specialist model (32.8B params)"
# Reasoning Model - DeepSeek R1 70B (High Priority) - Strategic reasoning (large model)
deepseek-r1-70b:
path: ollama:deepseek-r1:70b
type: llm
size_gb: 42.0
priority: high
description: "Strategic reasoning model (70.6B params, quantized)"
storage:
models_dir: /app/models
cache_dir: /app/cache
swap_dir: /app/swap
ollama:
url: http://localhost:11434 # Native Ollama on MacBook (via Pieces OS or brew)
timeout: 300

View File

@@ -0,0 +1,7 @@
fastapi==0.104.1
uvicorn[standard]==0.24.0
httpx==0.25.2
pydantic==2.5.0
pyyaml==6.0.1
python-multipart==0.0.6

View File

@@ -0,0 +1,38 @@
#!/bin/bash
# Start Swapper Service locally
set -e
echo "🚀 Starting Swapper Service..."
# Check if virtual environment exists
if [ ! -d "venv" ]; then
echo "📦 Creating virtual environment..."
python3 -m venv venv
fi
# Activate virtual environment
source venv/bin/activate
# Install dependencies
echo "📥 Installing dependencies..."
pip install -q --upgrade pip
pip install -q -r requirements.txt
# Set environment variables
export OLLAMA_BASE_URL=${OLLAMA_BASE_URL:-http://localhost:11434}
export SWAPPER_CONFIG_PATH=${SWAPPER_CONFIG_PATH:-./config/swapper_config.yaml}
export SWAPPER_MODE=${SWAPPER_MODE:-single-active}
export MAX_CONCURRENT_MODELS=${MAX_CONCURRENT_MODELS:-1}
export MODEL_SWAP_TIMEOUT=${MODEL_SWAP_TIMEOUT:-300}
# Start service
echo "✅ Starting Swapper Service on port 8890..."
echo " Health: http://localhost:8890/health"
echo " Status: http://localhost:8890/status"
echo " Cabinet API: http://localhost:8890/api/cabinet/swapper/status"
echo ""
echo "Press Ctrl+C to stop"
python3 -m uvicorn app.main:app --host 0.0.0.0 --port 8890