Swapper Service

Version: 1.0.0
Status: ✅ Ready for Node #2
Port: 8890

A dynamic model-loading service that loads and unloads LLM models on demand to reduce memory usage. It supports single-active mode (one model loaded at a time).

Overview

Swapper Service provides:

Dynamic Model Loading — Load/unload models on-demand
Single-Active Mode — Only one model loaded at a time (memory optimization)
Model Metrics — Track uptime, request count, load/unload times
Ollama Integration — Works with Ollama models
REST API — Full API for model management

Features

Model Management

Load models on-demand
Unload models to free memory
Track which model is currently active
Monitor model uptime and usage

Metrics

Current active model
Model uptime (hours)
Request count per model
Load/unload timestamps
Total uptime per model

Single-Active Mode

Only one model loaded at a time
Automatic unloading of previous model when loading new one
Optimizes memory usage on resource-constrained systems
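
The swap behaviour listed above can be sketched as a small in-memory manager. This is an illustration only: the `SingleActiveSwapper` class and its `load_fn` / `unload_fn` callbacks are hypothetical names and not the service's actual implementation.

```python
from typing import Callable, Optional


class SingleActiveSwapper:
    """Keeps at most one model loaded; loading a new one unloads the previous."""

    def __init__(self, load_fn: Callable[[str], None], unload_fn: Callable[[str], None]):
        # load_fn / unload_fn are hypothetical callbacks that would talk to the backend.
        self._load_fn = load_fn
        self._unload_fn = unload_fn
        self.active_model: Optional[str] = None

    def load(self, model_name: str) -> None:
        if self.active_model == model_name:
            return  # already active, nothing to swap
        if self.active_model is not None:
            self.unload()  # free memory before loading the next model
        self._load_fn(model_name)
        self.active_model = model_name

    def unload(self) -> None:
        if self.active_model is None:
            return
        self._unload_fn(self.active_model)
        self.active_model = None


# Record the calls instead of touching a real backend.
events = []
swapper = SingleActiveSwapper(
    load_fn=lambda name: events.append(("load", name)),
    unload_fn=lambda name: events.append(("unload", name)),
)
swapper.load("deepseek-r1-70b")
swapper.load("qwen2.5-coder-32b")
print(events)
```

Unloading before loading means peak memory stays close to the size of a single model, at the cost of a short window in which no model is active.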

Quick Start

Docker (Recommended)

# Build and start
docker-compose up -d swapper-service

# Check health
curl http://localhost:8890/health

# Get status
curl http://localhost:8890/status

# List models
curl http://localhost:8890/models

Local Development

cd services/swapper-service

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export OLLAMA_BASE_URL=http://localhost:11434
export SWAPPER_CONFIG_PATH=./config/swapper_config.yaml

# Run service
python -m app.main

API Endpoints

Health & Status

GET /health

Health check endpoint

Response:

{
  "status": "healthy",
  "service": "swapper-service",
  "active_model": "deepseek-r1-70b",
  "mode": "single-active"
}
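
A minimal Python sketch of a health probe, using only the standard library. The helper names `parse_health` and `fetch_health` are made up for this example, and the check assumes that all four fields shown above are always present in the response.

```python
import json
import urllib.request


def parse_health(body: str) -> dict:
    """Parse a /health response body and check the fields shown above."""
    data = json.loads(body)
    for key in ("status", "service", "active_model", "mode"):
        if key not in data:
            raise ValueError(f"missing field in /health response: {key}")
    return data


def fetch_health(base_url: str = "http://localhost:8890", timeout: float = 5.0) -> dict:
    """Call GET /health on a running service (needs network access)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return parse_health(resp.read().decode("utf-8"))


# Offline demonstration using the sample response from this section.
sample = (
    '{"status": "healthy", "service": "swapper-service", '
    '"active_model": "deepseek-r1-70b", "mode": "single-active"}'
)
health = parse_health(sample)
print(health["status"], health["active_model"])
```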

GET /status

Get full Swapper service status

Response:

{
  "status": "healthy",
  "active_model": "deepseek-r1-70b",
  "available_models": ["deepseek-r1-70b", "qwen2.5-coder-32b", ...],
  "loaded_models": ["deepseek-r1-70b"],
  "mode": "single-active",
  "total_models": 8
}

Model Management

GET /models

List all available models

Response:

{
  "models": [
    {
      "name": "deepseek-r1-70b",
      "ollama_name": "deepseek-r1:70b",
      "type": "llm",
      "size_gb": 42,
      "priority": "high",
      "status": "loaded"
    }
  ]
}

GET /models/{model_name}

Get information about a specific model

Response:

{
  "name": "deepseek-r1-70b",
  "ollama_name": "deepseek-r1:70b",
  "type": "llm",
  "size_gb": 42,
  "priority": "high",
  "status": "loaded",
  "loaded_at": "2025-11-22T10:30:00",
  "unloaded_at": null,
  "total_uptime_seconds": 3600.5
}
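
As a rough illustration of how a client might use `loaded_at`, the sketch below computes how long the current load session has lasted. It assumes the timestamp is a naive ISO 8601 string in the same timezone as the `now` value passed in; `session_uptime_seconds` is a hypothetical helper, not part of the service.

```python
from datetime import datetime
from typing import Optional


def session_uptime_seconds(model_info: dict, now: datetime) -> Optional[float]:
    """Seconds since `loaded_at`, or None if the model is not currently loaded."""
    if model_info.get("status") != "loaded" or not model_info.get("loaded_at"):
        return None
    loaded_at = datetime.fromisoformat(model_info["loaded_at"])
    return (now - loaded_at).total_seconds()


info = {"name": "deepseek-r1-70b", "status": "loaded", "loaded_at": "2025-11-22T10:30:00"}
uptime = session_uptime_seconds(info, now=datetime(2025, 11, 22, 11, 30, 0))
print(uptime)  # 3600.0
```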

POST /models/{model_name}/load

Load a model

Response:

{
  "status": "success",
  "model": "deepseek-r1-70b",
  "message": "Model deepseek-r1-70b loaded"
}

POST /models/{model_name}/unload

Unload a model

Response:

{
  "status": "success",
  "model": "deepseek-r1-70b",
  "message": "Model deepseek-r1-70b unloaded"
}
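
The load and unload calls could be issued from Python roughly as follows. This sketch only builds the requests; `build_model_action` is a hypothetical helper, and it assumes the endpoints accept a POST without a request body, which the examples above suggest but do not state.

```python
import urllib.parse
import urllib.request


def build_model_action(base_url: str, model_name: str, action: str) -> urllib.request.Request:
    """Build a POST request for /models/{model_name}/load or /unload."""
    if action not in ("load", "unload"):
        raise ValueError(f"unsupported action: {action}")
    quoted = urllib.parse.quote(model_name, safe="")  # keep the name a single path segment
    return urllib.request.Request(f"{base_url}/models/{quoted}/{action}", method="POST")


req = build_model_action("http://localhost:8890", "deepseek-r1-70b", "load")
print(req.get_method(), req.full_url)

# Against a running service, the request would be sent with:
#   with urllib.request.urlopen(req, timeout=60) as resp:
#       print(resp.read().decode("utf-8"))
```

Loading a large model can take a while, so a client timeout somewhat above the configured swap timeout is probably a sensible starting point.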

Metrics

GET /metrics

Get metrics for all models

Response:

{
  "metrics": [
    {
      "model_name": "deepseek-r1-70b",
      "status": "loaded",
      "loaded_at": "2025-11-22T10:30:00",
      "uptime_hours": 1.5,
      "request_count": 42,
      "total_uptime_seconds": 5400.0
    }
  ]
}
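
A client could condense this payload into a short summary, for example for a status line. The sketch below is illustrative: `summarize_metrics` and the keys of its result are made up here, and the hours figure is simply `total_uptime_seconds` divided by 3600.

```python
def summarize_metrics(payload: dict) -> dict:
    """Condense a GET /metrics payload into a few headline numbers."""
    entries = payload.get("metrics", [])
    return {
        "loaded": [m["model_name"] for m in entries if m.get("status") == "loaded"],
        "total_requests": sum(m.get("request_count", 0) for m in entries),
        "total_uptime_hours": {
            m["model_name"]: m.get("total_uptime_seconds", 0.0) / 3600 for m in entries
        },
    }


# Sample payload copied from the response above.
payload = {
    "metrics": [
        {
            "model_name": "deepseek-r1-70b",
            "status": "loaded",
            "loaded_at": "2025-11-22T10:30:00",
            "uptime_hours": 1.5,
            "request_count": 42,
            "total_uptime_seconds": 5400.0,
        }
    ]
}
summary = summarize_metrics(payload)
print(summary)
```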

GET /metrics/{model_name}

Get metrics for a specific model

Response:

{
  "model_name": "deepseek-r1-70b",
  "status": "loaded",
  "loaded_at": "2025-11-22T10:30:00",
  "uptime_hours": 1.5,
  "request_count": 42,
  "total_uptime_seconds": 5400.0
}

Configuration

Environment Variables

Variable	Default	Description
`OLLAMA_BASE_URL`	`http://localhost:11434`	Ollama API URL
`SWAPPER_CONFIG_PATH`	`./config/swapper_config.yaml`	Path to config file
`SWAPPER_MODE`	`single-active`	Mode: `single-active` or `multi-active`
`MAX_CONCURRENT_MODELS`	`1`	Max concurrent models (for multi-active mode)
`MODEL_SWAP_TIMEOUT`	`30`	Timeout for model swap (seconds)
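
One way these variables might be read at startup, with the defaults from the table, is sketched below. The `SwapperSettings` dataclass and `load_settings` function are assumptions made for illustration; the service's own loader may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SwapperSettings:
    ollama_base_url: str
    config_path: str
    mode: str
    max_concurrent_models: int
    model_swap_timeout: int  # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> SwapperSettings:
    """Read the documented variables, falling back to the documented defaults."""
    mode = env.get("SWAPPER_MODE", "single-active")
    if mode not in ("single-active", "multi-active"):
        raise ValueError(f"unknown SWAPPER_MODE: {mode}")
    return SwapperSettings(
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        config_path=env.get("SWAPPER_CONFIG_PATH", "./config/swapper_config.yaml"),
        mode=mode,
        max_concurrent_models=int(env.get("MAX_CONCURRENT_MODELS", "1")),
        model_swap_timeout=int(env.get("MODEL_SWAP_TIMEOUT", "30")),
    )


defaults = load_settings({})  # an empty mapping yields the table's defaults
print(defaults)
```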

Config File (swapper_config.yaml)

swapper:
  mode: single-active
  max_concurrent_models: 1
  model_swap_timeout: 30
  gpu_enabled: true
  metal_acceleration: true

models:
  deepseek-r1-70b:
    path: ollama:deepseek-r1:70b
    type: llm
    size_gb: 42
    priority: high
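
The `path` value appears to combine a backend prefix with the Ollama model name (compare `ollama_name` in the API responses above). Assuming that reading is right, a sketch of splitting it could look like this; `split_model_path` is a hypothetical helper.

```python
from typing import Tuple


def split_model_path(path: str) -> Tuple[str, str]:
    """Split 'backend:model' at the first colon; the model name may contain colons."""
    backend, sep, name = path.partition(":")
    if not sep or not backend or not name:
        raise ValueError(f"expected '<backend>:<model>', got: {path!r}")
    return backend, name


backend, ollama_name = split_model_path("ollama:deepseek-r1:70b")
print(backend, ollama_name)
```

Splitting only at the first colon matters here because Ollama model names carry their own `name:tag` colon.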

Integration with Router

Swapper Service integrates with DAGI Router through request metadata. The router names the model it wants in `use_llm`, and Swapper loads that model:

router_request = {
    "message": "Your request",
    "mode": "chat",
    "metadata": {
        "use_llm": "specialist_vision_8b",  # Swapper will load this model
        "swapper_service": "http://swapper-service:8890"
    }
}
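
A small helper can keep the metadata shape consistent across callers. This is a sketch under the assumption that the request layout shown above is all the router needs; `build_router_request` and `requested_model` are hypothetical names.

```python
from typing import Optional


def build_router_request(
    message: str,
    model: str,
    swapper_url: str = "http://swapper-service:8890",
    mode: str = "chat",
) -> dict:
    """Build a router request that asks Swapper for a specific model."""
    return {
        "message": message,
        "mode": mode,
        "metadata": {"use_llm": model, "swapper_service": swapper_url},
    }


def requested_model(request: dict) -> Optional[str]:
    """Return the model named in the metadata, or None if none was requested."""
    return request.get("metadata", {}).get("use_llm")


request = build_router_request("Your request", "specialist_vision_8b")
print(requested_model(request))
```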

Monitoring

Health Check

curl http://localhost:8890/health

Prometheus Metrics (Future)

swapper_active_model — Currently active model
swapper_model_uptime_seconds — Uptime per model
swapper_model_requests_total — Total requests per model
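
Since these metrics are not implemented yet, the following is only a sketch of how the existing GET /metrics payload might be rendered in the Prometheus text exposition format. The `model` label, the 0/1 encoding of the active model, and the `render_prometheus` function are all assumptions.

```python
def render_prometheus(payload: dict) -> str:
    """Render a GET /metrics payload as Prometheus text exposition lines."""
    lines = []
    for m in payload.get("metrics", []):
        label = '{model="%s"}' % m["model_name"]
        active = 1 if m.get("status") == "loaded" else 0  # assumed encoding
        lines.append(f"swapper_active_model{label} {active}")
        lines.append(f"swapper_model_uptime_seconds{label} {m.get('total_uptime_seconds', 0.0)}")
        lines.append(f"swapper_model_requests_total{label} {m.get('request_count', 0)}")
    return "\n".join(lines) + "\n"


payload = {
    "metrics": [
        {
            "model_name": "deepseek-r1-70b",
            "status": "loaded",
            "request_count": 42,
            "total_uptime_seconds": 5400.0,
        }
    ]
}
text = render_prometheus(payload)
print(text, end="")
```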

Troubleshooting

Model won't load

# Check Ollama is running
curl http://localhost:11434/api/tags

# Check model exists in Ollama (replace model_name with the Ollama name, e.g. deepseek-r1:70b)
curl -s http://localhost:11434/api/tags | grep "model_name"

# Check Swapper logs
docker logs swapper-service
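
The same check can be scripted in Python. The sketch below relies on Ollama's `/api/tags` response listing models under a `models` array with a `name` field; `model_in_ollama` and `fetch_tags` are hypothetical helper names.

```python
import json
import urllib.request


def model_in_ollama(tags: dict, ollama_name: str) -> bool:
    """True if the /api/tags payload lists a model with exactly this name."""
    return any(m.get("name") == ollama_name for m in tags.get("models", []))


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Call GET /api/tags on a running Ollama instance (needs network access)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


# Offline demonstration with a trimmed, hand-written payload.
sample_tags = {"models": [{"name": "deepseek-r1:70b"}]}
print(model_in_ollama(sample_tags, "deepseek-r1:70b"))
```

Unlike `grep`, the exact comparison does not match a model whose name merely contains the search string.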

Service not responding

# Check if service is running
docker ps | grep swapper-service

# Check health
curl http://localhost:8890/health

# Check logs
docker logs -f swapper-service

Differences: Swapper Service vs vLLM

Swapper Service:

Model loading/unloading manager
Single-active mode (one model at a time)
Memory optimization
Works with Ollama
Lightweight, simple API

vLLM:

High-performance inference engine
Continuous serving (models stay loaded)
Optimized for throughput
Direct GPU acceleration
More complex, production-grade

Use Swapper when:

Memory is limited
You need to switch between models frequently
You are running on resource-constrained systems (such as the Node #2 MacBook)

Use vLLM when:

You need maximum throughput
Models stay loaded for long periods
You have dedicated GPU resources
You are serving production traffic at scale

Next Steps

1. Add to Node #2 Admin Console
   - Display active model
   - Show model metrics (uptime, requests)
   - Allow manual model loading/unloading
2. Integration with Router
   - Auto-load models based on request type
   - Route requests to appropriate models
3. Metrics Dashboard
   - Grafana dashboard for Swapper metrics
   - Model usage analytics

Last Updated: 2025-11-22
Maintained by: Ivan Tytar & DAARION Team
Status: ✅ Ready for Node #2
