feat: Add presence heartbeat for Matrix online status
- matrix-gateway: POST /internal/matrix/presence/online endpoint
- usePresenceHeartbeat hook with activity tracking
- Auto away after 5 min inactivity
- Offline on page close/visibility change
- Integrated in MatrixChatRoom component
services/swapper-service/README.md (new file)
# Swapper Service

**Version:** 1.0.0
**Status:** ✅ Ready for Node #2
**Port:** 8890

Dynamic model loading service that manages LLM models on-demand to optimize memory usage. Supports single-active mode (one model loaded at a time).

---
## Overview

Swapper Service provides:

- **Dynamic Model Loading** — Load/unload models on-demand
- **Single-Active Mode** — Only one model loaded at a time (memory optimization)
- **Model Metrics** — Track uptime, request count, load/unload times
- **Ollama Integration** — Works with Ollama models
- **REST API** — Full API for model management

---
## Features

### Model Management

- Load models on-demand
- Unload models to free memory
- Track which model is currently active
- Monitor model uptime and usage

### Metrics

- Current active model
- Model uptime (hours)
- Request count per model
- Load/unload timestamps
- Total uptime per model

### Single-Active Mode

- Only one model loaded at a time
- Automatic unloading of the previous model when loading a new one
- Optimizes memory usage on resource-constrained systems

---
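The single-active swap rule above can be sketched in a few lines of Python. This is an illustrative model of the behavior, not the service's actual implementation; the class and method names (`SingleActiveSwapper`, `load`, `unload`) are hypothetical.

```python
class SingleActiveSwapper:
    """Keeps at most one model loaded; loading a new model evicts the old one."""

    def __init__(self):
        self.active_model = None
        self.events = []  # (action, model) history, for illustration only

    def load(self, model_name: str) -> str:
        if self.active_model == model_name:
            return f"Model {model_name} already loaded"
        if self.active_model is not None:
            # Single-active mode: unload the previous model before loading the new one.
            self.events.append(("unload", self.active_model))
        self.events.append(("load", model_name))
        self.active_model = model_name
        return f"Model {model_name} loaded"

    def unload(self, model_name: str) -> str:
        if self.active_model != model_name:
            return f"Model {model_name} is not loaded"
        self.events.append(("unload", model_name))
        self.active_model = None
        return f"Model {model_name} unloaded"
```

Calling `load` twice with different models leaves only the second one active, with an `unload` event recorded for the first.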
## Quick Start

### Docker (Recommended)

```bash
# Build and start
docker-compose up -d swapper-service

# Check health
curl http://localhost:8890/health

# Get status
curl http://localhost:8890/status

# List models
curl http://localhost:8890/models
```

### Local Development

```bash
cd services/swapper-service

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export OLLAMA_BASE_URL=http://localhost:11434
export SWAPPER_CONFIG_PATH=./config/swapper_config.yaml

# Run service
python -m app.main
```

---
## API Endpoints

### Health & Status

#### GET /health
Health check endpoint.

**Response:**
```json
{
  "status": "healthy",
  "service": "swapper-service",
  "active_model": "deepseek-r1-70b",
  "mode": "single-active"
}
```

#### GET /status
Get full Swapper service status.

**Response:**
```json
{
  "status": "healthy",
  "active_model": "deepseek-r1-70b",
  "available_models": ["deepseek-r1-70b", "qwen2.5-coder-32b", ...],
  "loaded_models": ["deepseek-r1-70b"],
  "mode": "single-active",
  "total_models": 8
}
```
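A minimal client-side check of the `/health` payload shown above, using only the standard library. The parsing helper is separated from the network call so the payload logic can be tested offline; the base URL default matches the documented port but should be whatever your deployment uses.

```python
import json
import urllib.request


def parse_health(body: str) -> bool:
    """Return True if a /health response body reports a healthy service."""
    payload = json.loads(body)
    return payload.get("status") == "healthy"


def check_health(base_url: str = "http://localhost:8890", timeout: float = 5.0) -> bool:
    """Fetch /health and report whether the service says it is healthy."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return parse_health(resp.read().decode())
```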
### Model Management

#### GET /models
List all available models.

**Response:**
```json
{
  "models": [
    {
      "name": "deepseek-r1-70b",
      "ollama_name": "deepseek-r1:70b",
      "type": "llm",
      "size_gb": 42,
      "priority": "high",
      "status": "loaded"
    }
  ]
}
```

#### GET /models/{model_name}
Get information about a specific model.

**Response:**
```json
{
  "name": "deepseek-r1-70b",
  "ollama_name": "deepseek-r1:70b",
  "type": "llm",
  "size_gb": 42,
  "priority": "high",
  "status": "loaded",
  "loaded_at": "2025-11-22T10:30:00",
  "unloaded_at": null,
  "total_uptime_seconds": 3600.5
}
```

#### POST /models/{model_name}/load
Load a model. In single-active mode, the previously loaded model is unloaded first.

**Response:**
```json
{
  "status": "success",
  "model": "deepseek-r1-70b",
  "message": "Model deepseek-r1-70b loaded"
}
```

#### POST /models/{model_name}/unload
Unload a model.

**Response:**
```json
{
  "status": "success",
  "model": "deepseek-r1-70b",
  "message": "Model deepseek-r1-70b unloaded"
}
```
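A hedged sketch of calling the load/unload endpoints above from Python, again with the standard library (a library like `requests` would work equally well). The endpoint paths follow the documentation; the `swap_to` helper name and the 60-second timeout are illustrative choices, and error handling is deliberately minimal.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8890"  # documented Swapper port; adjust for your deployment


def model_action_url(model_name: str, action: str, base_url: str = BASE_URL) -> str:
    """Build the POST URL for a load/unload action on a model."""
    if action not in ("load", "unload"):
        raise ValueError(f"unsupported action: {action}")
    return f"{base_url}/models/{model_name}/{action}"


def swap_to(model_name: str, base_url: str = BASE_URL) -> dict:
    """Load model_name; in single-active mode the service evicts the previous model itself."""
    req = urllib.request.Request(model_action_url(model_name, "load", base_url), method="POST")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode())
```

Because the service handles eviction in single-active mode, a client never needs to call unload before load; `swap_to` is a single POST.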
### Metrics

#### GET /metrics
Get metrics for all models.

**Response:**
```json
{
  "metrics": [
    {
      "model_name": "deepseek-r1-70b",
      "status": "loaded",
      "loaded_at": "2025-11-22T10:30:00",
      "uptime_hours": 1.5,
      "request_count": 42,
      "total_uptime_seconds": 5400.0
    }
  ]
}
```

#### GET /metrics/{model_name}
Get metrics for a specific model.

**Response:**
```json
{
  "model_name": "deepseek-r1-70b",
  "status": "loaded",
  "loaded_at": "2025-11-22T10:30:00",
  "uptime_hours": 1.5,
  "request_count": 42,
  "total_uptime_seconds": 5400.0
}
```

---
## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API URL |
| `SWAPPER_CONFIG_PATH` | `./config/swapper_config.yaml` | Path to config file |
| `SWAPPER_MODE` | `single-active` | Mode: `single-active` or `multi-active` |
| `MAX_CONCURRENT_MODELS` | `1` | Max concurrent models (for multi-active mode) |
| `MODEL_SWAP_TIMEOUT` | `30` | Timeout for model swap (seconds) |

### Config File (swapper_config.yaml)

```yaml
swapper:
  mode: single-active
  max_concurrent_models: 1
  model_swap_timeout: 30
  gpu_enabled: true
  metal_acceleration: true

models:
  deepseek-r1-70b:
    path: ollama:deepseek-r1:70b
    type: llm
    size_gb: 42
    priority: high
```

---
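Once the config file has been parsed (for example with PyYAML's `yaml.safe_load`), it is a plain dict, and a few sanity checks can catch common mistakes early. The checks below are assumptions based only on the sample config above (the field names `mode`, `max_concurrent_models`, and the `ollama:` path prefix), not on the service's actual validation logic.

```python
def validate_config(cfg: dict) -> list[str]:
    """Return a list of problems found in a parsed swapper config (empty means OK)."""
    problems = []
    swapper = cfg.get("swapper", {})

    # Mode must be one of the two documented values.
    if swapper.get("mode") not in ("single-active", "multi-active"):
        problems.append("swapper.mode must be 'single-active' or 'multi-active'")

    # Single-active mode implies exactly one concurrent model.
    if swapper.get("mode") == "single-active" and swapper.get("max_concurrent_models", 1) != 1:
        problems.append("single-active mode implies max_concurrent_models: 1")

    # Model paths in the sample config reference Ollama models.
    for name, model in cfg.get("models", {}).items():
        if not str(model.get("path", "")).startswith("ollama:"):
            problems.append(f"models.{name}.path should look like ollama:<model-name>")

    return problems
```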
## Integration with Router

Swapper Service integrates with DAGI Router through metadata:

```python
router_request = {
    "message": "Your request",
    "mode": "chat",
    "metadata": {
        "use_llm": "specialist_vision_8b",  # Swapper will load this model
        "swapper_service": "http://swapper-service:8890"
    }
}
```

---
## Monitoring

### Health Check

```bash
curl http://localhost:8890/health
```

### Prometheus Metrics (Future)

- `swapper_active_model` — Currently active model
- `swapper_model_uptime_seconds` — Uptime per model
- `swapper_model_requests_total` — Total requests per model

---
## Troubleshooting

### Model won't load

```bash
# Check that Ollama is running
curl http://localhost:11434/api/tags

# Check that the model exists in Ollama
curl http://localhost:11434/api/tags | grep "model_name"

# Check Swapper logs
docker logs swapper-service
```

### Service not responding

```bash
# Check if the service is running
docker ps | grep swapper-service

# Check health
curl http://localhost:8890/health

# Follow the logs
docker logs -f swapper-service
```

---
## Differences: Swapper Service vs vLLM

**Swapper Service:**
- Model loading/unloading manager
- Single-active mode (one model at a time)
- Memory optimization
- Works with Ollama
- Lightweight, simple API

**vLLM:**
- High-performance inference engine
- Continuous serving (models stay loaded)
- Optimized for throughput
- Direct GPU acceleration
- More complex, production-grade

**Use Swapper when:**
- Memory is limited
- You need to switch between models frequently
- Running on resource-constrained systems (like the Node #2 MacBook)

**Use vLLM when:**
- You need maximum throughput
- Models stay loaded for long periods
- You have dedicated GPU resources
- Serving in production at scale

---
## Next Steps

1. **Add to Node #2 Admin Console**
   - Display active model
   - Show model metrics (uptime, requests)
   - Allow manual model loading/unloading

2. **Integration with Router**
   - Auto-load models based on request type
   - Route requests to appropriate models

3. **Metrics Dashboard**
   - Grafana dashboard for Swapper metrics
   - Model usage analytics

---
**Last Updated:** 2025-11-22
**Maintained by:** Ivan Tytar & DAARION Team
**Status:** ✅ Ready for Node #2