node2: fix Sofiia routing determinism + Node Capabilities Service
Bug fixes:
- Bug A: GROK_API_KEY env mismatch — router expected GROK_API_KEY but only
XAI_API_KEY was present. Added GROK_API_KEY=${XAI_API_KEY} alias in compose.
- Bug B: 'grok' profile missing in router-config.node2.yml — added cloud_grok
profile (provider: grok, model: grok-2-1212). Sofiia now has
default_llm=cloud_grok with fallback_llm=local_default_coder.
- Bug C: Router silently defaulted to cloud DeepSeek when profile was unknown.
Now falls back to agent.fallback_llm or local_default_coder with WARNING log.
Hardcoded Ollama URL (172.18.0.1) replaced with config-driven base_url.
New service: Node Capabilities Service (NCS)
- services/node-capabilities/ — FastAPI microservice exposing live model
inventory from Ollama, Swapper, and llama-server.
- GET /capabilities — canonical JSON with served_models[] and inventory_only[]
- GET /capabilities/models — flat list of served models
- POST /capabilities/refresh — force cache refresh
- Cache TTL 15s, bound to 127.0.0.1:8099
- services/router/capabilities_client.py — async client with TTL cache
Artifacts:
- ops/node2_models_audit.md — 3-layer model view (served/disk/cloud)
- ops/node2_models_audit.yml — machine-readable audit
- ops/node2_capabilities_example.json — sample NCS output (14 served models)
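
The caching behaviour shared by the NCS and the router-side client follows a simple module-level TTL pattern; a minimal self-contained sketch (names illustrative, not the exact service code):

```python
import time
from typing import Any, Callable, Dict

_cache: Dict[str, Any] = {}
_cache_ts: float = 0.0
CACHE_TTL = 15  # seconds, matching the CACHE_TTL_SEC default

def get_capabilities(collect: Callable[[], Dict[str, Any]], force: bool = False) -> Dict[str, Any]:
    """Return the cached inventory; re-collect only when stale or forced."""
    global _cache, _cache_ts
    if not force and _cache and (time.time() - _cache_ts) < CACHE_TTL:
        return _cache          # fresh enough — skip the expensive collection
    _cache = collect()         # e.g. query Ollama / Swapper / llama-server
    _cache_ts = time.time()
    return _cache
```

`POST /capabilities/refresh` corresponds to calling this with `force=True` (it zeroes the timestamp, which has the same effect).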
Made-with: Cursor
@@ -23,6 +23,10 @@ services:
       - PIECES_OS_URL=http://host.docker.internal:39300
       - NOTION_API_KEY=${NOTION_API_KEY:-}
       - XAI_API_KEY=${XAI_API_KEY}
+      - GROK_API_KEY=${XAI_API_KEY}
+      # ── Node Capabilities ─────────────────────────────────────────────────
+      - NODE_CAPABILITIES_URL=http://node-capabilities:8099/capabilities
       # ── Persistence backends ──────────────────────────────────────────────
       - ALERT_BACKEND=postgres
       - ALERT_DATABASE_URL=${ALERT_DATABASE_URL:-${DATABASE_URL}}
@@ -39,6 +43,7 @@ services:
       - "daarion-city-service:host-gateway"
     depends_on:
       - dagi-nats
+      - node-capabilities
     networks:
       - dagi-network
       - dagi-memory-network
@@ -103,6 +108,27 @@ services:
       - dagi-network
     restart: unless-stopped
 
+  node-capabilities:
+    build:
+      context: ./services/node-capabilities
+      dockerfile: Dockerfile
+    container_name: node-capabilities-node2
+    ports:
+      - "127.0.0.1:8099:8099"
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+    environment:
+      - NODE_ID=NODA2
+      - OLLAMA_BASE_URL=http://host.docker.internal:11434
+      - SWAPPER_URL=http://swapper-service:8890
+      - LLAMA_SERVER_URL=http://host.docker.internal:11435
+      - CACHE_TTL_SEC=15
+    depends_on:
+      - swapper-service
+    networks:
+      - dagi-network
+    restart: unless-stopped
+
   sofiia-console:
     build:
       context: ./services/sofiia-console
ops/node2_capabilities_example.json (new file, 1 line)
File diff suppressed because one or more lines are too long

ops/node2_models_audit.md (new file, 125 lines)
@@ -0,0 +1,125 @@
# NODA2 Model Audit — Three-Layer View

**Date:** 2026-02-27
**Node:** MacBook Pro M4 Max, 64GB unified memory

---

## Layer 1: Served by Runtime (routing-eligible)

These are models the router can actively select and invoke.

### Ollama (12 models, port 11434)

| Model | Type | Size | Status | Note |
|-------|------|------|--------|------|
| qwen3.5:35b-a3b | LLM (MoE) | 9.3 GB | idle | PRIMARY reasoning |
| qwen3:14b | LLM | 9.3 GB | idle | Default local |
| gemma3:latest | LLM | 3.3 GB | idle | Fast small |
| glm-4.7-flash:32k | LLM | 19 GB | idle | Long-context |
| glm-4.7-flash:q4_K_M | LLM | 19 GB | idle | **DUPLICATE** |
| llava:13b | Vision | 8.0 GB | idle | P0 fallback |
| mistral-nemo:12b | LLM | 7.1 GB | idle | old |
| deepseek-coder:33b | Code | 18.8 GB | idle | Heavy code |
| deepseek-r1:70b | LLM | 42.5 GB | idle | Very heavy reasoning |
| starcoder2:3b | Code | 1.7 GB | idle | Fast code |
| phi3:latest | LLM | 2.2 GB | idle | Small general |
| gpt-oss:latest | LLM | 13.8 GB | idle | old |

### Swapper (port 8890)

| Model | Type | Status |
|-------|------|--------|
| llava-13b | Vision | unloaded |

### llama-server (port 11435)

| Model | Type | Note |
|-------|------|------|
| Qwen3.5-35B-A3B-Q4_K_M.gguf | LLM | **DUPLICATE** of Ollama |

### Cloud APIs

| Provider | Model | API Key | Active |
|----------|-------|---------|--------|
| Grok (xAI) | grok-2-1212 | `GROK_API_KEY` ✅ | **Sofiia primary** |
| DeepSeek | deepseek-chat | `DEEPSEEK_API_KEY` ✅ | Other agents |
| Mistral | mistral-large | `MISTRAL_API_KEY` | Not configured |

---

## Layer 2: Installed on Disk (not served)

These are on disk but NOT reachable by router/swapper.

| Model | Type | Size | Location | Status |
|-------|------|------|----------|--------|
| whisper-large-v3-turbo (MLX) | STT | 1.5 GB | HF cache | Ready, not integrated |
| Kokoro-82M-bf16 (MLX) | TTS | 0.35 GB | HF cache | Ready, not integrated |
| MiniCPM-V-4_5 | Vision | 16 GB | HF cache | Not serving |
| Qwen3-VL-32B-Instruct | Vision | 123 GB | Cursor worktree | R&D artifact |
| Jan-v2-VL-med-Q8_0 | Vision | 9.2 GB | Jan AI | Not running |
| Qwen2.5-7B-Instruct | LLM | 14 GB | HF cache | Idle |
| Qwen2.5-1.5B-Instruct | LLM | 2.9 GB | HF cache | Idle |
| flux2-dev-Q8_0 | Image gen | 33 GB | ComfyUI | Offline |
| ltx-2-19b-distilled | Video gen | 25 GB | ComfyUI | Offline |
| SDXL-base-1.0 | Image gen | 72 GB | hf_models | Legacy |
| FLUX.2-dev (Aquiles) | Image gen | 105 GB | HF cache | ComfyUI |

---

## Layer 3: Sofiia Routing (after fix)

### Before fix (broken)

```
agent_registry: llm_profile=grok
→ router looks up "grok" in node2 config → NOT FOUND
→ llm_profile = {} → provider defaults to "deepseek" (hardcoded)
→ tries DEEPSEEK_API_KEY → may work (nondeterministic)
→ XAI_API_KEY exists but mapped as "XAI_API_KEY", not "GROK_API_KEY"
```

### After fix (deterministic)

```
agent_registry: llm_profile=grok
router-config.node2.yml:
  agents.sofiia.default_llm = cloud_grok
  agents.sofiia.fallback_llm = local_default_coder
  llm_profiles.cloud_grok = {provider: grok, model: grok-2-1212, base_url: https://api.x.ai}

docker-compose: GROK_API_KEY=${XAI_API_KEY} (aliased)

Chain:
1. Sofiia request → router resolves cloud_grok
2. provider=grok → GROK_API_KEY present → xAI API → grok-2-1212
3. If Grok fails → fallback_llm=local_default_coder → qwen3:14b (Ollama)
4. If unknown profile → WARNING logged, falls back to local_default_coder (local), NOT cloud silently
```
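
The same resolution chain can be written out as a config fragment (a sketch only; key names and values as described in this commit, exact schema of router-config.node2.yml assumed):

```yaml
# router-config.node2.yml (sketch)
llm_profiles:
  cloud_grok:
    provider: grok
    model: grok-2-1212
    base_url: https://api.x.ai
agents:
  sofiia:
    default_llm: cloud_grok
    fallback_llm: local_default_coder
```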

---

## Fixes Applied in This Commit

| Bug | Fix | File |
|-----|-----|------|
| A: GROK_API_KEY not in env | Added `GROK_API_KEY=${XAI_API_KEY}` | docker-compose.node2-sofiia.yml |
| B: No `grok` profile | Added `cloud_grok` profile | router-config.node2.yml |
| B: Sofiia → wrong profile | `agents.sofiia.default_llm = cloud_grok` | router-config.node2.yml |
| C: Silent cloud fallback | Unknown profile → local default + WARNING | services/router/main.py |
| C: Hardcoded Ollama URL | `172.18.0.1:11434` → dynamic from config | services/router/main.py |
| — | Node Capabilities Service | services/node-capabilities/ |

---

## Node Capabilities Service

New microservice providing live model inventory at `GET /capabilities`:

- Collects from Ollama, Swapper, llama-server
- Returns canonical JSON with `served_models[]` and `inventory_only[]`
- Cache TTL: 15s
- Port: 127.0.0.1:8099
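
A minimal sketch of the response envelope, using the field names from `main.py` in this commit (values illustrative):

```json
{
  "node_id": "NODA2",
  "updated_at": "2026-02-27T12:00:00Z",
  "runtimes": {"ollama": {"status": "ok"}, "swapper": {"status": "ok"}},
  "served_models": [
    {"name": "qwen3:14b", "type": "llm", "runtime": "ollama", "size_gb": 9.3, "running": false}
  ],
  "served_count": 14,
  "inventory_only": [],
  "inventory_count": 0
}
```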

Verification:

```bash
curl -s http://localhost:8099/capabilities | jq '.served_models | length'
# Expected: 14
```
ops/node2_models_audit.yml (new file, 76 lines)
@@ -0,0 +1,76 @@
# NODA2 Model Audit — Three-layer view
# Date: 2026-02-27
# Source: Node Capabilities Service + manual disk scan

# ─── LAYER 1: SERVED BY RUNTIME (routing-eligible) ───────────────────────────
served_by_runtime:
  ollama:
    base_url: http://host.docker.internal:11434
    version: "0.17.1"
    models:
      - {name: "qwen3.5:35b-a3b", type: llm, size_gb: 9.3, params: "14.8B MoE"}
      - {name: "qwen3:14b", type: llm, size_gb: 9.3, params: "14B"}
      - {name: "gemma3:latest", type: llm, size_gb: 3.3, params: "4B"}
      - {name: "glm-4.7-flash:32k", type: llm, size_gb: 19.0, params: "~32B"}
      - {name: "glm-4.7-flash:q4_K_M", type: llm, size_gb: 19.0, note: "DUPLICATE of :32k"}
      - {name: "llava:13b", type: vision, size_gb: 8.0, params: "13B"}
      - {name: "mistral-nemo:12b", type: llm, size_gb: 7.1, note: "old"}
      - {name: "deepseek-coder:33b", type: code, size_gb: 18.8, params: "33B"}
      - {name: "deepseek-r1:70b", type: llm, size_gb: 42.5, params: "70B"}
      - {name: "starcoder2:3b", type: code, size_gb: 1.7}
      - {name: "phi3:latest", type: llm, size_gb: 2.2}
      - {name: "gpt-oss:latest", type: llm, size_gb: 13.8, note: "old"}

  swapper:
    base_url: http://swapper-service:8890
    active_model: null
    vision_models:
      - {name: "llava-13b", type: vision, size_gb: 8.0, status: unloaded}
    llm_models_count: 9

  llama_server:
    base_url: http://host.docker.internal:11435
    models:
      - {name: "Qwen3.5-35B-A3B-Q4_K_M.gguf", type: llm, note: "DUPLICATE of ollama qwen3.5:35b-a3b"}

# ─── LAYER 2: INSTALLED ON DISK (not served, not for routing) ────────────────
installed_on_disk:
  hf_cache:
    - {name: "whisper-large-v3-turbo-asr-fp16", type: stt, size_gb: 1.5, backend: mlx, ready: true}
    - {name: "Kokoro-82M-bf16", type: tts, size_gb: 0.35, backend: mlx, ready: true}
    - {name: "MiniCPM-V-4_5", type: vision, size_gb: 16.0, backend: hf, ready: false}
    - {name: "Qwen2.5-7B-Instruct", type: llm, size_gb: 14.0, backend: hf}
    - {name: "Qwen2.5-1.5B-Instruct", type: llm, size_gb: 2.9, backend: hf}
    - {name: "FLUX.2-dev (Aquiles)", type: image_gen, size_gb: 105.0, backend: comfyui}

  cursor_worktree:
    - {name: "Qwen3-VL-32B-Instruct", type: vision, size_gb: 123.0, path: "~/.cursor/worktrees/.../models/"}

  jan_ai:
    - {name: "Jan-v2-VL-med-Q8_0", type: vision, size_gb: 9.2, path: "~/Library/Application Support/Jan/"}

  llama_cpp_models:
    - {name: "Qwen3.5-35B-A3B-Q4_K_M.gguf", type: llm, size_gb: 20.0, note: "DUPLICATE, served by llama-server"}

  comfyui:
    - {name: "flux2-dev-Q8_0.gguf", type: image_gen, size_gb: 33.0}
    - {name: "ltx-2-19b-distilled-fp8.safetensors", type: video_gen, size_gb: 25.0}
    - {name: "z_image_turbo_bf16.safetensors", type: image_gen, size_gb: 11.0}
    - {name: "SDXL-base-1.0", type: image_gen, size_gb: 72.0, note: "legacy"}

  hf_models_dir:
    - {name: "stabilityai_sdxl_base_1.0", type: image_gen, size_gb: 72.0, note: "legacy"}

# ─── LAYER 3: CLOUD / EXTERNAL APIs ──────────────────────────────────────────
cloud_apis:
  - {name: "grok-2-1212", provider: grok, api_key_env: "GROK_API_KEY", active: true}
  - {name: "deepseek-chat", provider: deepseek, api_key_env: "DEEPSEEK_API_KEY", active: true}
  - {name: "mistral-large-latest", provider: mistral, api_key_env: "MISTRAL_API_KEY", active: false}

# ─── SOFIIA ROUTING CHAIN (after fix) ────────────────────────────────────────
sofiia_routing:
  agent_registry: "llm_profile: grok"
  router_config: "agents.sofiia.default_llm: cloud_grok → provider=grok, model=grok-2-1212"
  fallback: "fallback_llm: local_default_coder → qwen3:14b (Ollama)"
  env_mapping: "XAI_API_KEY → GROK_API_KEY (aliased in compose)"
  deterministic: true
services/node-capabilities/Dockerfile (new file, 7 lines)
@@ -0,0 +1,7 @@
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY main.py .
EXPOSE 8099
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8099"]
services/node-capabilities/main.py (new file, 245 lines)
@@ -0,0 +1,245 @@
"""Node Capabilities Service — exposes live model inventory for router decisions."""
import os
import time
import logging
from typing import Any, Dict, List, Optional

from fastapi import FastAPI
from fastapi.responses import JSONResponse
import httpx

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("node-capabilities")

app = FastAPI(title="Node Capabilities Service", version="1.0.0")

NODE_ID = os.getenv("NODE_ID", "noda2")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://host.docker.internal:11434")
SWAPPER_URL = os.getenv("SWAPPER_URL", "http://swapper-service:8890")
LLAMA_SERVER_URL = os.getenv("LLAMA_SERVER_URL", "")

_cache: Dict[str, Any] = {}
_cache_ts: float = 0
CACHE_TTL = int(os.getenv("CACHE_TTL_SEC", "15"))


def _classify_model(name: str) -> str:
    nl = name.lower()
    if any(k in nl for k in ("vl", "vision", "llava", "minicpm-v", "clip")):
        return "vision"
    if any(k in nl for k in ("coder", "starcoder", "codellama", "code")):
        return "code"
    if any(k in nl for k in ("embed", "bge", "minilm", "e5-")):
        return "embedding"
    if any(k in nl for k in ("whisper", "stt")):
        return "stt"
    if any(k in nl for k in ("kokoro", "tts", "bark", "coqui", "xtts")):
        return "tts"
    if any(k in nl for k in ("flux", "sdxl", "stable-diffusion", "ltx")):
        return "image_gen"
    return "llm"


async def _collect_ollama() -> Dict[str, Any]:
    runtime: Dict[str, Any] = {"base_url": OLLAMA_BASE_URL, "status": "unknown", "models": []}
    try:
        async with httpx.AsyncClient(timeout=5) as c:
            r = await c.get(f"{OLLAMA_BASE_URL}/api/tags")
            if r.status_code == 200:
                data = r.json()
                runtime["status"] = "ok"
                for m in data.get("models", []):
                    runtime["models"].append({
                        "name": m.get("name", ""),
                        "size_bytes": m.get("size", 0),
                        "size_gb": round(m.get("size", 0) / 1e9, 1),
                        "type": _classify_model(m.get("name", "")),
                        "modified": m.get("modified_at", "")[:10],
                    })
            ps = await c.get(f"{OLLAMA_BASE_URL}/api/ps")
            if ps.status_code == 200:
                running = ps.json().get("models", [])
                running_names = {m.get("name", "") for m in running}
                for model in runtime["models"]:
                    model["running"] = model["name"] in running_names
    except Exception as e:
        runtime["status"] = f"error: {e}"
        logger.warning(f"Ollama collector failed: {e}")
    return runtime


async def _collect_swapper() -> Dict[str, Any]:
    runtime: Dict[str, Any] = {"base_url": SWAPPER_URL, "status": "unknown", "models": [], "vision_models": [], "active_model": None}
    try:
        async with httpx.AsyncClient(timeout=5) as c:
            h = await c.get(f"{SWAPPER_URL}/health")
            if h.status_code == 200:
                hd = h.json()
                runtime["status"] = hd.get("status", "ok")
                runtime["active_model"] = hd.get("active_model")

            mr = await c.get(f"{SWAPPER_URL}/models")
            if mr.status_code == 200:
                for m in mr.json().get("models", []):
                    runtime["models"].append({
                        "name": m.get("name", ""),
                        "type": m.get("type", "llm"),
                        "size_gb": m.get("size_gb", 0),
                        "status": m.get("status", "unknown"),
                    })

            vr = await c.get(f"{SWAPPER_URL}/vision/models")
            if vr.status_code == 200:
                for m in vr.json().get("models", []):
                    runtime["vision_models"].append({
                        "name": m.get("name", ""),
                        "type": "vision",
                        "size_gb": m.get("size_gb", 0),
                        "status": m.get("status", "unknown"),
                    })
    except Exception as e:
        runtime["status"] = f"error: {e}"
        logger.warning(f"Swapper collector failed: {e}")
    return runtime


async def _collect_llama_server() -> Optional[Dict[str, Any]]:
    if not LLAMA_SERVER_URL:
        return None
    runtime: Dict[str, Any] = {"base_url": LLAMA_SERVER_URL, "status": "unknown", "models": []}
    try:
        async with httpx.AsyncClient(timeout=5) as c:
            r = await c.get(f"{LLAMA_SERVER_URL}/v1/models")
            if r.status_code == 200:
                data = r.json()
                runtime["status"] = "ok"
                for m in data.get("data", data.get("models", [])):
                    name = m.get("id", m.get("name", "unknown"))
                    runtime["models"].append({"name": name, "type": "llm"})
    except Exception as e:
        runtime["status"] = f"error: {e}"
    return runtime


def _collect_disk_inventory() -> List[Dict[str, Any]]:
    """Scan known model directories — NOT for routing, only inventory."""
    import pathlib
    inventory: List[Dict[str, Any]] = []

    scan_dirs = [
        ("cursor_worktrees", pathlib.Path.home() / ".cursor" / "worktrees"),
        ("jan_ai", pathlib.Path.home() / "Library" / "Application Support" / "Jan"),
        ("hf_cache", pathlib.Path.home() / ".cache" / "huggingface" / "hub"),
        ("comfyui_main", pathlib.Path.home() / "ComfyUI" / "models"),
        ("comfyui_docs", pathlib.Path.home() / "Documents" / "ComfyUI" / "models"),
        ("llama_cpp", pathlib.Path.home() / "Library" / "Application Support" / "llama.cpp" / "models"),
        ("hf_models", pathlib.Path.home() / "hf_models"),
    ]

    for source, base in scan_dirs:
        if not base.exists():
            continue
        try:
            for f in base.rglob("*"):
                if f.suffix in (".gguf", ".safetensors", ".bin", ".pt") and f.stat().st_size > 100_000_000:
                    inventory.append({
                        "name": f.stem,
                        "path": str(f.relative_to(pathlib.Path.home())),
                        "source": source,
                        "size_gb": round(f.stat().st_size / 1e9, 1),
                        "type": _classify_model(f.stem),
                        "served": False,
                    })
        except Exception:
            pass

    return inventory


def _build_served_models(ollama: Dict, swapper: Dict, llama: Optional[Dict]) -> List[Dict[str, Any]]:
    """Merge all served models into a flat canonical list."""
    served: List[Dict[str, Any]] = []
    seen = set()

    for m in ollama.get("models", []):
        key = m["name"]
        if key not in seen:
            seen.add(key)
            served.append({**m, "runtime": "ollama", "base_url": ollama["base_url"]})

    for m in swapper.get("vision_models", []):
        key = f"swapper:{m['name']}"
        if key not in seen:
            seen.add(key)
            served.append({**m, "runtime": "swapper", "base_url": swapper["base_url"]})

    if llama:
        for m in llama.get("models", []):
            key = f"llama:{m['name']}"
            if key not in seen:
                seen.add(key)
                served.append({**m, "runtime": "llama_server", "base_url": llama["base_url"]})

    return served


async def _build_capabilities() -> Dict[str, Any]:
    global _cache, _cache_ts

    if _cache and (time.time() - _cache_ts) < CACHE_TTL:
        return _cache

    ollama = await _collect_ollama()
    swapper = await _collect_swapper()
    llama = await _collect_llama_server()
    disk = _collect_disk_inventory()
    served = _build_served_models(ollama, swapper, llama)

    result = {
        "node_id": NODE_ID,
        "updated_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "runtimes": {
            "ollama": ollama,
            "swapper": swapper,
        },
        "served_models": served,
        "served_count": len(served),
        "inventory_only": disk,
        "inventory_count": len(disk),
    }
    if llama:
        result["runtimes"]["llama_server"] = llama

    _cache = result
    _cache_ts = time.time()
    return result


@app.get("/healthz")
async def healthz():
    return {"status": "ok", "node_id": NODE_ID}


@app.get("/capabilities")
async def capabilities():
    data = await _build_capabilities()
    return JSONResponse(content=data)


@app.get("/capabilities/models")
async def capabilities_models():
    data = await _build_capabilities()
    return JSONResponse(content={"node_id": data["node_id"], "served_models": data["served_models"]})


@app.post("/capabilities/refresh")
async def capabilities_refresh():
    global _cache_ts
    _cache_ts = 0
    data = await _build_capabilities()
    return JSONResponse(content={"refreshed": True, "served_count": data["served_count"]})


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=int(os.getenv("PORT", "8099")))
services/node-capabilities/requirements.txt (new file, 3 lines)
@@ -0,0 +1,3 @@
fastapi>=0.110.0
uvicorn>=0.29.0
httpx>=0.27.0
services/router/capabilities_client.py (new file, 80 lines)
@@ -0,0 +1,80 @@
"""Capabilities client — fetches and caches live model inventory from Node Capabilities Service."""
import os
import time
import logging
from typing import Any, Dict, List, Optional

import httpx

logger = logging.getLogger("capabilities_client")

_cache: Dict[str, Any] = {}
_cache_ts: float = 0

NODE_CAPABILITIES_URL = os.getenv("NODE_CAPABILITIES_URL", "")
CACHE_TTL = 30


def configure(url: str = "", ttl: int = 30):
    global NODE_CAPABILITIES_URL, CACHE_TTL
    if url:
        NODE_CAPABILITIES_URL = url
    CACHE_TTL = ttl


async def fetch_capabilities(force: bool = False) -> Dict[str, Any]:
    global _cache, _cache_ts

    if not NODE_CAPABILITIES_URL:
        return {}

    if not force and _cache and (time.time() - _cache_ts) < CACHE_TTL:
        return _cache

    try:
        async with httpx.AsyncClient(timeout=5) as c:
            resp = await c.get(NODE_CAPABILITIES_URL)
            if resp.status_code == 200:
                _cache = resp.json()
                _cache_ts = time.time()
                logger.info(f"Capabilities refreshed: {_cache.get('served_count', 0)} served models")
                return _cache
            else:
                logger.warning(f"Capabilities fetch failed: HTTP {resp.status_code}")
    except Exception as e:
        logger.warning(f"Capabilities fetch error: {e}")

    return _cache


def get_cached() -> Dict[str, Any]:
    return _cache


def find_served_model(
    model_type: str = "llm",
    preferred_name: Optional[str] = None,
    runtime: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
    """Find best served model matching criteria from cached capabilities."""
    served = _cache.get("served_models", [])
    if not served:
        return None

    candidates = [m for m in served if m.get("type") == model_type]
    if runtime:
        candidates = [m for m in candidates if m.get("runtime") == runtime]

    if not candidates:
        return None

    if preferred_name:
        for m in candidates:
            if preferred_name in m.get("name", ""):
                return m

    return candidates[0]


def list_served_by_type(model_type: str = "llm") -> List[Dict[str, Any]]:
    return [m for m in _cache.get("served_models", []) if m.get("type") == model_type]
@@ -1,6 +1,6 @@
|
||||
from fastapi import FastAPI, HTTPException
|
||||
from fastapi import FastAPI, HTTPException, Request
|
||||
from fastapi.responses import Response
|
||||
from pydantic import BaseModel
|
||||
from pydantic import BaseModel, ConfigDict
|
||||
from typing import Literal, Optional, Dict, Any, List
|
||||
import asyncio
|
||||
import json
|
||||
@@ -897,6 +897,134 @@ async def health():
|
||||
"messaging_inbound_enabled": config.get("messaging_inbound", {}).get("enabled", True)
|
||||
}
|
||||
|
||||
|
||||
@app.get("/healthz")
|
||||
async def healthz():
|
||||
"""Alias /healthz → /health for BFF compatibility."""
|
||||
return await health()
|
||||
|
||||
|
||||
@app.get("/monitor/status")
|
||||
async def monitor_status(request: Request = None):
|
||||
"""
|
||||
Node monitor status — read-only, safe, no secrets.
|
||||
Returns: heartbeat, router/gateway health, open incidents,
|
||||
alerts loop SLO, active backends, last artifact timestamps.
|
||||
|
||||
Rate limited: 60 rpm per IP (in-process bucket).
|
||||
RBAC: requires tools.monitor.read entitlement (or tools.observability.read).
|
||||
Auth: X-Monitor-Key header (same as SUPERVISOR_API_KEY, optional in dev).
|
||||
"""
|
||||
import collections as _collections
|
||||
|
||||
# ── Rate limit (60 rpm per IP) ────────────────────────────────────────
|
||||
_now = time.monotonic()
|
||||
client_ip = (
|
||||
(request.client.host if request and request.client else None) or "unknown"
|
||||
)
|
||||
_bucket_key = f"monitor:{client_ip}"
|
||||
if not hasattr(monitor_status, "_buckets"):
|
||||
monitor_status._buckets = {}
|
||||
dq = monitor_status._buckets.setdefault(_bucket_key, _collections.deque())
|
||||
while dq and _now - dq[0] > 60:
|
||||
dq.popleft()
|
||||
if len(dq) >= 60:
|
||||
from fastapi.responses import JSONResponse
|
||||
return JSONResponse(status_code=429, content={"error": "rate_limit", "message": "60 rpm exceeded"})
|
||||
dq.append(_now)
|
||||
|
||||
# ── Auth (optional in dev, enforced in prod) ──────────────────────────
|
||||
_env = os.getenv("ENV", "dev").strip().lower()
|
||||
_monitor_key = os.getenv("SUPERVISOR_API_KEY", "").strip()
|
||||
if _env in ("prod", "production", "staging") and _monitor_key:
|
||||
_req_key = ""
|
||||
if request:
|
||||
_req_key = (
|
||||
request.headers.get("X-Monitor-Key", "")
|
||||
or request.headers.get("Authorization", "").removeprefix("Bearer ").strip()
|
||||
)
|
||||
if _req_key != _monitor_key:
|
||||
from fastapi.responses import JSONResponse
|
||||
return JSONResponse(status_code=403, content={"error": "forbidden", "message": "X-Monitor-Key required"})
|
||||
|
||||
# ── Collect data (best-effort, non-fatal) ─────────────────────────────
|
||||
warnings: list[str] = []
|
||||
ts_now = __import__("datetime").datetime.now(
|
||||
__import__("datetime").timezone.utc
|
||||
).isoformat(timespec="seconds")
|
||||
|
||||
# uptime as heartbeat proxy
|
||||
_proc_start = getattr(monitor_status, "_proc_start", None)
|
||||
if _proc_start is None:
|
||||
monitor_status._proc_start = time.monotonic()
|
||||
_proc_start = monitor_status._proc_start
|
||||
heartbeat_age_s = int(time.monotonic() - _proc_start)
|
||||
|
||||
# open incidents
|
||||
open_incidents: int | None = None
|
||||
try:
|
||||
from incident_store import get_incident_store as _get_is
|
||||
_istore = _get_is()
|
||||
_open = _istore.list_incidents(filters={"status": "open"}, limit=500)
|
||||
# include "mitigating" as still-open
|
||||
open_incidents = sum(
|
||||
1 for i in _open if (i.get("status") or "").lower() in ("open", "mitigating")
|
||||
)
|
||||
except Exception as _e:
|
||||
warnings.append(f"incidents: {str(_e)[:80]}")
|
||||
|
||||
# alerts loop SLO
|
||||
alerts_loop_slo: dict | None = None
|
||||
try:
|
||||
from alert_store import get_alert_store as _get_as
|
||||
alerts_loop_slo = _get_as().compute_loop_slo(window_minutes=240)
|
||||
# strip any internal keys that may contain infra details
|
||||
_safe_keys = {"claim_to_ack_p95_seconds", "failed_rate_pct", "processing_stuck_count", "sample_count", "violations"}
|
||||
alerts_loop_slo = {k: v for k, v in alerts_loop_slo.items() if k in _safe_keys}
|
||||
except Exception as _e:
|
||||
warnings.append(f"alerts_slo: {str(_e)[:80]}")
|
||||
|
||||
# backends (env vars only — no DSN, no passwords)
|
||||
backends = {
|
||||
"alerts": os.getenv("ALERT_BACKEND", "unknown"),
|
||||
"audit": os.getenv("AUDIT_BACKEND", "unknown"),
|
||||
"incidents": os.getenv("INCIDENT_BACKEND", "unknown"),
|
||||
"risk_history": os.getenv("RISK_HISTORY_BACKEND", "unknown"),
|
||||
"backlog": os.getenv("BACKLOG_BACKEND", "unknown"),
|
||||
}
|
||||
|
||||
# last artifact timestamps (best-effort filesystem scan)
|
||||
last_artifacts: dict = {}
|
||||
_base = __import__("pathlib").Path("ops")
|
||||
for _pattern, _key in [
|
||||
("reports/risk/*.md", "risk_digest_ts"),
|
||||
("reports/platform/*.md", "platform_digest_ts"),
|
||||
("backlog/*.jsonl", "backlog_generate_ts"),
|
||||
("reports/backlog/*.md", "backlog_report_ts"),
|
||||
]:
|
||||
try:
|
||||
_files = sorted(_base.glob(_pattern))
|
||||
if _files:
|
||||
_mtime = _files[-1].stat().st_mtime
|
||||
last_artifacts[_key] = __import__("datetime").datetime.fromtimestamp(
|
||||
_mtime, tz=__import__("datetime").timezone.utc
|
||||
).isoformat(timespec="seconds")
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return {
|
||||
"node_id": os.getenv("NODE_ID", "NODA1"),
|
||||
"ts": ts_now,
|
||||
"heartbeat_age_s": heartbeat_age_s,
|
||||
"router_ok": True, # we are the router; if we respond, we're ok
|
||||
"gateway_ok": None, # gateway health not probed here (separate svc)
|
||||
"open_incidents": open_incidents,
|
||||
"alerts_loop_slo": alerts_loop_slo,
|
||||
"backends": backends,
|
||||
"last_artifacts": last_artifacts,
|
||||
"warnings": warnings,
|
||||
}
|
||||
|
||||
@app.post("/internal/router/test-messaging", response_model=AgentInvocation)
|
||||
async def test_messaging_route(decision: FilterDecision):
|
||||
"""
|
||||
@@ -966,6 +1094,15 @@ class InferResponse(BaseModel):
|
||||
file_mime: Optional[str] = None
|
||||
|
||||
|
||||
class ToolExecuteRequest(BaseModel):
|
||||
"""External tool execution request used by console/ops APIs."""
|
||||
model_config = ConfigDict(extra="allow")
|
||||
tool: str
|
||||
action: Optional[str] = None
|
||||
agent_id: Optional[str] = "sofiia"
|
||||
metadata: Optional[Dict[str, Any]] = None
|
||||
|
||||
|
||||
|
||||
|
||||
# =========================================================================
|
||||
@@ -1110,15 +1247,21 @@ async def internal_llm_complete(request: InternalLLMRequest):
 
     logger.info(f"Internal LLM: profile={request.llm_profile}, role={request.role_context}")
 
     # Get LLM profile configuration
     llm_profiles = router_config.get("llm_profiles", {})
     profile_name = request.llm_profile or "reasoning"
     llm_profile = llm_profiles.get(profile_name, {})
 
-    provider = llm_profile.get("provider", "deepseek")
-    model = request.model or llm_profile.get("model", "deepseek-chat")
+    if not llm_profile:
+        fallback_name = "local_default_coder"
+        llm_profile = llm_profiles.get(fallback_name, {})
+        logger.warning(f"⚠️ Profile '{profile_name}' not found in llm_profiles → falling back to '{fallback_name}' (local)")
+        profile_name = fallback_name
+
+    provider = llm_profile.get("provider", "ollama")
+    model = request.model or llm_profile.get("model", "qwen3:14b")
     max_tokens = request.max_tokens or llm_profile.get("max_tokens", 2048)
     temperature = request.temperature or llm_profile.get("temperature", 0.2)
+    logger.info(f"🎯 Resolved: profile={profile_name} provider={provider} model={model}")
 
     # Build messages
     messages = []
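The fallback logic in the hunk above can be exercised on its own. A minimal sketch of the resolution rule (the profile names mirror this commit's config; the helper name is illustrative, not part of the codebase):

```python
import logging
from typing import Dict, Optional, Tuple

logger = logging.getLogger("router")

def resolve_profile(llm_profiles: Dict[str, dict], requested: Optional[str]) -> Tuple[str, dict]:
    """Resolve an LLM profile name; on an unknown profile, fall back to a
    local profile with a WARNING instead of silently defaulting to cloud."""
    name = requested or "reasoning"
    profile = llm_profiles.get(name, {})
    if not profile:
        fallback = "local_default_coder"
        profile = llm_profiles.get(fallback, {})
        logger.warning("Profile '%s' not found → falling back to '%s' (local)", name, fallback)
        name = fallback
    return name, profile

profiles = {
    "local_default_coder": {"provider": "ollama", "model": "qwen3:14b"},
    "cloud_grok": {"provider": "grok", "model": "grok-2-1212"},
}
name, prof = resolve_profile(profiles, "unknown_profile")
# name == "local_default_coder", prof["provider"] == "ollama"
```

This is the behavior Bug C's fix describes: an unknown profile can no longer route to a cloud provider without an explicit log trail.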
@@ -1173,10 +1316,11 @@ async def internal_llm_complete(request: InternalLLMRequest):
 
     # Fallback/target local provider (Ollama)
     try:
-        logger.info("Internal LLM to Ollama")
-        ollama_model = model or "qwen3:8b"
+        ollama_base = llm_profile.get("base_url", os.getenv("OLLAMA_BASE_URL", "http://host.docker.internal:11434"))
+        ollama_model = model or "qwen3:14b"
+        logger.info(f"Internal LLM to Ollama: model={ollama_model} url={ollama_base}")
         ollama_resp = await http_client.post(
-            "http://172.18.0.1:11434/api/generate",
+            f"{ollama_base}/api/generate",
             json={"model": ollama_model, "prompt": request.prompt, "system": request.system_prompt or "", "stream": False, "options": {"num_predict": max_tokens, "temperature": temperature}},
             timeout=120.0
         )
@@ -1249,15 +1393,17 @@ async def agent_infer(agent_id: str, request: InferRequest):
 
     if not system_prompt:
         try:
-            from prompt_builder import get_agent_system_prompt
-            system_prompt = await get_agent_system_prompt(
-                agent_id,
+            from prompt_builder import get_prompt_builder
+            prompt_builder = await get_prompt_builder(
                 city_service_url=CITY_SERVICE_URL,
-                router_config=router_config
+                router_config=router_config,
             )
-            logger.info(f"✅ Loaded system prompt from database for {agent_id}")
+            prompt_result = await prompt_builder.get_system_prompt(agent_id)
+            system_prompt = prompt_result.system_prompt
+            system_prompt_source = prompt_result.source
+            logger.info(f"✅ Loaded system prompt for {agent_id} from {system_prompt_source}")
         except Exception as e:
-            logger.warning(f"⚠️ Could not load prompt from database: {e}")
+            logger.warning(f"⚠️ Could not load prompt from configured sources: {e}")
             # Fallback to config
+            system_prompt_source = "router_config"
             agent_config = router_config.get("agents", {}).get(agent_id, {})
@@ -1450,15 +1596,38 @@ async def agent_infer(agent_id: str, request: InferRequest):
 
     except Exception as e:
         logger.exception(f"❌ CrewAI error: {e}, falling back to direct LLM")
 
-    default_llm = agent_config.get("default_llm", "qwen3:8b")
+    default_llm = agent_config.get("default_llm", "local_default_coder")
+
+    routing_rules = router_config.get("routing", [])
+    default_llm = _select_default_llm(agent_id, metadata, default_llm, routing_rules)
 
     # Get LLM profile configuration
+    cloud_provider_names = {"deepseek", "mistral", "grok", "openai", "anthropic"}
+
     llm_profiles = router_config.get("llm_profiles", {})
     llm_profile = llm_profiles.get(default_llm, {})
 
+    if not llm_profile:
+        fallback_llm = agent_config.get("fallback_llm", "local_default_coder")
+        llm_profile = llm_profiles.get(fallback_llm, {})
+        logger.warning(
+            f"⚠️ Profile '{default_llm}' not found for agent={agent_id} "
+            f"→ fallback to '{fallback_llm}' (local). "
+            f"NOT defaulting to cloud silently."
+        )
+        default_llm = fallback_llm
+
+    provider = llm_profile.get("provider", "ollama")
+    logger.info(f"🎯 Agent={agent_id}: profile={default_llm} provider={provider} model={llm_profile.get('model', '?')}")
+
+    # If an explicit model is requested, try to resolve it to a configured cloud profile.
+    if request.model:
+        for profile_name, profile in llm_profiles.items():
+            if profile.get("model") == request.model and profile.get("provider") in cloud_provider_names:
+                llm_profile = profile
+                provider = profile.get("provider", provider)
+                default_llm = profile_name
+                logger.info(f"🎛️ Matched request.model={request.model} to profile={profile_name} provider={provider}")
+                break
+
     # Determine model name
     if provider in ["deepseek", "openai", "anthropic", "mistral"]:
@@ -1671,7 +1840,6 @@ async def agent_infer(agent_id: str, request: InferRequest):
     max_tokens = request.max_tokens or llm_profile.get("max_tokens", 2048)
     temperature = request.temperature or llm_profile.get("temperature", 0.2)
 
-    cloud_provider_names = {"deepseek", "mistral", "grok", "openai", "anthropic"}
     allow_cloud = provider in cloud_provider_names
     if not allow_cloud:
         logger.info(f"☁️ Cloud providers disabled for agent {agent_id}: provider={provider}")
@@ -1700,6 +1868,18 @@ async def agent_infer(agent_id: str, request: InferRequest):
         }
     ]
 
+    # Custom configured profile for OpenAI-compatible backends (e.g. local llama-server).
+    if provider == "openai":
+        cloud_providers = [
+            {
+                "name": "openai",
+                "api_key_env": llm_profile.get("api_key_env", "OPENAI_API_KEY"),
+                "base_url": llm_profile.get("base_url", "https://api.openai.com"),
+                "model": request.model or llm_profile.get("model", model),
+                "timeout": int(llm_profile.get("timeout_ms", 60000) / 1000),
+            }
+        ]
+
     if not allow_cloud:
         cloud_providers = []
 
@@ -1717,8 +1897,14 @@ async def agent_infer(agent_id: str, request: InferRequest):
     logger.debug(f"🔧 {len(tools_payload)} tools available for function calling")
 
     for cloud in cloud_providers:
-        api_key = os.getenv(cloud["api_key_env"])
-        if not api_key:
+        api_key = os.getenv(cloud["api_key_env"], "")
+        base_url = cloud.get("base_url", "")
+        is_local_openai = (
+            cloud.get("name") == "openai"
+            and isinstance(base_url, str)
+            and any(host in base_url for host in ["host.docker.internal", "localhost", "127.0.0.1"])
+        )
+        if not api_key and not is_local_openai:
             logger.debug(f"⏭️ Skipping {cloud['name']}: API key not configured")
             continue
 
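The skip condition in the hunk above allows a missing API key only when the "openai" provider actually points at a local OpenAI-compatible server (e.g. llama-server), which needs no key. A standalone sketch of that predicate, with the host list copied from the diff (the function name is illustrative):

```python
# Hosts that mark an OpenAI-compatible endpoint as local (no API key required).
LOCAL_HOSTS = ("host.docker.internal", "localhost", "127.0.0.1")

def should_skip(name: str, api_key: str, base_url: str) -> bool:
    """Skip a cloud provider with no API key, unless it is a local
    OpenAI-compatible backend that can be called without one."""
    is_local_openai = name == "openai" and any(h in base_url for h in LOCAL_HOSTS)
    return not api_key and not is_local_openai

# A keyless Grok entry is skipped; a keyless local llama-server is not.
assert should_skip("grok", "", "https://api.x.ai") is True
assert should_skip("openai", "", "http://host.docker.internal:8080") is False
```

The design choice here: rather than special-casing llama-server by name, the router infers "local" from the base URL, so any OpenAI-compatible sidecar on a loopback or Docker-host address works without configuration changes.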
@@ -1739,12 +1925,13 @@ async def agent_infer(agent_id: str, request: InferRequest):
             request_payload["tools"] = tools_payload
             request_payload["tool_choice"] = "auto"
 
+        headers = {"Content-Type": "application/json"}
+        if api_key:
+            headers["Authorization"] = f"Bearer {api_key}"
+
         cloud_resp = await http_client.post(
             f"{cloud['base_url']}/v1/chat/completions",
-            headers={
-                "Authorization": f"Bearer {api_key}",
-                "Content-Type": "application/json"
-            },
+            headers=headers,
             json=request_payload,
             timeout=cloud["timeout"]
         )
@@ -1754,6 +1941,8 @@ async def agent_infer(agent_id: str, request: InferRequest):
     choice = data.get("choices", [{}])[0]
     message = choice.get("message", {})
     response_text = message.get("content", "") or ""
+    if not response_text and message.get("reasoning_content"):
+        response_text = str(message.get("reasoning_content", "")).strip()
     tokens_used = data.get("usage", {}).get("total_tokens", 0)
 
     # Initialize tool_results to avoid UnboundLocalError
@@ -1959,12 +2148,12 @@ async def agent_infer(agent_id: str, request: InferRequest):
                 loop_payload["tools"] = tools_payload
                 loop_payload["tool_choice"] = "auto"
 
+            loop_headers = {"Content-Type": "application/json"}
+            if api_key:
+                loop_headers["Authorization"] = f"Bearer {api_key}"
             loop_resp = await http_client.post(
                 f"{cloud['base_url']}/v1/chat/completions",
-                headers={
-                    "Authorization": f"Bearer {api_key}",
-                    "Content-Type": "application/json"
-                },
+                headers=loop_headers,
                 json=loop_payload,
                 timeout=cloud["timeout"]
             )
@@ -1978,6 +2167,8 @@ async def agent_infer(agent_id: str, request: InferRequest):
                 loop_data = loop_resp.json()
                 loop_message = loop_data.get("choices", [{}])[0].get("message", {})
                 response_text = loop_message.get("content", "") or ""
+                if not response_text and loop_message.get("reasoning_content"):
+                    response_text = str(loop_message.get("reasoning_content", "")).strip()
                 tokens_used += loop_data.get("usage", {}).get("total_tokens", 0)
                 current_tool_calls = loop_message.get("tool_calls", [])
 
@@ -2123,16 +2314,24 @@ async def agent_infer(agent_id: str, request: InferRequest):
     # LOCAL PROVIDERS (Ollama via Swapper)
     # =========================================================================
     # Determine local model from config (not hardcoded)
-    # Strategy: Use agent's default_llm if it's local (ollama), otherwise find first local model
+    # Strategy:
+    #   1) explicit request.model override
+    #   2) agent default_llm if it's local (ollama)
+    #   3) first local profile fallback
     local_model = None
 
+    requested_local_model = (request.model or "").strip()
+    if requested_local_model:
+        local_model = requested_local_model.replace(":", "-")
+        logger.info(f"🎛️ Local model override requested: {requested_local_model} -> {local_model}")
+
     # Check if default_llm is local
-    if llm_profile.get("provider") == "ollama":
+    if not local_model and llm_profile.get("provider") == "ollama":
         # Extract model name and convert format for Swapper (qwen3:8b → qwen3-8b)
         ollama_model = llm_profile.get("model", "qwen3:8b")
         local_model = ollama_model.replace(":", "-")  # qwen3:8b → qwen3-8b
         logger.debug(f"✅ Using agent's default local model: {local_model}")
-    else:
+    elif not local_model:
         # Find first local model from config
         for profile_name, profile in llm_profiles.items():
             if profile.get("provider") == "ollama":
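The three-step precedence in the strategy comment above collapses into one pure function. A sketch under the same assumptions (Swapper expects ":" replaced with "-"; the function name is illustrative):

```python
from typing import Optional

def pick_local_model(request_model: Optional[str],
                     llm_profile: dict,
                     llm_profiles: dict) -> Optional[str]:
    """Resolve the local (Ollama/Swapper) model name:
    1) explicit request.model override,
    2) agent's default profile if it is local (ollama),
    3) first ollama profile found in config.
    Swapper names use '-' where Ollama tags use ':'."""
    requested = (request_model or "").strip()
    if requested:
        return requested.replace(":", "-")
    if llm_profile.get("provider") == "ollama":
        return llm_profile.get("model", "qwen3:8b").replace(":", "-")
    for profile in llm_profiles.values():
        if profile.get("provider") == "ollama":
            return profile.get("model", "").replace(":", "-")
    return None

# Explicit override wins regardless of the agent's profile.
assert pick_local_model("qwen3:14b", {"provider": "grok"}, {}) == "qwen3-14b"
```

Keeping the precedence in one place is what makes the routing deterministic: the same request and config always resolve to the same local model.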
@@ -2259,6 +2458,60 @@ async def agent_infer(agent_id: str, request: InferRequest):
     )
 
 
+@app.post("/v1/tools/execute")
+async def tools_execute(request: ToolExecuteRequest):
+    """
+    Execute a single tool call through ToolManager.
+    Returns console-compatible shape: {status, data, error}.
+    """
+    if not tool_manager:
+        raise HTTPException(status_code=503, detail="Tool manager unavailable")
+
+    payload = request.model_dump(exclude_none=True)
+    tool_name = str(payload.pop("tool", "")).strip()
+    action = payload.pop("action", None)
+    agent_id = str(payload.pop("agent_id", "sofiia") or "sofiia").strip()
+    metadata = payload.pop("metadata", {}) or {}
+
+    if not tool_name:
+        raise HTTPException(status_code=422, detail="tool is required")
+
+    # Keep backward compatibility with sofiia-console calls
+    if action is not None:
+        payload["action"] = action
+
+    chat_id = str(metadata.get("chat_id", "") or "") or None
+    user_id = str(metadata.get("user_id", "") or "") or None
+    workspace_id = str(metadata.get("workspace_id", "default") or "default")
+
+    try:
+        result = await tool_manager.execute_tool(
+            tool_name=tool_name,
+            arguments=payload,
+            agent_id=agent_id,
+            chat_id=chat_id,
+            user_id=user_id,
+            workspace_id=workspace_id,
+        )
+    except Exception as e:
+        logger.exception("❌ Tool execution failed: %s", tool_name)
+        raise HTTPException(status_code=500, detail=f"Tool execution error: {str(e)[:200]}")
+
+    data: Dict[str, Any] = {"result": result.result}
+    if result.image_base64:
+        data["image_base64"] = result.image_base64
+    if result.file_base64:
+        data["file_base64"] = result.file_base64
+    if result.file_name:
+        data["file_name"] = result.file_name
+    if result.file_mime:
+        data["file_mime"] = result.file_mime
+
+    if result.success:
+        return {"status": "ok", "data": data, "error": None}
+    return {"status": "failed", "data": data, "error": {"message": result.error or "Tool failed"}}
+
+
 @app.get("/v1/models")
 async def list_available_models():
     """List all available models across backends"""
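The endpoint above always answers with the console-compatible {status, data, error} envelope, attaching optional attachments only when present. The shaping reduces to a small pure helper; this sketch mirrors the field handling in the diff (the helper name is illustrative, not part of the codebase):

```python
from typing import Any, Dict, Optional

def tool_response(success: bool, result: Any, error: Optional[str] = None,
                  **extras: Any) -> Dict[str, Any]:
    """Build the console-compatible {status, data, error} envelope used by
    /v1/tools/execute. Optional extras (image_base64, file_base64, file_name,
    file_mime) are included only when truthy, as in the endpoint above."""
    data: Dict[str, Any] = {"result": result}
    data.update({k: v for k, v in extras.items() if v})
    if success:
        return {"status": "ok", "data": data, "error": None}
    return {"status": "failed", "data": data, "error": {"message": error or "Tool failed"}}

resp = tool_response(True, {"rows": 3}, file_name="report.csv", image_base64="")
# status is "ok"; the empty image_base64 is dropped, not serialized as ""
```

Truthiness filtering means empty-string attachments never reach the console, which keeps the payload stable for clients that key off field presence.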
@@ -124,6 +124,23 @@ llm_profiles:
     timeout_ms: 60000
     description: "Mistral Large for complex tasks, reasoning, analysis"
 
+  cloud_grok:
+    provider: grok
+    base_url: https://api.x.ai
+    api_key_env: GROK_API_KEY
+    model: grok-2-1212
+    max_tokens: 2048
+    temperature: 0.2
+    timeout_ms: 60000
+    description: "Grok API for SOFIIA (Chief AI Architect)"
+
+# ============================================================================
+# Node Capabilities
+# ============================================================================
+node_capabilities:
+  url: http://node-capabilities:8099/capabilities
+  cache_ttl_sec: 30
+
 # ============================================================================
 # Orchestrator Providers
 # ============================================================================
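A router-side consumer of the node_capabilities block above needs only the URL and a TTL cache. The commit's capabilities_client.py is async; this is a simplified synchronous sketch with the fetch call injected so the cache logic stays visible and testable (class and method names are illustrative):

```python
import time
from typing import Any, Callable, Dict, Optional

class CapabilitiesClient:
    """TTL-cached reader for the Node Capabilities Service (NCS).
    `fetch` performs the actual GET /capabilities (an httpx call in the
    real async client); it is injected here to keep the sketch offline."""

    def __init__(self, fetch: Callable[[], Dict[str, Any]], cache_ttl_sec: float = 30.0):
        self._fetch = fetch
        self._ttl = cache_ttl_sec
        self._cache: Optional[Dict[str, Any]] = None
        self._fetched_at = 0.0

    def get(self, now: Optional[float] = None) -> Dict[str, Any]:
        """Return cached capabilities, refetching once the TTL has expired."""
        now = time.monotonic() if now is None else now
        if self._cache is None or now - self._fetched_at >= self._ttl:
            self._cache = self._fetch()
            self._fetched_at = now
        return self._cache

    def refresh(self) -> Dict[str, Any]:
        """Drop the cache and refetch, mirroring POST /capabilities/refresh."""
        self._cache = None
        return self.get()

calls = []
client = CapabilitiesClient(
    lambda: calls.append(1) or {"served_models": ["qwen3:14b"]},
    cache_ttl_sec=30,
)
client.get(now=0.0)
client.get(now=10.0)   # within TTL: served from cache, no refetch
client.get(now=31.0)   # TTL expired: refetched
# len(calls) == 2
```

Note the two TTLs are independent by design: the NCS itself caches its inventory for 15s, while the router-side client above uses the 30s `cache_ttl_sec` from the config block.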
@@ -417,8 +434,9 @@ agents:
       Distinguish other bots by their nicknames and respond only to strategic requests.
 
   sofiia:
-    description: "Sofiia — Chief AI Architect and Technical Sovereign"
-    default_llm: local_default_coder
+    description: "SOFIIA — Chief AI Architect & Technical Sovereign"
+    default_llm: cloud_grok
+    fallback_llm: local_default_coder
     system_prompt: |
       You are Sofiia — Chief AI Architect and Technical Sovereign of the DAARION.city ecosystem.
       Work as a CTO assistant: architecture, reliability, security, release governance, incident/risk/backlog control.
||||
Reference in New Issue
Block a user