Files
microdao-daarion/services/router/capabilities_client.py
Apple e2a3ae342a node2: fix Sofiia routing determinism + Node Capabilities Service
Bug fixes:
- Bug A: GROK_API_KEY env mismatch — router expected GROK_API_KEY but only
  XAI_API_KEY was present. Added GROK_API_KEY=${XAI_API_KEY} alias in compose.
- Bug B: 'grok' profile missing in router-config.node2.yml — added cloud_grok
  profile (provider: grok, model: grok-2-1212). Sofiia now has
  default_llm=cloud_grok with fallback_llm=local_default_coder.
- Bug C: Router silently defaulted to cloud DeepSeek when profile was unknown.
  Now falls back to agent.fallback_llm or local_default_coder with WARNING log.
  Hardcoded Ollama URL (172.18.0.1) replaced with config-driven base_url.
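The Bug C fix (fall back with a warning instead of silently defaulting to cloud DeepSeek) can be sketched as below. This is a minimal illustration, not the router's actual code: `resolve_profile`, `profiles`, and `agent_cfg` are hypothetical names, while `fallback_llm` and `local_default_coder` come from the commit text.

```python
import logging

logger = logging.getLogger("router")

def resolve_profile(requested: str, profiles: dict, agent_cfg: dict) -> str:
    """Resolve an LLM profile name; on an unknown profile, fall back to the
    agent's fallback_llm (or local_default_coder) with a WARNING instead of
    silently defaulting to a cloud provider."""
    if requested in profiles:
        return requested
    fallback = agent_cfg.get("fallback_llm") or "local_default_coder"
    logger.warning(f"Unknown LLM profile {requested!r}; falling back to {fallback!r}")
    return fallback
```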

New service: Node Capabilities Service (NCS)
- services/node-capabilities/ — FastAPI microservice exposing live model
  inventory from Ollama, Swapper, and llama-server.
- GET /capabilities — canonical JSON with served_models[] and inventory_only[]
- GET /capabilities/models — flat list of served models
- POST /capabilities/refresh — force cache refresh
- Cache TTL 15s, bound to 127.0.0.1:8099
- services/router/capabilities_client.py — async client with TTL cache
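A plausible shape for the `GET /capabilities` payload, inferred from the fields the client reads (`served_models[]`, `inventory_only[]`, `served_count`, and per-model `name`/`type`/`runtime`); the model names and runtimes below are illustrative, not taken from the actual audit:

```json
{
  "served_count": 2,
  "served_models": [
    {"name": "qwen2.5-coder:14b", "type": "llm", "runtime": "ollama"},
    {"name": "nomic-embed-text", "type": "embedding", "runtime": "llama-server"}
  ],
  "inventory_only": [
    {"name": "deepseek-r1:32b", "type": "llm"}
  ]
}
```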

Artifacts:
- ops/node2_models_audit.md — 3-layer model view (served/disk/cloud)
- ops/node2_models_audit.yml — machine-readable audit
- ops/node2_capabilities_example.json — sample NCS output (14 served models)

Made-with: Cursor
2026-02-27 02:07:40 -08:00

81 lines
2.2 KiB
Python

"""Capabilities client — fetches and caches live model inventory from Node Capabilities Service."""
import os
import time
import logging
from typing import Any, Dict, List, Optional
import httpx
logger = logging.getLogger("capabilities_client")
_cache: Dict[str, Any] = {}
_cache_ts: float = 0
NODE_CAPABILITIES_URL = os.getenv("NODE_CAPABILITIES_URL", "")
CACHE_TTL = 30
def configure(url: str = "", ttl: int = 30):
global NODE_CAPABILITIES_URL, CACHE_TTL
if url:
NODE_CAPABILITIES_URL = url
CACHE_TTL = ttl
async def fetch_capabilities(force: bool = False) -> Dict[str, Any]:
global _cache, _cache_ts
if not NODE_CAPABILITIES_URL:
return {}
if not force and _cache and (time.time() - _cache_ts) < CACHE_TTL:
return _cache
try:
async with httpx.AsyncClient(timeout=5) as c:
resp = await c.get(NODE_CAPABILITIES_URL)
if resp.status_code == 200:
_cache = resp.json()
_cache_ts = time.time()
logger.info(f"Capabilities refreshed: {_cache.get('served_count', 0)} served models")
return _cache
else:
logger.warning(f"Capabilities fetch failed: HTTP {resp.status_code}")
except Exception as e:
logger.warning(f"Capabilities fetch error: {e}")
return _cache
def get_cached() -> Dict[str, Any]:
return _cache
def find_served_model(
model_type: str = "llm",
preferred_name: Optional[str] = None,
runtime: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
"""Find best served model matching criteria from cached capabilities."""
served = _cache.get("served_models", [])
if not served:
return None
candidates = [m for m in served if m.get("type") == model_type]
if runtime:
candidates = [m for m in candidates if m.get("runtime") == runtime]
if not candidates:
return None
if preferred_name:
for m in candidates:
if preferred_name in m.get("name", ""):
return m
return candidates[0]
def list_served_by_type(model_type: str = "llm") -> List[Dict[str, Any]]:
return [m for m in _cache.get("served_models", []) if m.get("type") == model_type]