Router model selection:
- New model_select.py: resolve_effective_profile → profile_requirements →
select_best_model pipeline. NCS-first with graceful static fallback.
- selection_policies in router-config.node2.yml define prefer order per
profile without hardcoding models (e.g. local_default_coder prefers
qwen3:14b then qwen3.5:35b-a3b).
- Cloud profiles (cloud_grok, cloud_deepseek) skip NCS; on cloud failure
use fallback_profile via NCS for local selection.
- Structured logs: selected_profile, required_type, runtime, model,
caps_age_s, fallback_reason on every infer request.
Grok model fix:
- grok-2-1212 no longer exists on xAI API → updated to
grok-4-1-fast-reasoning across all 3 hardcoded locations in main.py
and router-config.node2.yml.
NCS NATS request/reply:
- node-capabilities subscribes to node.noda2.capabilities.get (NATS
request/reply). Enabled via ENABLE_NATS_CAPS=true in compose.
- NODA1 router can query NODA2 capabilities over NATS leafnode without
HTTP connectivity.
Verified:
- NCS: 14 served models from Ollama+Swapper+llama-server
- NATS: request/reply returns full capabilities JSON
- Sofiia: cloud_grok → grok-4-1-fast-reasoning (tested, 200 OK)
- Helion: NCS → qwen3:14b via Ollama (caps_age=23.7s cache hit)
- Router health: ok
Made-with: Cursor
Bug fixes:
- Bug A: GROK_API_KEY env mismatch — router expected GROK_API_KEY but only
XAI_API_KEY was present. Added GROK_API_KEY=${XAI_API_KEY} alias in compose.
- Bug B: 'grok' profile missing in router-config.node2.yml — added cloud_grok
profile (provider: grok, model: grok-2-1212). Sofiia now has
default_llm=cloud_grok with fallback_llm=local_default_coder.
- Bug C: Router silently defaulted to cloud DeepSeek when profile was unknown.
Now falls back to agent.fallback_llm or local_default_coder with WARNING log.
Hardcoded Ollama URL (172.18.0.1) replaced with config-driven base_url.
New service: Node Capabilities Service (NCS)
- services/node-capabilities/ — FastAPI microservice exposing live model
inventory from Ollama, Swapper, and llama-server.
- GET /capabilities — canonical JSON with served_models[] and inventory_only[]
- GET /capabilities/models — flat list of served models
- POST /capabilities/refresh — force cache refresh
- Cache TTL 15s, bound to 127.0.0.1:8099
- services/router/capabilities_client.py — async client with TTL cache
Artifacts:
- ops/node2_models_audit.md — 3-layer model view (served/disk/cloud)
- ops/node2_models_audit.yml — machine-readable audit
- ops/node2_capabilities_example.json — sample NCS output (14 served models)
Made-with: Cursor