Router model selection:
- New model_select.py implements a resolve_effective_profile →
profile_requirements → select_best_model pipeline. Selection is
NCS-first with a graceful static fallback.
- selection_policies in router-config.node2.yml define prefer order per
profile without hardcoding models (e.g. local_default_coder prefers
qwen3:14b then qwen3.5:35b-a3b).
- Cloud profiles (cloud_grok, cloud_deepseek) skip NCS; if the cloud
call fails, the router switches to the profile's fallback_profile and
selects a local model via NCS.
- Structured logs on every infer request: selected_profile,
required_type, runtime, model, caps_age_s, fallback_reason.
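The pipeline above can be sketched roughly as follows. This is a minimal illustration, not the code in model_select.py: only the three function names, the profile names, the model names, and the log field names come from this commit. The config dictionaries, the capabilities shape (`served_models`, `age_s`), the `static_model` key, and the `fallback_reason` strings are assumptions made for the example. Cloud profiles are assumed to bypass select_best_model entirely.

```python
import json
import logging
from dataclasses import dataclass, asdict
from typing import Optional

logger = logging.getLogger("router.model_select")

# Hypothetical config; the real router-config.node2.yml schema may differ.
PROFILES = {
    "local_default_coder": {"required_type": "llm", "static_model": "qwen3:14b"},
    "cloud_grok": {"provider": "cloud", "fallback_profile": "local_default_coder"},
}
SELECTION_POLICIES = {
    "local_default_coder": {"prefer": ["qwen3:14b", "qwen3.5:35b-a3b"]},
}


@dataclass
class Selection:
    # Field names mirror the structured log fields listed above.
    selected_profile: str
    required_type: str
    runtime: Optional[str]
    model: str
    caps_age_s: Optional[float]
    fallback_reason: Optional[str] = None


def resolve_effective_profile(requested: str, cloud_failed: bool = False) -> str:
    """Return the profile to select for; a failed cloud profile maps to its fallback."""
    profile = PROFILES[requested]
    if profile.get("provider") == "cloud" and cloud_failed:
        return profile["fallback_profile"]
    return requested


def profile_requirements(profile_name: str) -> dict:
    """Combine a local profile's required type with its preference order."""
    policy = SELECTION_POLICIES.get(profile_name, {})
    return {
        "required_type": PROFILES[profile_name]["required_type"],
        "prefer": policy.get("prefer", []),
    }


def select_best_model(profile_name: str, caps: Optional[dict]) -> Selection:
    """Pick a model for a local profile.

    caps is the NCS capabilities dict (assumed shape), or None if NCS is unreachable.
    """
    reqs = profile_requirements(profile_name)
    rtype = reqs["required_type"]
    static_model = PROFILES[profile_name]["static_model"]
    if caps is None:
        # Static fallback: NCS could not be reached.
        return Selection(profile_name, rtype, None, static_model, None, "ncs_unavailable")
    served = {m["name"]: m for m in caps["served_models"] if m["type"] == rtype}
    for name in reqs["prefer"]:
        if name in served:
            return Selection(profile_name, rtype, served[name]["runtime"], name, caps["age_s"])
    # Static fallback: NCS answered, but none of the preferred models is served.
    return Selection(profile_name, rtype, None, static_model, caps["age_s"],
                     "no_preferred_model_served")


def log_selection(sel: Selection) -> str:
    """Emit one structured (JSON) log line and return it."""
    line = json.dumps(asdict(sel))
    logger.info(line)
    return line


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    caps = {"age_s": 23.7, "served_models": [
        {"name": "qwen3.5:35b-a3b", "type": "llm", "runtime": "ollama"},
    ]}
    # A failed cloud_grok request is re-resolved to its local fallback profile.
    profile = resolve_effective_profile("cloud_grok", cloud_failed=True)
    log_selection(select_best_model(profile, caps))
```

Keeping the preference order in a policy table rather than in code means a model can be swapped by editing the config only, which is the stated goal of selection_policies.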
Grok model fix:
- grok-2-1212 no longer exists on the xAI API; updated to
grok-4-1-fast-reasoning in all 3 hardcoded locations across main.py
and router-config.node2.yml.
NCS NATS request/reply:
- node-capabilities subscribes to node.noda2.capabilities.get (NATS
request/reply). Enabled via ENABLE_NATS_CAPS=true in compose.
- The NODA1 router can now query NODA2 capabilities over the NATS
leafnode connection, with no HTTP connectivity required.
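For illustration, the request/reply exchange could look roughly like the sketch below, assuming the nats-py client. The subject and the ENABLE_NATS_CAPS flag come from this commit; the payload schema, the server URL, and the helper names (nats_caps_enabled, build_capabilities_reply, serve_capabilities, query_capabilities) are hypothetical and not taken from the node-capabilities service.

```python
import json
import os
import time

SUBJECT = "node.noda2.capabilities.get"  # subject named in this commit


def nats_caps_enabled(env=os.environ) -> bool:
    """True when the compose flag ENABLE_NATS_CAPS is set to 'true'."""
    return env.get("ENABLE_NATS_CAPS", "").lower() == "true"


def build_capabilities_reply(served_models, now=None) -> bytes:
    """Serialize a capabilities payload (hypothetical schema) for the reply."""
    payload = {
        "served_models": served_models,
        "ts": time.time() if now is None else now,
    }
    return json.dumps(payload).encode()


async def serve_capabilities(get_served_models, url="nats://localhost:4222"):
    """Responder side: answer each request on SUBJECT with the current capabilities."""
    import nats  # nats-py; imported lazily so the helpers above work without it

    nc = await nats.connect(url)

    async def on_request(msg):
        # msg.respond publishes to the reply subject carried by the request.
        await msg.respond(build_capabilities_reply(get_served_models()))

    await nc.subscribe(SUBJECT, cb=on_request)
    return nc


async def query_capabilities(nc, timeout=2.0) -> dict:
    """Requester side: send an empty request and decode the JSON reply."""
    reply = await nc.request(SUBJECT, b"", timeout=timeout)
    return json.loads(reply.data)
```

The requester only needs a NATS connection that can reach the subject, so a router on a different node can use the same call through a leafnode link.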
Verified:
- NCS: 14 served models from Ollama+Swapper+llama-server
- NATS: request/reply returns full capabilities JSON
- Sofiia: cloud_grok → grok-4-1-fast-reasoning (tested, 200 OK)
- Helion: NCS → qwen3:14b via Ollama (caps_age=23.7s cache hit)
- Router health: ok
Made-with: Cursor