P1: NCS-first model selection + NATS capabilities + Grok 4.1

Router model selection:
- New model_select.py: resolve_effective_profile → profile_requirements →
  select_best_model pipeline. NCS-first with graceful static fallback.
- selection_policies in router-config.node2.yml define prefer order per
  profile without hardcoding models (e.g. local_default_coder prefers
  qwen3:14b then qwen3.5:35b-a3b).
- Cloud profiles (cloud_grok, cloud_deepseek) skip NCS; on cloud failure
  use fallback_profile via NCS for local selection.
- Structured logs: selected_profile, required_type, runtime, model,
  caps_age_s, fallback_reason on every infer request.
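
The NCS-first pipeline above can be sketched roughly as follows. This is a minimal illustration, not the actual model_select.py: the policy and served-model dict shapes are assumptions.

```python
# Rough sketch of NCS-first selection with graceful static fallback.
# Policy/served-model shapes are assumptions, not the real NCS schema.
def select_best_model(policy, served_models, static_default=None):
    """Return the first preferred model that NCS reports as served.

    '*' in the prefer list matches any served model of the required type.
    Falls back to static_default when NCS offers nothing suitable.
    """
    candidates = [m for m in served_models
                  if m.get("type") == policy["required_type"]]
    served_names = {m["name"] for m in candidates}
    for pref in policy.get("prefer", []):
        if pref == "*" and candidates:
            return candidates[0]["name"]
        if pref in served_names:
            return pref
    return static_default  # graceful static fallback
```

With the local_default_coder policy, this returns qwen3:14b when it is served, otherwise the next preference, otherwise any served LLM via the "*" wildcard.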

Grok model fix:
- grok-2-1212 no longer exists on the xAI API → updated to
  grok-4-1-fast-reasoning in all 3 hardcoded locations in main.py
  and router-config.node2.yml.

NCS NATS request/reply:
- node-capabilities subscribes to node.noda2.capabilities.get (NATS
  request/reply). Enabled via ENABLE_NATS_CAPS=true in compose.
- NODA1 router can query NODA2 capabilities over NATS leafnode without
  HTTP connectivity.
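
A minimal client-side sketch of that query with the nats-py library. Only the subject name comes from this commit; the connection URL, empty-JSON payload, and timeout are assumptions.

```python
import asyncio
import json


async def get_noda2_capabilities(nats_url: str = "nats://nats:4222") -> dict:
    """Request NODA2 capabilities over NATS request/reply (no HTTP needed)."""
    # nats-py; imported lazily so the sketch loads even without it installed.
    import nats

    nc = await nats.connect(nats_url)
    try:
        # Subject from the commit; travels over the leafnode link to NODA2.
        msg = await nc.request("node.noda2.capabilities.get", b"{}", timeout=5.0)
        return json.loads(msg.data)
    finally:
        await nc.close()
```

On the responder side, node-capabilities subscribes to the same subject and replies with its capabilities JSON, which is what makes the request/reply round trip work without HTTP connectivity between the nodes.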

Verified:
- NCS: 14 served models from Ollama+Swapper+llama-server
- NATS: request/reply returns full capabilities JSON
- Sofiia: cloud_grok → grok-4-1-fast-reasoning (tested, 200 OK)
- Helion: NCS → qwen3:14b via Ollama (caps_age=23.7s cache hit)
- Router health: ok

Made-with: Cursor
Author: Apple
Date: 2026-02-27 02:17:34 -08:00
Commit: 89c3f2ac66 (parent e2a3ae342a)
6 changed files with 489 additions and 34 deletions


@@ -128,11 +128,11 @@ llm_profiles:
     provider: grok
     base_url: https://api.x.ai
     api_key_env: GROK_API_KEY
-    model: grok-2-1212
+    model: grok-4-1-fast-reasoning
     max_tokens: 2048
     temperature: 0.2
     timeout_ms: 60000
-    description: "Grok API for SOFIIA (Chief AI Architect)"
+    description: "Grok 4.1 Fast Reasoning for SOFIIA (Chief AI Architect)"
 # ============================================================================
 # Node Capabilities
@@ -141,6 +141,72 @@ node_capabilities:
   url: http://node-capabilities:8099/capabilities
   cache_ttl_sec: 30
+# ============================================================================
+# Selection Policies (NCS-first model selection)
+# ============================================================================
+# Router uses these to map profile → required_type + prefer order.
+# NCS picks the best served model matching these requirements.
+# Cloud profiles skip NCS; if cloud fails, fallback_profile is used via NCS.
+selection_policies:
+  local_default_coder:
+    required_type: llm
+    prefer: ["qwen3:14b", "qwen3.5:35b-a3b", "*"]
+  local_default_reasoner:
+    required_type: llm
+    prefer: ["qwen3.5:35b-a3b", "deepseek-r1:70b", "*"]
+  qwen3_strategist_8b:
+    required_type: llm
+    prefer: ["qwen3:14b", "qwen3.5:35b-a3b", "*"]
+  qwen3_support_8b:
+    required_type: llm
+    prefer: ["qwen3:14b", "gemma3:latest", "*"]
+  qwen3_science_8b:
+    required_type: llm
+    prefer: ["qwen3:14b", "qwen3.5:35b-a3b", "*"]
+  qwen3_creative_8b:
+    required_type: llm
+    prefer: ["qwen3:14b", "*"]
+  qwen3_5_35b_a3b:
+    required_type: llm
+    prefer: ["qwen3.5:35b-a3b", "*"]
+  qwen3_vision_8b:
+    required_type: vision
+    prefer: ["llava:13b", "*"]
+  qwen2_5_3b_service:
+    required_type: llm
+    prefer: ["phi3:latest", "gemma3:latest", "qwen3:14b"]
+  mistral_community_12b:
+    required_type: llm
+    prefer: ["mistral-nemo:12b", "qwen3:14b", "*"]
+  cloud_deepseek:
+    required_type: cloud_llm
+    provider: deepseek
+    fallback_profile: local_default_coder
+  cloud_grok:
+    required_type: cloud_llm
+    provider: grok
+    fallback_profile: local_default_coder
+  cloud_mistral:
+    required_type: cloud_llm
+    provider: mistral
+    fallback_profile: local_default_coder
+  vision_default:
+    required_type: vision
+    prefer: ["llava:13b", "*"]
 # ============================================================================
 # Orchestrator Providers
 # ============================================================================
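
Taken together, the cloud policies imply a two-step resolution: try the cloud provider, and on failure re-enter NCS selection under fallback_profile. A sketch of that step follows; resolve_effective_profile is a name from this commit, but its signature and logic here are illustrative assumptions.

```python
# Illustrative resolution of a profile name under selection_policies.
def resolve_effective_profile(name: str, policies: dict,
                              cloud_available: bool) -> str:
    """Return the profile the router should actually serve with."""
    policy = policies[name]
    if policy.get("required_type") != "cloud_llm":
        return name  # local profiles go straight to NCS selection
    if cloud_available:
        return name  # cloud profiles skip NCS entirely
    # Cloud failed: local selection via NCS under the fallback profile.
    return policy["fallback_profile"]
```

For example, cloud_grok resolves to itself while xAI is reachable, and to local_default_coder (then through NCS to a served local model) when the cloud call fails.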