P1: NCS-first model selection + NATS capabilities + Grok 4.1

Router model selection:
- New model_select.py: resolve_effective_profile → profile_requirements →
  select_best_model pipeline. NCS-first with graceful static fallback.
- selection_policies in router-config.node2.yml define prefer order per
  profile without hardcoding models (e.g. local_default_coder prefers
  qwen3:14b then qwen3.5:35b-a3b).
- Cloud profiles (cloud_grok, cloud_deepseek) skip NCS; on cloud failure
  use fallback_profile via NCS for local selection.
- Structured logs: selected_profile, required_type, runtime, model,
  caps_age_s, fallback_reason on every infer request.

Grok model fix:
- grok-2-1212 no longer exists on xAI API → updated to
  grok-4-1-fast-reasoning across all 3 hardcoded locations in main.py
  and router-config.node2.yml.

NCS NATS request/reply:
- node-capabilities subscribes to node.noda2.capabilities.get
  (NATS request/reply). Enabled via ENABLE_NATS_CAPS=true in compose.
- NODA1 router can query NODA2 capabilities over NATS leafnode without
  HTTP connectivity.

Verified:
- NCS: 14 served models from Ollama+Swapper+llama-server
- NATS: request/reply returns full capabilities JSON
- Sofiia: cloud_grok → grok-4-1-fast-reasoning (tested, 200 OK)
- Helion: NCS → qwen3:14b via Ollama (caps_age=23.7s cache hit)
- Router health: ok

Made-with: Cursor
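The cloud fallback and structured-log behaviour described in the message can be sketched roughly as follows. This is an illustration only, not the contents of model_select.py: the function names `effective_profile` and `selection_record`, the `cloud_ok` flag, and the `cloud_unavailable` reason string are assumptions made for the example. Only the policy keys and the log field names come from the commit.

```python
import json
from typing import Optional

# Two policies copied from selection_policies; the rest are omitted here.
POLICIES = {
    "cloud_grok": {
        "required_type": "cloud_llm",
        "provider": "grok",
        "fallback_profile": "local_default_coder",
    },
    "local_default_coder": {
        "required_type": "llm",
        "prefer": ["qwen3:14b", "qwen3.5:35b-a3b", "*"],
    },
}


def effective_profile(profile: str, cloud_ok: bool) -> tuple[str, Optional[str]]:
    """Return (profile to use, fallback_reason).

    Hypothetical helper: a cloud profile is kept while the cloud call works,
    otherwise its fallback_profile is used for local selection.
    """
    policy = POLICIES[profile]
    if policy["required_type"] == "cloud_llm" and not cloud_ok:
        return policy["fallback_profile"], "cloud_unavailable"
    return profile, None


def selection_record(profile: str, runtime: str, model: str,
                     caps_age_s: Optional[float],
                     fallback_reason: Optional[str]) -> dict:
    """Build a log record with the field names listed in the commit message."""
    return {
        "selected_profile": profile,
        "required_type": POLICIES[profile]["required_type"],
        "runtime": runtime,
        "model": model,
        "caps_age_s": caps_age_s,
        "fallback_reason": fallback_reason,
    }


# Simulate a failed cloud call for cloud_grok.
chosen, reason = effective_profile("cloud_grok", cloud_ok=False)
record = selection_record(chosen, "ollama", "qwen3:14b", 23.7, reason)
print(json.dumps(record))
```

Keeping the fallback decision separate from the log record means the same record shape can be emitted for both the cloud path and the local path.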
@@ -128,11 +128,11 @@ llm_profiles:
     provider: grok
     base_url: https://api.x.ai
     api_key_env: GROK_API_KEY
-    model: grok-2-1212
+    model: grok-4-1-fast-reasoning
     max_tokens: 2048
     temperature: 0.2
     timeout_ms: 60000
-    description: "Grok API для SOFIIA (Chief AI Architect)"
+    description: "Grok 4.1 Fast Reasoning для SOFIIA (Chief AI Architect)"
 
 # ============================================================================
 # Node Capabilities
@@ -141,6 +141,72 @@ node_capabilities:
   url: http://node-capabilities:8099/capabilities
   cache_ttl_sec: 30
 
+# ============================================================================
+# Selection Policies (NCS-first model selection)
+# ============================================================================
+# Router uses these to map profile → required_type + prefer order.
+# NCS picks the best served model matching these requirements.
+# Cloud profiles skip NCS; if cloud fails, fallback_profile is used via NCS.
+selection_policies:
+  local_default_coder:
+    required_type: llm
+    prefer: ["qwen3:14b", "qwen3.5:35b-a3b", "*"]
+
+  local_default_reasoner:
+    required_type: llm
+    prefer: ["qwen3.5:35b-a3b", "deepseek-r1:70b", "*"]
+
+  qwen3_strategist_8b:
+    required_type: llm
+    prefer: ["qwen3:14b", "qwen3.5:35b-a3b", "*"]
+
+  qwen3_support_8b:
+    required_type: llm
+    prefer: ["qwen3:14b", "gemma3:latest", "*"]
+
+  qwen3_science_8b:
+    required_type: llm
+    prefer: ["qwen3:14b", "qwen3.5:35b-a3b", "*"]
+
+  qwen3_creative_8b:
+    required_type: llm
+    prefer: ["qwen3:14b", "*"]
+
+  qwen3_5_35b_a3b:
+    required_type: llm
+    prefer: ["qwen3.5:35b-a3b", "*"]
+
+  qwen3_vision_8b:
+    required_type: vision
+    prefer: ["llava:13b", "*"]
+
+  qwen2_5_3b_service:
+    required_type: llm
+    prefer: ["phi3:latest", "gemma3:latest", "qwen3:14b"]
+
+  mistral_community_12b:
+    required_type: llm
+    prefer: ["mistral-nemo:12b", "qwen3:14b", "*"]
+
+  cloud_deepseek:
+    required_type: cloud_llm
+    provider: deepseek
+    fallback_profile: local_default_coder
+
+  cloud_grok:
+    required_type: cloud_llm
+    provider: grok
+    fallback_profile: local_default_coder
+
+  cloud_mistral:
+    required_type: cloud_llm
+    provider: mistral
+    fallback_profile: local_default_coder
+
+  vision_default:
+    required_type: vision
+    prefer: ["llava:13b", "*"]
+
 # ============================================================================
 # Orchestrator Providers
 # ============================================================================
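A minimal sketch of how a `prefer` list with a trailing `"*"` might be resolved against the models a node reports as served. It is not the actual select_best_model: the function name `pick_model`, the shape of the served-model entries (`name` and `type` keys), and the reading of `"*"` as "any served model of the required type" are all assumptions made for illustration.

```python
from typing import Optional


def pick_model(policy: dict, served: list[dict]) -> Optional[str]:
    """Return the first served model that satisfies the policy, or None.

    Assumed semantics: entries in `prefer` are tried in order, and "*"
    accepts any served model whose type matches `required_type`.
    """
    # Keep only models of the required type, preserving the reported order.
    candidates = [m["name"] for m in served if m["type"] == policy["required_type"]]
    for wanted in policy.get("prefer", []):
        if wanted == "*":
            # Wildcard: take whatever matching model is listed first, if any.
            return candidates[0] if candidates else None
        if wanted in candidates:
            return wanted
    return None


# Hypothetical capabilities snapshot; the real NCS JSON layout may differ.
served_models = [
    {"name": "gemma3:latest", "type": "llm"},
    {"name": "qwen3.5:35b-a3b", "type": "llm"},
    {"name": "llava:13b", "type": "vision"},
]

coder = {"required_type": "llm", "prefer": ["qwen3:14b", "qwen3.5:35b-a3b", "*"]}
print(pick_model(coder, served_models))  # qwen3.5:35b-a3b
```

Under this reading, a policy without `"*"` (such as qwen2_5_3b_service) yields no model when none of its listed names is served, which would be the point where a static fallback takes over.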