microdao-daarion/ops/node2_models_audit.md
Apple e2a3ae342a node2: fix Sofiia routing determinism + Node Capabilities Service
Bug fixes:
- Bug A: GROK_API_KEY env mismatch — router expected GROK_API_KEY but only
  XAI_API_KEY was present. Added GROK_API_KEY=${XAI_API_KEY} alias in compose.
- Bug B: 'grok' profile missing in router-config.node2.yml — added cloud_grok
  profile (provider: grok, model: grok-2-1212). Sofiia now has
  default_llm=cloud_grok with fallback_llm=local_default_coder.
- Bug C: Router silently defaulted to cloud DeepSeek when profile was unknown.
  Now falls back to agent.fallback_llm or local_default_coder with WARNING log.
  Hardcoded Ollama URL (172.18.0.1) replaced with config-driven base_url.

New service: Node Capabilities Service (NCS)
- services/node-capabilities/ — FastAPI microservice exposing live model
  inventory from Ollama, Swapper, and llama-server.
- GET /capabilities — canonical JSON with served_models[] and inventory_only[]
- GET /capabilities/models — flat list of served models
- POST /capabilities/refresh — force cache refresh
- Cache TTL 15s, bound to 127.0.0.1:8099
- services/router/capabilities_client.py — async client with TTL cache

Artifacts:
- ops/node2_models_audit.md — 3-layer model view (served/disk/cloud)
- ops/node2_models_audit.yml — machine-readable audit
- ops/node2_capabilities_example.json — sample NCS output (14 served models)

Made-with: Cursor
2026-02-27 02:07:40 -08:00


Node2 Model Audit — Three-Layer View

Date: 2026-02-27
Node: MacBook Pro M4 Max, 64GB unified memory


Layer 1: Served by Runtime (routing-eligible)

These are models the router can actively select and invoke.

Ollama (12 models, port 11434)

| Model | Type | Size | Status | Note |
|---|---|---|---|---|
| qwen3.5:35b-a3b | LLM (MoE) | 9.3 GB | idle | PRIMARY reasoning |
| qwen3:14b | LLM | 9.3 GB | idle | Default local |
| gemma3:latest | LLM | 3.3 GB | idle | Fast small |
| glm-4.7-flash:32k | LLM | 19 GB | idle | Long-context |
| glm-4.7-flash:q4_K_M | LLM | 19 GB | idle | DUPLICATE |
| llava:13b | Vision | 8.0 GB | idle | P0 fallback |
| mistral-nemo:12b | LLM | 7.1 GB | idle | old |
| deepseek-coder:33b | Code | 18.8 GB | idle | Heavy code |
| deepseek-r1:70b | LLM | 42.5 GB | idle | Very heavy reasoning |
| starcoder2:3b | Code | 1.7 GB | idle | Fast code |
| phi3:latest | LLM | 2.2 GB | idle | Small general |
| gpt-oss:latest | LLM | 13.8 GB | idle | old |
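The Ollama slice of this inventory can be reproduced live from Ollama's GET /api/tags endpoint. A minimal parsing sketch (the sample payload below is illustrative, not the actual node2 output; Ollama reports sizes in bytes):

```python
import json

# Trimmed sample of Ollama's GET /api/tags response (illustrative data only).
sample = json.loads("""
{"models": [
  {"name": "qwen3:14b", "size": 9986228127},
  {"name": "starcoder2:3b", "size": 1825831234}
]}
""")

def list_served(tags_response):
    """Return (name, size in GB) pairs for every model Ollama reports."""
    return [(m["name"], round(m["size"] / 1e9, 1)) for m in tags_response["models"]]
```

In practice the payload would come from `curl -s http://localhost:11434/api/tags` rather than a hardcoded string.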

Swapper (port 8890)

| Model | Type | Status |
|---|---|---|
| llava-13b | Vision | unloaded |

llama-server (port 11435)

| Model | Type | Note |
|---|---|---|
| Qwen3.5-35B-A3B-Q4_K_M.gguf | LLM | DUPLICATE of Ollama |

Cloud APIs

| Provider | Model | API Key | Status |
|---|---|---|---|
| Grok (xAI) | grok-2-1212 | GROK_API_KEY | Active (Sofiia primary) |
| DeepSeek | deepseek-chat | DEEPSEEK_API_KEY | Active (other agents) |
| Mistral | mistral-large | MISTRAL_API_KEY | Not configured |

Layer 2: Installed on Disk (not served)

These are on disk but NOT reachable by router/swapper.

| Model | Type | Size | Location | Status |
|---|---|---|---|---|
| whisper-large-v3-turbo (MLX) | STT | 1.5 GB | HF cache | Ready, not integrated |
| Kokoro-82M-bf16 (MLX) | TTS | 0.35 GB | HF cache | Ready, not integrated |
| MiniCPM-V-4_5 | Vision | 16 GB | HF cache | Not serving |
| Qwen3-VL-32B-Instruct | Vision | 123 GB | Cursor worktree | R&D artifact |
| Jan-v2-VL-med-Q8_0 | Vision | 9.2 GB | Jan AI | Not running |
| Qwen2.5-7B-Instruct | LLM | 14 GB | HF cache | Idle |
| Qwen2.5-1.5B-Instruct | LLM | 2.9 GB | HF cache | Idle |
| flux2-dev-Q8_0 | Image gen | 33 GB | ComfyUI | Offline |
| ltx-2-19b-distilled | Video gen | 25 GB | ComfyUI | Offline |
| SDXL-base-1.0 | Image gen | 72 GB | hf_models | Legacy |
| FLUX.2-dev (Aquiles) | Image gen | 105 GB | HF cache | ComfyUI |

Layer 3: Sofiia Routing (after fix)

Before fix (broken)

agent_registry: llm_profile=grok
→ router looks up "grok" in node2 config → NOT FOUND
→ llm_profile = {} → provider defaults to "deepseek" (hardcoded)
→ tries DEEPSEEK_API_KEY → may work (nondeterministic)
→ XAI_API_KEY exists but mapped as "XAI_API_KEY", not "GROK_API_KEY"

After fix (deterministic)

agent_registry: llm_profile=grok
router-config.node2.yml:
  agents.sofiia.default_llm = cloud_grok
  agents.sofiia.fallback_llm = local_default_coder
  llm_profiles.cloud_grok = {provider: grok, model: grok-2-1212, base_url: https://api.x.ai}

docker-compose: GROK_API_KEY=${XAI_API_KEY} (aliased)
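The alias from Bug A amounts to one line in the compose file; a sketch of the relevant fragment (the `router` service name and surrounding keys are assumed, not taken from the actual file):

```yaml
# docker-compose.node2-sofiia.yml (excerpt, sketch)
services:
  router:
    environment:
      # Bug A fix: router reads GROK_API_KEY; the host only exports XAI_API_KEY
      - GROK_API_KEY=${XAI_API_KEY}
```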

Chain:
  1. Sofiia request → router resolves cloud_grok
  2. provider=grok → GROK_API_KEY present → xAI API → grok-2-1212
  3. If Grok fails → fallback_llm=local_default_coder → qwen3:14b (Ollama)
  4. If an unknown profile is requested → WARNING logged, falls back to agent.fallback_llm (else local_default_coder), NOT cloud silently
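The deterministic resolution in the chain above can be sketched as plain Python (the dict shapes and function name are hypothetical; the real logic lives in services/router/main.py):

```python
import logging

logger = logging.getLogger("router")

# Hypothetical in-memory mirrors of router-config.node2.yml (values from this doc).
LLM_PROFILES = {
    "cloud_grok": {"provider": "grok", "model": "grok-2-1212", "base_url": "https://api.x.ai"},
    "local_default_coder": {"provider": "ollama", "model": "qwen3:14b"},
}
AGENTS = {"sofiia": {"default_llm": "cloud_grok", "fallback_llm": "local_default_coder"}}

def resolve_profile(agent_name: str, requested: str) -> dict:
    """Resolve deterministically: a known profile wins; anything unknown goes local, loudly."""
    if requested in LLM_PROFILES:
        return LLM_PROFILES[requested]
    # Bug C fix: never default to a cloud provider on an unknown profile.
    logger.warning("Unknown LLM profile %r for agent %r; using local fallback", requested, agent_name)
    fallback = AGENTS.get(agent_name, {}).get("fallback_llm", "local_default_coder")
    return LLM_PROFILES.get(fallback, LLM_PROFILES["local_default_coder"])
```

With this shape, the pre-fix failure mode (`llm_profile=grok` silently becoming DeepSeek) instead resolves to qwen3:14b with a WARNING in the log.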

Fixes Applied in This Commit

| Bug | Fix | File |
|---|---|---|
| A: GROK_API_KEY not in env | Added GROK_API_KEY=${XAI_API_KEY} | docker-compose.node2-sofiia.yml |
| B: No grok profile | Added cloud_grok profile | router-config.node2.yml |
| B: Sofiia on wrong profile | agents.sofiia.default_llm = cloud_grok | router-config.node2.yml |
| C: Silent cloud fallback | Unknown profile → local default + WARNING | services/router/main.py |
| C: Hardcoded Ollama URL | 172.18.0.1:11434 → dynamic from config | services/router/main.py |
| New: Node Capabilities Service | New microservice | services/node-capabilities/ |

Node Capabilities Service

New microservice providing live model inventory at GET /capabilities:

  • Collects from Ollama, Swapper, llama-server
  • Returns canonical JSON with served_models[] and inventory_only[]
  • Cache TTL: 15s
  • Port: 127.0.0.1:8099
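The cache-and-merge behavior behind GET /capabilities can be sketched in pure Python (class and function names are hypothetical, not the actual NCS source; the injectable clock exists only so the TTL can be tested without sleeping):

```python
import time

class TTLCache:
    """Single-value TTL cache in the spirit of the NCS cache (15 s TTL assumed)."""

    def __init__(self, ttl_seconds=15.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._value = None
        self._stamp = None

    def get(self, refresh):
        """Return the cached value, calling refresh() only when the entry is stale."""
        now = self.clock()
        if self._stamp is None or now - self._stamp >= self.ttl:
            self._value = refresh()
            self._stamp = now
        return self._value

def collect_capabilities(served_by_runtime, disk_inventory):
    """Merge per-runtime model lists into the canonical /capabilities shape."""
    served = sorted({m for models in served_by_runtime.values() for m in models})
    inventory_only = sorted(set(disk_inventory) - set(served))
    return {"served_models": served, "inventory_only": inventory_only}
```

POST /capabilities/refresh would then simply reset the cache stamp so the next GET re-collects.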

Verification:

curl -s http://localhost:8099/capabilities | jq '.served_models | length'
# Expected: 14