# NODA2 Model Audit — Three-Layer View

**Date:** 2026-02-27
**Node:** MacBook Pro M4 Max, 64GB unified memory

---

## Layer 1: Served by Runtime (routing-eligible)

These are models the router can actively select and invoke.

### Ollama (12 models, port 11434)

| Model | Type | Size | Status | Note |
|-------|------|------|--------|------|
| qwen3.5:35b-a3b | LLM (MoE) | 9.3 GB | idle | PRIMARY reasoning |
| qwen3:14b | LLM | 9.3 GB | idle | Default local |
| gemma3:latest | LLM | 3.3 GB | idle | Fast small |
| glm-4.7-flash:32k | LLM | 19 GB | idle | Long-context |
| glm-4.7-flash:q4_K_M | LLM | 19 GB | idle | **DUPLICATE** |
| llava:13b | Vision | 8.0 GB | idle | P0 fallback |
| mistral-nemo:12b | LLM | 7.1 GB | idle | old |
| deepseek-coder:33b | Code | 18.8 GB | idle | Heavy code |
| deepseek-r1:70b | LLM | 42.5 GB | idle | Very heavy reasoning |
| starcoder2:3b | Code | 1.7 GB | idle | Fast code |
| phi3:latest | LLM | 2.2 GB | idle | Small general |
| gpt-oss:latest | LLM | 13.8 GB | idle | old |

### Swapper (port 8890)

| Model | Type | Status |
|-------|------|--------|
| llava-13b | Vision | unloaded |

### llama-server (port 11435)

| Model | Type | Note |
|-------|------|------|
| Qwen3.5-35B-A3B-Q4_K_M.gguf | LLM | **DUPLICATE** of Ollama |

### Cloud APIs

| Provider | Model | API Key | Active |
|----------|-------|---------|--------|
| Grok (xAI) | grok-2-1212 | `GROK_API_KEY` ✅ | **Sofiia primary** |
| DeepSeek | deepseek-chat | `DEEPSEEK_API_KEY` ✅ | Other agents |
| Mistral | mistral-large | `MISTRAL_API_KEY` | Not configured |

---

## Layer 2: Installed on Disk (not served)

These are on disk but NOT reachable by the router or swapper.
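The served vs. inventory-only split in this audit can be derived mechanically rather than by hand. A minimal sketch — the helper and sample names are illustrative only; in practice the served list would be gathered from the runtimes (e.g. Ollama's `/api/tags`, the Swapper, llama-server) and the disk list from a cache scan:

```python
def inventory_only(served: set[str], on_disk: set[str]) -> set[str]:
    """Models present on disk but not reachable by any runtime (Layer 2)."""
    return on_disk - served

# Sample names taken from this audit (illustrative subset, not the full inventory)
served = {"qwen3:14b", "llava:13b", "phi3:latest"}
on_disk = served | {"whisper-large-v3-turbo", "Kokoro-82M-bf16"}

print(sorted(inventory_only(served, on_disk)))
# → ['Kokoro-82M-bf16', 'whisper-large-v3-turbo']
```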
| Model | Type | Size | Location | Status |
|-------|------|------|----------|--------|
| whisper-large-v3-turbo (MLX) | STT | 1.5 GB | HF cache | Ready, not integrated |
| Kokoro-82M-bf16 (MLX) | TTS | 0.35 GB | HF cache | Ready, not integrated |
| MiniCPM-V-4_5 | Vision | 16 GB | HF cache | Not serving |
| Qwen3-VL-32B-Instruct | Vision | 123 GB | Cursor worktree | R&D artifact |
| Jan-v2-VL-med-Q8_0 | Vision | 9.2 GB | Jan AI | Not running |
| Qwen2.5-7B-Instruct | LLM | 14 GB | HF cache | Idle |
| Qwen2.5-1.5B-Instruct | LLM | 2.9 GB | HF cache | Idle |
| flux2-dev-Q8_0 | Image gen | 33 GB | ComfyUI | Offline |
| ltx-2-19b-distilled | Video gen | 25 GB | ComfyUI | Offline |
| SDXL-base-1.0 | Image gen | 72 GB | hf_models | Legacy |
| FLUX.2-dev (Aquiles) | Image gen | 105 GB | HF cache | ComfyUI |

---

## Layer 3: Sofiia Routing (after fix)

### Before fix (broken)

```
agent_registry: llm_profile=grok
→ router looks up "grok" in node2 config → NOT FOUND
→ llm_profile = {}
→ provider defaults to "deepseek" (hardcoded)
→ tries DEEPSEEK_API_KEY → may work (nondeterministic)
→ XAI_API_KEY exists but mapped as "XAI_API_KEY", not "GROK_API_KEY"
```

### After fix (deterministic)

```
agent_registry: llm_profile=grok
router-config.node2.yml:
  agents.sofiia.default_llm  = cloud_grok
  agents.sofiia.fallback_llm = local_default_coder
  llm_profiles.cloud_grok    = {provider: grok, model: grok-2-1212, base_url: https://api.x.ai}
docker-compose: GROK_API_KEY=${XAI_API_KEY} (aliased)

Chain:
1. Sofiia request → router resolves cloud_grok
2. provider=grok → GROK_API_KEY present → xAI API → grok-2-1212
3. If Grok fails → fallback_llm=local_default_coder → qwen3:14b (Ollama)
4. If unknown profile → WARNING logged, uses agent.default_llm (local), NOT cloud silently
```

---

## Fixes Applied in This Commit

| Bug | Fix | File |
|-----|-----|------|
| A: GROK_API_KEY not in env | Added `GROK_API_KEY=${XAI_API_KEY}` | docker-compose.node2-sofiia.yml |
| B: No `grok` profile | Added `cloud_grok` profile | router-config.node2.yml |
| B: Sofiia → wrong profile | `agents.sofiia.default_llm = cloud_grok` | router-config.node2.yml |
| C: Silent cloud fallback | Unknown profile → local default + WARNING | services/router/main.py |
| C: Hardcoded Ollama URL | `172.18.0.1:11434` → dynamic from config | services/router/main.py |
| — | Node Capabilities Service | services/node-capabilities/ |

---

## Node Capabilities Service

New microservice providing live model inventory at `GET /capabilities`:

- Collects from Ollama, Swapper, llama-server
- Returns canonical JSON with `served_models[]` and `inventory_only[]`
- Cache TTL: 15s
- Port: 127.0.0.1:8099

Verification:

```bash
curl -s http://localhost:8099/capabilities | jq '.served_models | length'
# Expected: 14
```