node2: fix Sofiia routing determinism + Node Capabilities Service

Bug fixes:
- Bug A: GROK_API_KEY env mismatch. The router expected GROK_API_KEY but only
  XAI_API_KEY was present. Added GROK_API_KEY=${XAI_API_KEY} alias in compose.
- Bug B: 'grok' profile missing in router-config.node2.yml. Added cloud_grok
  profile (provider: grok, model: grok-2-1212). Sofiia now has
  default_llm=cloud_grok with fallback_llm=local_default_coder.
- Bug C: Router silently defaulted to cloud DeepSeek when profile was unknown.
  Now falls back to agent.fallback_llm or local_default_coder with WARNING log.
  Hardcoded Ollama URL (172.18.0.1) replaced with config-driven base_url.

New service: Node Capabilities Service (NCS)
- services/node-capabilities/: FastAPI microservice exposing live model
  inventory from Ollama, Swapper, and llama-server.
- GET /capabilities: canonical JSON with served_models[] and inventory_only[]
- GET /capabilities/models: flat list of served models
- POST /capabilities/refresh: force cache refresh
- Cache TTL 15s, bound to 127.0.0.1:8099
- services/router/capabilities_client.py: async client with TTL cache

Artifacts:
- ops/node2_models_audit.md: 3-layer model view (served/disk/cloud)
- ops/node2_models_audit.yml: machine-readable audit
- ops/node2_capabilities_example.json: sample NCS output (14 served models)

Made-with: Cursor

ops/node2_models_audit.md (new file, 125 lines)
@@ -0,0 +1,125 @@
# NODA2 Model Audit — Three-Layer View
**Date:** 2026-02-27
**Node:** MacBook Pro M4 Max, 64 GB unified memory

---

## Layer 1: Served by Runtime (routing-eligible)

These are the models the router can actively select and invoke.

### Ollama (12 models, port 11434)

| Model | Type | Size | Status | Note |
|-------|------|------|--------|------|
| qwen3.5:35b-a3b | LLM (MoE) | 9.3 GB | idle | PRIMARY reasoning |
| qwen3:14b | LLM | 9.3 GB | idle | Default local |
| gemma3:latest | LLM | 3.3 GB | idle | Fast small |
| glm-4.7-flash:32k | LLM | 19 GB | idle | Long-context |
| glm-4.7-flash:q4_K_M | LLM | 19 GB | idle | **DUPLICATE** |
| llava:13b | Vision | 8.0 GB | idle | P0 fallback |
| mistral-nemo:12b | LLM | 7.1 GB | idle | Old |
| deepseek-coder:33b | Code | 18.8 GB | idle | Heavy code |
| deepseek-r1:70b | LLM | 42.5 GB | idle | Very heavy reasoning |
| starcoder2:3b | Code | 1.7 GB | idle | Fast code |
| phi3:latest | LLM | 2.2 GB | idle | Small general |
| gpt-oss:latest | LLM | 13.8 GB | idle | Old |

### Swapper (port 8890)

| Model | Type | Status |
|-------|------|--------|
| llava-13b | Vision | unloaded |

### llama-server (port 11435)

| Model | Type | Note |
|-------|------|------|
| Qwen3.5-35B-A3B-Q4_K_M.gguf | LLM | **DUPLICATE** of Ollama |

### Cloud APIs

| Provider | Model | API Key | Used by |
|----------|-------|---------|---------|
| Grok (xAI) | grok-2-1212 | `GROK_API_KEY` ✅ | **Sofiia primary** |
| DeepSeek | deepseek-chat | `DEEPSEEK_API_KEY` ✅ | Other agents |
| Mistral | mistral-large | `MISTRAL_API_KEY` | Not configured |
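
The key status in this table can be checked before the router starts. The following is a minimal sketch, assuming a ✅ means the variable is set in the router's environment. The `PROVIDER_ENV_KEYS` mapping and the `missing_provider_keys` helper are hypothetical names for illustration and are not taken from the router code; only the variable names come from the table.

```python
import os

# Hypothetical mapping from provider name to the env var holding its API key.
# The variable names are from the table above; the mapping itself is an assumption.
PROVIDER_ENV_KEYS = {
    "grok": "GROK_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "mistral": "MISTRAL_API_KEY",
}


def missing_provider_keys(env=None):
    """Return the providers whose API key variable is unset or empty."""
    env = os.environ if env is None else env
    return sorted(p for p, var in PROVIDER_ENV_KEYS.items() if not env.get(var))


if __name__ == "__main__":
    # Report every provider that cannot be used with the current environment.
    for provider in missing_provider_keys():
        print(f"no API key set for provider '{provider}' ({PROVIDER_ENV_KEYS[provider]})")
```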

---

## Layer 2: Installed on Disk (not served)

These models are on disk but NOT reachable by the router or Swapper.

| Model | Type | Size | Location | Status |
|-------|------|------|----------|--------|
| whisper-large-v3-turbo (MLX) | STT | 1.5 GB | HF cache | Ready, not integrated |
| Kokoro-82M-bf16 (MLX) | TTS | 0.35 GB | HF cache | Ready, not integrated |
| MiniCPM-V-4_5 | Vision | 16 GB | HF cache | Not serving |
| Qwen3-VL-32B-Instruct | Vision | 123 GB | Cursor worktree | R&D artifact |
| Jan-v2-VL-med-Q8_0 | Vision | 9.2 GB | Jan AI | Not running |
| Qwen2.5-7B-Instruct | LLM | 14 GB | HF cache | Idle |
| Qwen2.5-1.5B-Instruct | LLM | 2.9 GB | HF cache | Idle |
| flux2-dev-Q8_0 | Image gen | 33 GB | ComfyUI | Offline |
| ltx-2-19b-distilled | Video gen | 25 GB | ComfyUI | Offline |
| SDXL-base-1.0 | Image gen | 72 GB | hf_models | Legacy |
| FLUX.2-dev (Aquiles) | Image gen | 105 GB | HF cache | ComfyUI |

---

## Layer 3: Sofiia Routing (after fix)

### Before fix (broken)

```
agent_registry: llm_profile=grok
  → router looks up "grok" in node2 config → NOT FOUND
  → llm_profile = {} → provider defaults to "deepseek" (hardcoded)
  → tries DEEPSEEK_API_KEY → may work (nondeterministic)
  → XAI_API_KEY exists but mapped as "XAI_API_KEY", not "GROK_API_KEY"
```

### After fix (deterministic)

```
agent_registry: llm_profile=grok
router-config.node2.yml:
  agents.sofiia.default_llm = cloud_grok
  agents.sofiia.fallback_llm = local_default_coder
  llm_profiles.cloud_grok = {provider: grok, model: grok-2-1212, base_url: https://api.x.ai}

docker-compose: GROK_API_KEY=${XAI_API_KEY} (aliased)

Chain:
  1. Sofiia request → router resolves cloud_grok
  2. provider=grok → GROK_API_KEY present → xAI API → grok-2-1212
  3. If Grok fails → fallback_llm=local_default_coder → qwen3:14b (Ollama)
  4. If unknown profile → WARNING logged, uses agent.fallback_llm or local_default_coder (local), NOT cloud silently
```
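
The unknown-profile rule in the chain can be sketched as follows. This is an illustration under stated assumptions, not the code in `services/router/main.py`: the `resolve_profile` function, the `PROFILES` and `SOFIIA` dictionaries, and the `provider: ollama` value for `local_default_coder` are hypothetical. The profile names, the Grok model, and the local model come from the configuration shown above.

```python
import logging

logger = logging.getLogger("router")

# Last-resort local profile name, as named in the commit message.
LOCAL_DEFAULT = "local_default_coder"

# Illustrative stand-ins for the parsed router-config.node2.yml (assumed shape).
PROFILES = {
    "cloud_grok": {"provider": "grok", "model": "grok-2-1212", "base_url": "https://api.x.ai"},
    "local_default_coder": {"provider": "ollama", "model": "qwen3:14b"},
}
SOFIIA = {"default_llm": "cloud_grok", "fallback_llm": "local_default_coder"}


def resolve_profile(requested, agent, profiles):
    """Return (name, profile); an unknown name never resolves to a cloud default."""
    if requested in profiles:
        return requested, profiles[requested]
    # Unknown profile: prefer the agent's own fallback, else the local default.
    fallback = agent.get("fallback_llm") or LOCAL_DEFAULT
    logger.warning("unknown LLM profile %r, falling back to %r", requested, fallback)
    if fallback not in profiles:
        raise KeyError(f"fallback profile {fallback!r} is not defined")
    return fallback, profiles[fallback]


if __name__ == "__main__":
    # The known profile resolves directly; the stale name "grok" falls back locally.
    print(resolve_profile(SOFIIA["default_llm"], SOFIIA, PROFILES)[0])
    print(resolve_profile("grok", SOFIIA, PROFILES)[0])
```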

---

## Fixes Applied in This Commit

| Bug | Fix | File |
|-----|-----|------|
| A: GROK_API_KEY not in env | Added `GROK_API_KEY=${XAI_API_KEY}` | docker-compose.node2-sofiia.yml |
| B: No `grok` profile | Added `cloud_grok` profile | router-config.node2.yml |
| B: Sofiia → wrong profile | `agents.sofiia.default_llm = cloud_grok` | router-config.node2.yml |
| C: Silent cloud fallback | Unknown profile → local default + WARNING | services/router/main.py |
| C: Hardcoded Ollama URL | `172.18.0.1:11434` → dynamic from config | services/router/main.py |
| New service | Node Capabilities Service | services/node-capabilities/ |

---

## Node Capabilities Service

New microservice providing a live model inventory at `GET /capabilities`:

- Collects from Ollama, Swapper, and llama-server
- Returns canonical JSON with `served_models[]` and `inventory_only[]`
- Cache TTL: 15s
- Port: 127.0.0.1:8099

Verification:

```bash
curl -s http://localhost:8099/capabilities | jq '.served_models | length'
# Expected: 14
```
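
The 15-second cache can be sketched on the client side as follows. This is a minimal sketch and not the contents of `services/router/capabilities_client.py`: the `TTLCache` class and its `fetch`, `ttl`, and `clock` parameters are hypothetical, and the HTTP call is left to an injected coroutine so the example depends only on the standard library.

```python
import asyncio
import time


class TTLCache:
    """Cache a single async-fetched value for `ttl` seconds (hypothetical helper)."""

    def __init__(self, fetch, ttl=15.0, clock=time.monotonic):
        self._fetch = fetch          # coroutine function returning the capabilities payload
        self._ttl = ttl
        self._clock = clock          # injectable so expiry can be tested without sleeping
        self._value = None
        self._expires_at = 0.0
        self._lock = asyncio.Lock()  # serializes refreshes across concurrent callers

    async def get(self, force=False):
        """Return the cached value, refetching when forced, empty, or expired."""
        async with self._lock:
            if force or self._value is None or self._clock() >= self._expires_at:
                self._value = await self._fetch()
                self._expires_at = self._clock() + self._ttl
            return self._value


async def demo():
    # Stand-in for an HTTP GET against /capabilities (assumed payload shape).
    async def fake_fetch():
        return {"served_models": [{"name": "qwen3:14b"}], "inventory_only": []}

    cache = TTLCache(fake_fetch)
    payload = await cache.get()
    print(len(payload["served_models"]))


if __name__ == "__main__":
    asyncio.run(demo())
```

Passing `force=True` mirrors what `POST /capabilities/refresh` does on the service side: it bypasses the TTL and refetches immediately.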