GET /api/runbooks/status returns docs_root, indexed_files, indexed_chunks, last_indexed_at, fts_available; docs_index_meta table and set on rebuild
Made-with: Cursor
includes preflight, restart, smoke, observation, evidence steps
defines success criteria and metrics to collect for next-step decision
Made-with: Cursor
Version cursor payloads and keep backward compatibility while adding dedicated tie-breaker regression coverage for equal timestamps to prevent pagination duplicates and gaps.
Made-with: Cursor
Move idempotency TTL/LRU logic into a dedicated store module with a swap-ready interface and wire chat send flow to use store get/set semantics without changing API behavior.
Made-with: Cursor
Add regression coverage for router URL resolution when NODE_ID is unset and ROUTER_URL is present, and verify explicit NODES_NODA2_ROUTER_URL keeps higher priority.
Made-with: Cursor
Expose Prometheus-style metrics endpoint and add counters for send requests, idempotency replays, and cursor pagination calls, including a safe in-process fallback exposition when prometheus_client is unavailable.
Made-with: Cursor
Add BFF runtime support for chat idempotency (header priority over body) with bounded in-memory TTL/LRU replay cache, implement cursor-based pagination for chats and messages, and add a safe NODA2 local router fallback for legacy runs without NODE_ID.
Made-with: Cursor
Add focused API contract tests for chat idempotency, cursor pagination, and node routing behavior using isolated local fixtures and mocked upstream inference.
Made-with: Cursor
- Node Worker: replace swapper_vision with ollama_vision (direct Ollama API)
- Node Worker: add NATS subjects for stt/tts/image (stubs ready)
- Node Worker: remove SWAPPER_URL dependency from config
- Router: vision calls go directly to Ollama /api/generate with images
- Router: local LLM calls go directly to Ollama /api/generate
- Router: add OLLAMA_URL and PREFER_NODE_WORKER=true feature flag
- Router: /v1/models now uses NCS global capabilities pool
- NCS: SWAPPER_URL="" -> skip Swapper probing (status=disabled)
- Swapper configs: remove all hardcoded model lists, keep only runtime
URLs, timeouts, limits
- docker-compose.node1.yml: add OLLAMA_URL, PREFER_NODE_WORKER for router;
SWAPPER_URL= for NCS; remove swapper-service from node-worker depends_on
- docker-compose.node2-sofiia.yml: same changes for NODA2
Swapper service still runs but is NOT in the critical inference path.
Source of truth for models is now NCS -> Ollama /api/tags.
Made-with: Cursor
NATS wildcards (node.*.capabilities.get) only work for subscriptions,
not for publish. Switch to a dedicated broadcast subject
(fabric.capabilities.discover) that all NCS instances subscribe to,
enabling proper scatter-gather discovery across nodes.
Made-with: Cursor
Architecture for 150+ nodes:
- global_capabilities_client.py: NATS scatter-gather discovery using
wildcard subject node.*.capabilities.get — zero static node lists.
New nodes auto-register by deploying NCS and subscribing to NATS.
Dead nodes expire from cache after 3x TTL automatically.
Multi-node model_select.py:
- ModelSelection now includes node, local, via_nats fields
- select_best_model prefers local candidates, then remote
- Prefer list resolution: local first, remote second
- All logged per request: node, runtime, model, local/remote
NODA1 compose:
- Added node-capabilities service (NCS) to docker-compose.node1.yml
- NATS subscription: node.noda1.capabilities.get
- Router env: NODE_CAPABILITIES_URL + ENABLE_GLOBAL_CAPS_NATS=true
NODA2 compose:
- Router env: ENABLE_GLOBAL_CAPS_NATS=true
Router main.py:
- Startup: initializes global_capabilities_client (NATS connect + first
discovery). Falls back to local-only capabilities_client if unavailable.
- /infer: uses get_global_capabilities() for cross-node model pool
- Offload support: send_offload_request(node_id, type, payload) via NATS
Verified on NODA2:
- Global caps: 1 node, 14 models (NODA1 not yet deployed)
- Sofiia: cloud_grok → grok-4-1-fast-reasoning (OK)
- Helion: NCS → qwen3:14b local (OK)
- When NODA1 deploys NCS, its models appear automatically via NATS discovery
Made-with: Cursor
Router model selection:
- New model_select.py: resolve_effective_profile → profile_requirements →
select_best_model pipeline. NCS-first with graceful static fallback.
- selection_policies in router-config.node2.yml define prefer order per
profile without hardcoding models (e.g. local_default_coder prefers
qwen3:14b then qwen3.5:35b-a3b).
- Cloud profiles (cloud_grok, cloud_deepseek) skip NCS; on cloud failure
use fallback_profile via NCS for local selection.
- Structured logs: selected_profile, required_type, runtime, model,
caps_age_s, fallback_reason on every infer request.
Grok model fix:
- grok-2-1212 no longer exists on xAI API → updated to
grok-4-1-fast-reasoning across all 3 hardcoded locations in main.py
and router-config.node2.yml.
NCS NATS request/reply:
- node-capabilities subscribes to node.noda2.capabilities.get (NATS
request/reply). Enabled via ENABLE_NATS_CAPS=true in compose.
- NODA1 router can query NODA2 capabilities over NATS leafnode without
HTTP connectivity.
Verified:
- NCS: 14 served models from Ollama+Swapper+llama-server
- NATS: request/reply returns full capabilities JSON
- Sofiia: cloud_grok → grok-4-1-fast-reasoning (tested, 200 OK)
- Helion: NCS → qwen3:14b via Ollama (caps_age=23.7s cache hit)
- Router health: ok
Made-with: Cursor