# Agent Fabric Layer — Changelog

## v0.4 — P3.2/P3.3/P3.4 Multi-node Deploy + Auth + Prometheus (2026-02-27)

### P3.2 — NCS + Node Worker on NODA1

- Added `node-worker` service to `docker-compose.node1.yml` (NODE_ID=noda1)
- NCS on NODA1 now has `NODE_WORKER_URL` for metrics collection
- Fixed NODE_ID consistency: the router on NODA1 now uses `noda1` (was `node-1-hetzner-gex44`)
- Global pool will show 2 nodes after the NODA1 deployment

### P3.3 — NATS Accounts/Auth Config

- Created `config/nats-server.conf` with 3 accounts: SYS, FABRIC, APP
- FABRIC account: per-user permissions for router, ncs, node_worker
- Leafnode listener on :7422 with auth
- Opt-in: not yet active (requires credential setup + client changes)

### P3.4 — Prometheus Counters

- **Router** (`/fabric_metrics`):
  - `fabric_caps_refresh_total{status}`, `fabric_caps_stale_total`
  - `fabric_model_select_total{chosen_node,chosen_runtime,type}`
  - `fabric_offload_total{status,node,type}`
  - `fabric_breaker_state{node,type}` (gauge)
  - `fabric_score_ms` (histogram: 100-10000 ms buckets)
- **Node Worker** (`/prom_metrics`):
  - `node_worker_jobs_total{type,status}`
  - `node_worker_inflight` (gauge)
  - `node_worker_latency_ms{type,model}` (histogram)
- **NCS** (`/prom_metrics`):
  - `ncs_runtime_health{runtime}` (gauge)
  - `ncs_runtime_p50_ms{runtime}`, `ncs_runtime_p95_ms{runtime}`
  - `ncs_node_wait_ms`

---

## v0.3 — P3.1 GPU/Queue-aware Routing (2026-02-27)

### NCS (Node Capabilities Service)

- **NEW** `metrics.py` module: NodeLoad + RuntimeLoad collection
- Capabilities payload now includes `node_load` and `runtime_load`
  - `node_load`: inflight_jobs, queue_depth, concurrency_limit, estimated_wait_ms, cpu_load_1m, mem_pressure
  - `runtime_load`: per-runtime healthy status, p50_ms, p95_ms from a rolling window
- **NEW** `POST /capabilities/report_latency` — accepts latency reports from the node worker
- NCS fetches worker metrics via the `NODE_WORKER_URL` env var

### Node Worker

- **NEW** `GET /metrics` endpoint: inflight_jobs,
  concurrency_limit, last_latencies_llm/vision
- Latency tracking: rolling buffer of the last 50 latencies per type
- Fire-and-forget latency reporting to NCS after each successful job

### Router (model_select v3)

- **NEW** `score_candidate()` function: wait + model_latency + cross_penalty + prefer_bonus
- Selection now uses scoring instead of simple local-first ordering
- `LOCAL_THRESHOLD_MS = 250`: prefer local if within the threshold of remote
- `ModelSelection.score` field added
- Structured log format: `[score] agent=X type=Y chosen=LOCAL:node/model score=N`

### Tests

- 12 scoring tests (local wins, remote wins, exclude, breaker, type filter, prefer list, cross penalty, wait, threshold)
- 7 NCS metrics tests (latency stats, cpu load, mem pressure, node load, runtime load)

### No Breaking Changes

- JobRequest/JobResponse envelope unchanged
- Existing capabilities fields preserved
- All new fields are optional/additive

---

## v0.2 — P2.2+P2.3 NATS Offload (2026-02-26)

- Node Worker service (NATS offload executor)
- `offload_client.py` (circuit breaker, retries, deadline)
- model_select with exclude_nodes + force_local
- Router `/infer` remote offload path

## v0.1 — P2 Global Capabilities (2026-02-26)

- Node Capabilities Service (NCS) on each node
- `global_capabilities_client.py` (NATS scatter-gather discovery)
- model_select v2 (multi-node aware)
- NATS wildcard discovery: `node.*.capabilities.get`

## v0.0 — P1 NCS-first Selection (2026-02-26)

- `capabilities_client.py` (single-node HTTP)
- model_select v1 (profile → NCS → static fallback)
- Grok API integration fix
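The P3.1 scoring selection described in v0.3 can be sketched as follows. This is a minimal illustration, not the actual router code: `LOCAL_THRESHOLD_MS` and the score components come from the changelog, but the `Candidate` shape, `CROSS_PENALTY_MS`, and `PREFER_BONUS_MS` values are assumptions.

```python
from dataclasses import dataclass

LOCAL_THRESHOLD_MS = 250   # from v0.3: prefer local if within this margin of remote
CROSS_PENALTY_MS = 150     # assumed flat penalty for a cross-node hop
PREFER_BONUS_MS = 100      # assumed bonus for nodes on the agent's prefer list

@dataclass
class Candidate:
    node_id: str
    is_local: bool
    estimated_wait_ms: float   # from node_load
    model_latency_ms: float    # e.g. runtime p95 from runtime_load
    preferred: bool = False

def score_candidate(c: Candidate) -> float:
    """wait + model_latency + cross_penalty + prefer_bonus; lower is better."""
    score = c.estimated_wait_ms + c.model_latency_ms
    if not c.is_local:
        score += CROSS_PENALTY_MS
    if c.preferred:
        score -= PREFER_BONUS_MS
    return score

def select(candidates: list[Candidate]) -> Candidate:
    best = min(candidates, key=score_candidate)
    if not best.is_local:
        # Local-preference tie-break: keep the job local if the best local
        # candidate scores within LOCAL_THRESHOLD_MS of the best remote.
        local_only = [c for c in candidates if c.is_local]
        if local_only:
            best_local = min(local_only, key=score_candidate)
            if score_candidate(best_local) - score_candidate(best) <= LOCAL_THRESHOLD_MS:
                return best_local
    return best
```

Under these assumptions, a remote node must beat the best local candidate by more than 250 ms of combined wait-plus-latency (after the cross penalty) before the router offloads, which matches the "local wins within threshold" behavior the v0.3 tests describe.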
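The node worker's rolling latency buffer (last 50 latencies per job type) and the p50/p95 stats NCS exposes in `runtime_load` can be sketched like this. The window size matches the changelog; the function and variable names here are hypothetical, not the worker's actual API.

```python
from collections import deque
from statistics import quantiles

WINDOW = 50  # from v0.3: rolling buffer of the last 50 latencies per type

# Hypothetical buffer layout: one bounded deque per job type ("llm", "vision").
_latencies: dict[str, deque] = {}

def record_latency(job_type: str, ms: float) -> None:
    """Append a latency sample; deque(maxlen=...) drops the oldest beyond WINDOW."""
    _latencies.setdefault(job_type, deque(maxlen=WINDOW)).append(ms)

def latency_stats(job_type: str):
    """p50/p95 over the rolling window, as reported in runtime_load."""
    buf = _latencies.get(job_type)
    if not buf:
        return None
    if len(buf) == 1:
        only = buf[0]
        return {"p50_ms": only, "p95_ms": only}
    # quantiles(n=100) yields 99 cut points: index 49 is p50, index 94 is p95.
    cuts = quantiles(buf, n=100, method="inclusive")
    return {"p50_ms": cuts[49], "p95_ms": cuts[94]}
```

A `deque(maxlen=50)` keeps the window update O(1) with no manual eviction, which suits the fire-and-forget reporting path where the worker records a sample after each successful job.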