# Agent Fabric Layer — Changelog

## v0.4 — P3.2/P3.3/P3.4 Multi-node Deploy + Auth + Prometheus (2026-02-27)

### P3.2 — NCS + Node Worker on NODA1

- Added `node-worker` service to `docker-compose.node1.yml` (NODE_ID=noda1)
- NCS on NODA1 now has `NODE_WORKER_URL` for metrics collection
- NODA2 node-worker/NCS gets `NCS_REPORT_URL` for latency reporting
- Fixed NODE_ID consistency: router on NODA1 now uses `noda1` (was `node-1-hetzner-gex44`)
- Global pool will show 2 nodes after the NODA1 deployment
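
The wiring above can be sketched roughly as a compose fragment (image names, the port, and the service layout are assumptions; only `NODE_ID=noda1` and `NODE_WORKER_URL` come from this changelog):

```yaml
# docker-compose.node1.yml — illustrative fragment, not the actual file
services:
  node-worker:
    image: fabric/node-worker:latest        # assumed image name
    environment:
      - NODE_ID=noda1                       # must match the router's NODE_ID
  ncs:
    image: fabric/ncs:latest                # assumed image name
    environment:
      - NODE_ID=noda1
      # assumed port; lets NCS poll the worker's metrics endpoint
      - NODE_WORKER_URL=http://node-worker:8080
```
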
### P3.3 — NATS Accounts/Auth Config

- Created `config/nats-server.conf` with 3 accounts: SYS, FABRIC, APP
- FABRIC account: per-user permissions for router, ncs, node_worker
- Leafnode listener on :7422 with auth
- Opt-in: not yet active (requires credential setup + client changes)
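
A rough shape of such a config (the account and user names follow the changelog; the subjects and passwords are placeholder assumptions):

```conf
# config/nats-server.conf — illustrative sketch, not the actual file
accounts {
  SYS: { users: [ { user: sys, password: "<sys-pw>" } ] }
  FABRIC: {
    users: [
      { user: router, permissions: {
          publish:   [ "node.*.capabilities.get", "node.*.jobs.>" ]  # jobs subject is assumed
          subscribe: [ "_INBOX.>" ]
      } }
      { user: ncs,         permissions: { subscribe: [ "node.*.capabilities.get" ], publish: [ "_INBOX.>" ] } }
      { user: node_worker, permissions: { subscribe: [ "node.*.jobs.>" ],           publish: [ "_INBOX.>" ] } }
    ]
  }
  APP: { users: [ { user: app, password: "<app-pw>" } ] }
}
system_account: SYS

leafnodes {
  port: 7422
  authorization { user: leaf, password: "<leaf-pw>" }
}
```
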
### P3.4 — Prometheus Counters

- **Router** (`/fabric_metrics`):
  - `fabric_caps_refresh_total{status}`, `fabric_caps_stale_total`
  - `fabric_model_select_total{chosen_node,chosen_runtime,type}`
  - `fabric_offload_total{status,node,type}`
  - `fabric_breaker_state{node,type}` (gauge)
  - `fabric_score_ms` (histogram: 100–10000 ms buckets)
- **Node Worker** (`/prom_metrics`):
  - `node_worker_jobs_total{type,status}`
  - `node_worker_inflight` (gauge)
  - `node_worker_latency_ms{type,model}` (histogram)
- **NCS** (`/prom_metrics`):
  - `ncs_runtime_health{runtime}` (gauge)
  - `ncs_runtime_p50_ms{runtime}`, `ncs_runtime_p95_ms{runtime}`
  - `ncs_node_wait_ms`
- All metrics endpoints bound to 127.0.0.1 (not externally exposed)
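
These endpoints serve the standard Prometheus text exposition format. A dependency-free sketch of the router-side counters (metric and label names come from the list above; the exact bucket edges within the stated 100–10000 ms range are assumptions):

```python
from collections import defaultdict

# Minimal sketch of the /fabric_metrics exposition. Metric and label names
# follow the changelog; the bucket edges are assumptions.
BUCKETS_MS = [100, 250, 500, 1000, 2500, 5000, 10000]

class FabricMetrics:
    def __init__(self):
        self.counters = defaultdict(float)    # (name, sorted label items) -> value
        self.score_buckets = defaultdict(int)

    def inc(self, name, **labels):
        self.counters[(name, tuple(sorted(labels.items())))] += 1

    def observe_score_ms(self, ms):
        # Prometheus histograms are cumulative: every bucket edge >= ms increments
        for edge in BUCKETS_MS:
            if ms <= edge:
                self.score_buckets[edge] += 1
        self.score_buckets[float("inf")] += 1

    def render(self):
        lines = []
        for (name, labels), value in sorted(self.counters.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {value:g}")
        for edge in BUCKETS_MS + [float("inf")]:
            le = "+Inf" if edge == float("inf") else f"{edge:g}"
            lines.append(f'fabric_score_ms_bucket{{le="{le}"}} {self.score_buckets[edge]}')
        return "\n".join(lines)

metrics = FabricMetrics()
metrics.inc("fabric_offload_total", status="ok", node="noda2", type="llm")
metrics.observe_score_ms(180)
print(metrics.render())
```

In a real deployment the endpoint would serve exactly this text; binding the HTTP listener to 127.0.0.1 keeps it off public interfaces so only a co-located scraper can reach it.
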
---

## v0.3 — P3.1 GPU/Queue-aware Routing (2026-02-27)

### NCS (Node Capabilities Service)

- **NEW** `metrics.py` module: NodeLoad + RuntimeLoad collection
- Capabilities payload now includes `node_load` and `runtime_load`
  - `node_load`: inflight_jobs, queue_depth, concurrency_limit, estimated_wait_ms, cpu_load_1m, mem_pressure
  - `runtime_load`: per-runtime healthy status, p50_ms, p95_ms from rolling window
- **NEW** `POST /capabilities/report_latency` — accepts latency reports from node-worker
- NCS fetches worker metrics via `NODE_WORKER_URL` env
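
The per-runtime rolling-window stats behind `runtime_load` can be sketched as follows (the window size and interpolation method are assumptions; the changelog only specifies p50/p95 over a rolling window):

```python
import statistics
from collections import deque

WINDOW = 50  # assumed window size

class RuntimeLoad:
    def __init__(self):
        self.latencies_ms = deque(maxlen=WINDOW)  # rolling window

    def record(self, latency_ms):
        self.latencies_ms.append(latency_ms)

    def snapshot(self, healthy=True):
        if len(self.latencies_ms) < 2:
            return {"healthy": healthy, "p50_ms": None, "p95_ms": None}
        # 20-quantiles yield 19 cut points; index 18 is the 95th percentile
        cuts = statistics.quantiles(self.latencies_ms, n=20, method="inclusive")
        return {
            "healthy": healthy,
            "p50_ms": statistics.median(self.latencies_ms),
            "p95_ms": cuts[18],
        }

rl = RuntimeLoad()
for ms in (100, 200, 300, 400, 500):
    rl.record(ms)
print(rl.snapshot())
```
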
### Node Worker

- **NEW** `GET /metrics` endpoint: inflight_jobs, concurrency_limit, last_latencies_llm/vision
- Latency tracking: rolling buffer of last 50 latencies per type
- Fire-and-forget latency reporting to NCS after each successful job
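
The tracking-plus-reporting pattern can be sketched like this (the buffer size of 50 comes from the changelog; the real HTTP POST to NCS is stubbed out, so the transport details here are assumptions):

```python
import asyncio
from collections import deque

class LatencyTracker:
    def __init__(self, maxlen=50):
        self.buffers = {"llm": deque(maxlen=maxlen), "vision": deque(maxlen=maxlen)}

    def record(self, job_type, latency_ms):
        self.buffers[job_type].append(latency_ms)

    def report_fire_and_forget(self, send, job_type, latency_ms):
        # Schedule the report without awaiting it: a slow or failed report
        # must never delay or fail the job itself.
        task = asyncio.ensure_future(send(job_type, latency_ms))
        task.add_done_callback(lambda t: t.exception())  # consume any error
        return task

async def demo():
    reports = []

    async def fake_send(job_type, latency_ms):  # stands in for the HTTP POST
        reports.append((job_type, latency_ms))

    tracker = LatencyTracker()
    tracker.record("llm", 120.0)
    tracker.report_fire_and_forget(fake_send, "llm", 120.0)
    await asyncio.sleep(0)  # yield once so the scheduled task runs
    return tracker, reports

tracker, reports = asyncio.run(demo())
print(reports)
```
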
### Router (model_select v3)

- **NEW** `score_candidate()` function: wait + model_latency + cross_penalty + prefer_bonus
- Selection uses scoring instead of simple local-first ordering
- `LOCAL_THRESHOLD_MS = 250`: prefer local if within threshold of remote
- `ModelSelection.score` field added
- Structured log format: `[score] agent=X type=Y chosen=LOCAL:node/model score=N`
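
A sketch of the scoring and threshold logic (the term names and `LOCAL_THRESHOLD_MS = 250` come from the changelog; the penalty/bonus magnitudes and the candidate shape are assumptions):

```python
LOCAL_THRESHOLD_MS = 250
CROSS_PENALTY_MS = 150    # assumed fixed cost for offloading over NATS
PREFER_BONUS_MS = -100    # assumed bonus for models on the profile's prefer list

def score_candidate(cand, prefer_models=()):
    # lower is better: wait + model_latency + cross_penalty + prefer_bonus
    score = cand["estimated_wait_ms"] + cand["model_latency_ms"]
    if not cand["local"]:
        score += CROSS_PENALTY_MS
    if cand["model"] in prefer_models:
        score += PREFER_BONUS_MS
    return score

def select(candidates, prefer_models=()):
    ranked = sorted(candidates, key=lambda c: score_candidate(c, prefer_models))
    best = ranked[0]
    # local preference: take a local candidate if it scores within
    # LOCAL_THRESHOLD_MS of the overall best
    for cand in ranked:
        delta = score_candidate(cand, prefer_models) - score_candidate(best, prefer_models)
        if cand["local"] and delta <= LOCAL_THRESHOLD_MS:
            return cand
    return best

local = {"node": "noda1", "model": "m-a", "local": True,
         "estimated_wait_ms": 50, "model_latency_ms": 400}
remote = {"node": "noda2", "model": "m-b", "local": False,
          "estimated_wait_ms": 0, "model_latency_ms": 250}
print(select([local, remote])["node"])  # local wins despite a worse raw score
```

Here the remote candidate scores 400 (250 ms latency + 150 ms cross penalty) against the local 450, but the 50 ms gap is inside the threshold, so local is chosen.
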
### Tests

- 12 scoring tests (local wins, remote wins, exclude, breaker, type filter, prefer list, cross penalty, wait, threshold)
- 7 NCS metrics tests (latency stats, cpu load, mem pressure, node load, runtime load)
### No Breaking Changes

- JobRequest/JobResponse envelope unchanged
- Existing capabilities fields preserved
- All new fields are optional/additive
---

## v0.2 — P2.2+P2.3 NATS Offload (2026-02-26)

- Node Worker service (NATS offload executor)

- Node Worker service (NATS offload executor)
- `offload_client.py` (circuit breaker, retries, deadline)
- `model_select` with exclude_nodes + force_local
- Router `/infer` remote offload path
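
The breaker in `offload_client.py` can be sketched as follows (the changelog only states "circuit breaker, retries, deadline"; the failure threshold, cooldown, and half-open behaviour here are assumptions):

```python
import time

FAILURE_THRESHOLD = 3   # assumed consecutive failures before opening
COOLDOWN_S = 30.0       # assumed open-state duration

class CircuitBreaker:
    def __init__(self, clock=time.monotonic):
        self.failures = 0
        self.opened_at = None
        self.clock = clock

    def allow(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= COOLDOWN_S:
            # half-open: allow one probe; a failure re-opens immediately
            self.opened_at = None
            self.failures = FAILURE_THRESHOLD - 1
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= FAILURE_THRESHOLD:
            self.opened_at = self.clock()

breaker = CircuitBreaker()
for _ in range(FAILURE_THRESHOLD):
    breaker.record_failure()
print(breaker.allow())  # open breaker excludes the node from offload
```
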
## v0.1 — P2 Global Capabilities (2026-02-26)

- Node Capabilities Service (NCS) on each node
- `global_capabilities_client.py` (NATS scatter-gather discovery)
- `model_select` v2 (multi-node aware)
- NATS wildcard discovery: `node.*.capabilities.get`
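
The scatter-gather pattern behind `global_capabilities_client.py` — fan one request out to `node.*.capabilities.get` and keep whatever replies arrive before the deadline — can be sketched transport-free (the NATS client is stubbed with coroutines; the timeout value is an assumption):

```python
import asyncio

GATHER_TIMEOUT_S = 0.2  # assumed discovery deadline

async def scatter_gather(reply_sources):
    tasks = [asyncio.ensure_future(src()) for src in reply_sources]
    done, pending = await asyncio.wait(tasks, timeout=GATHER_TIMEOUT_S)
    for task in pending:
        task.cancel()   # slow or dead nodes simply miss the window
    return [t.result() for t in done if t.exception() is None]

async def demo():
    async def healthy_node():
        return {"node_id": "noda1", "models": ["m-a"]}

    async def dead_node():
        await asyncio.sleep(5)   # never answers within the deadline
        return {"node_id": "noda2"}

    return await scatter_gather([healthy_node, dead_node])

caps = asyncio.run(demo())
print(sorted(c["node_id"] for c in caps))
```

The global pool is whatever answered in time, so a down node degrades discovery instead of blocking it.
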
## v0.0 — P1 NCS-first Selection (2026-02-26)

- `capabilities_client.py` (single-node HTTP)
- `model_select` v1 (profile → NCS → static fallback)
- Grok API integration fix
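
The v1 fallback chain reduces to three ordered attempts; a sketch (the data shapes and the static default are assumptions — only the order profile → NCS → static comes from the changelog):

```python
STATIC_FALLBACK = {"model": "default-model"}  # assumed value

def model_select_v1(profile, ncs_capabilities):
    # 1. an explicit model pinned in the agent profile always wins
    if profile.get("model"):
        return {"model": profile["model"], "source": "profile"}
    # 2. otherwise take the first healthy model NCS reports
    for entry in ncs_capabilities or []:
        if entry.get("healthy"):
            return {"model": entry["name"], "source": "ncs"}
    # 3. static fallback, so selection never hard-fails when NCS is down
    return {**STATIC_FALLBACK, "source": "static"}

print(model_select_v1({}, [{"name": "m-a", "healthy": True}]))
```
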