NCS (services/node-capabilities/metrics.py):
- NodeLoad: inflight_jobs, queue_depth, concurrency_limit, estimated_wait_ms, cpu_load_1m, mem_pressure (macOS + Linux), rtt_ms_to_hub
- RuntimeLoad: per-runtime healthy, p50_ms, p95_ms from a rolling 50-sample window
- POST /capabilities/report_latency for node-worker → NCS reporting
- NCS fetches worker metrics via NODE_WORKER_URL

Node Worker:
- GET /metrics endpoint (inflight, concurrency, latency buffers)
- Latency tracking per job type (llm/vision) with a rolling buffer
- Fire-and-forget latency reporting to NCS after each successful job

Router (model_select v3):
- score_candidate(): wait + model_latency + cross_node_penalty + prefer_bonus
- LOCAL_THRESHOLD_MS=250: prefer local if within threshold of remote
- ModelSelection.score field for observability
- Structured [score] logs with chosen node, model, and score breakdown

Tests: 19 new (12 scoring + 7 NCS metrics), 36 total pass
Docs: ops/runbook_p3_1.md, ops/CHANGELOG_FABRIC.md
No breaking changes to JobRequest/JobResponse or capabilities schema.

Made-with: Cursor
Agent Fabric Layer — Changelog
v0.3 — P3.1 GPU/Queue-aware Routing (2026-02-27)
NCS (Node Capabilities Service)
- NEW metrics.py module: NodeLoad + RuntimeLoad collection
- Capabilities payload now includes node_load and runtime_load
  - node_load: inflight_jobs, queue_depth, concurrency_limit, estimated_wait_ms, cpu_load_1m, mem_pressure
  - runtime_load: per-runtime healthy status, p50_ms, p95_ms from rolling window
- NEW POST /capabilities/report_latency: accepts latency reports from node-worker
- NCS fetches worker metrics via NODE_WORKER_URL env
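The runtime_load percentiles are computed over a rolling window of recent latencies. The following is a minimal sketch of that bookkeeping. It assumes nearest-rank percentiles over a deque-backed 50-sample window. LatencyWindow, percentile and cpu_load_1m are illustrative names, not necessarily what metrics.py defines:

```python
import math
import os
from collections import deque
from dataclasses import dataclass
from typing import List, Optional

WINDOW_SIZE = 50  # rolling window of 50 samples, as stated in the P3.1 commit notes


def percentile(samples: List[float], pct: float) -> Optional[float]:
    """Nearest-rank percentile; returns None when there are no samples yet."""
    if not samples:
        return None
    ordered = sorted(samples)
    # Smallest value such that at least pct% of the samples are <= it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


@dataclass
class RuntimeLoad:
    runtime: str
    healthy: bool
    p50_ms: Optional[float]
    p95_ms: Optional[float]


class LatencyWindow:
    """Holds the most recent WINDOW_SIZE latencies for one runtime (hypothetical helper)."""

    def __init__(self, size: int = WINDOW_SIZE) -> None:
        # deque(maxlen=...) silently drops the oldest sample once full
        self._samples: deque = deque(maxlen=size)

    def add(self, latency_ms: float) -> None:
        self._samples.append(latency_ms)

    def snapshot(self, runtime: str, healthy: bool) -> RuntimeLoad:
        samples = list(self._samples)
        return RuntimeLoad(runtime, healthy, percentile(samples, 50), percentile(samples, 95))


def cpu_load_1m() -> float:
    """1-minute load average; os.getloadavg() exists on macOS and Linux (not Windows)."""
    return os.getloadavg()[0]
```

A bounded deque keeps memory constant per runtime. Sorting 50 samples per report is cheap, so no streaming percentile structure is needed.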
Node Worker
- NEW GET /metrics endpoint: inflight_jobs, concurrency_limit, last_latencies_llm/vision
- Latency tracking: rolling buffer of the last 50 latencies per type
- Fire-and-forget latency reporting to NCS after each successful job
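The following is a minimal sketch of the worker-side state behind GET /metrics. It assumes a thread-based fire-and-forget reporter. WorkerMetrics is hypothetical, as is the injected report callable, which stands in for a POST to NCS /capabilities/report_latency. The real worker may use asyncio instead:

```python
import threading
from collections import deque

BUFFER_SIZE = 50  # last 50 latencies per job type, per the changelog


class WorkerMetrics:
    """Hypothetical in-memory state served by the worker's GET /metrics endpoint."""

    def __init__(self, concurrency_limit, report=None):
        self.concurrency_limit = concurrency_limit
        self.inflight_jobs = 0
        self._lock = threading.Lock()
        self._latencies = {"llm": deque(maxlen=BUFFER_SIZE), "vision": deque(maxlen=BUFFER_SIZE)}
        self._report = report  # callable(job_type, latency_ms); stands in for the NCS POST

    def job_started(self):
        with self._lock:
            self.inflight_jobs += 1

    def job_finished(self, job_type, latency_ms, ok):
        with self._lock:
            self.inflight_jobs -= 1
            if ok:
                self._latencies[job_type].append(latency_ms)
        if ok and self._report is not None:
            # Fire-and-forget: the job's response never waits on the report.
            threading.Thread(target=self._safe_report, args=(job_type, latency_ms), daemon=True).start()

    def _safe_report(self, job_type, latency_ms):
        try:
            self._report(job_type, latency_ms)
        except Exception:
            pass  # reporting is best-effort; a failed report is simply dropped

    def snapshot(self):
        """Payload shaped like the GET /metrics fields listed above."""
        with self._lock:
            return {
                "inflight_jobs": self.inflight_jobs,
                "concurrency_limit": self.concurrency_limit,
                "last_latencies_llm": list(self._latencies["llm"]),
                "last_latencies_vision": list(self._latencies["vision"]),
            }
```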
Router (model_select v3)
- NEW score_candidate() function: wait + model_latency + cross_penalty + prefer_bonus
- Selection uses scoring instead of simple local-first ordering
- LOCAL_THRESHOLD_MS = 250: prefer local if within threshold of remote
- ModelSelection.score field added
- Structured log format: [score] agent=X type=Y chosen=LOCAL:node/model score=N
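The following sketch shows one way to read the v3 scoring rule. All terms are in milliseconds and lower is better, so prefer_bonus is modelled as a negative value. CROSS_NODE_PENALTY_MS, PREFER_BONUS_MS, Candidate, select and score_log are hypothetical. The threshold rule (stay local unless the best remote scores more than LOCAL_THRESHOLD_MS lower) is an interpretation of "prefer local if within threshold of remote":

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

LOCAL_THRESHOLD_MS = 250.0      # from the changelog
CROSS_NODE_PENALTY_MS = 50.0    # hypothetical value
PREFER_BONUS_MS = -100.0        # hypothetical; negative, so preferred models score lower


@dataclass
class Candidate:
    node: str
    model: str
    local: bool
    estimated_wait_ms: float  # e.g. from node_load
    model_latency_ms: float   # e.g. runtime_load p50_ms


def score_candidate(c: Candidate, prefer: Sequence[str] = ()) -> float:
    """Lower is better: wait + model_latency + cross_penalty + prefer_bonus."""
    cross_penalty = 0.0 if c.local else CROSS_NODE_PENALTY_MS
    prefer_bonus = PREFER_BONUS_MS if c.model in prefer else 0.0
    return c.estimated_wait_ms + c.model_latency_ms + cross_penalty + prefer_bonus


def select(candidates: List[Candidate], prefer: Sequence[str] = ()) -> Optional[Tuple[float, Candidate]]:
    scored = [(score_candidate(c, prefer), c) for c in candidates]
    best_local = min((s for s in scored if s[1].local), key=lambda s: s[0], default=None)
    best_remote = min((s for s in scored if not s[1].local), key=lambda s: s[0], default=None)
    if best_local is None or best_remote is None:
        return best_local or best_remote  # whichever side exists, or None
    # Stay local unless the best remote beats it by more than the threshold.
    if best_local[0] <= best_remote[0] + LOCAL_THRESHOLD_MS:
        return best_local
    return best_remote


def score_log(agent: str, job_type: str, choice: Tuple[float, Candidate]) -> str:
    score, c = choice
    where = "LOCAL" if c.local else "REMOTE"  # the REMOTE label is an assumption
    return f"[score] agent={agent} type={job_type} chosen={where}:{c.node}/{c.model} score={score:.0f}"
```

Here a local candidate scoring 500 still beats a remote scoring 350, because the gap is under 250 ms. This keeps traffic on the local node when offloading would only save a little time.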
Tests
- 12 scoring tests (local wins, remote wins, exclude, breaker, type filter, prefer list, cross penalty, wait, threshold)
- 7 NCS metrics tests (latency stats, cpu load, mem pressure, node load, runtime load)
No Breaking Changes
- JobRequest/JobResponse envelope unchanged
- Existing capabilities fields preserved
- All new fields are optional/additive
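Because the new fields are additive, consumers written before v0.3 keep working, and newer consumers must tolerate the fields being absent. The following is a minimal sketch of such a tolerant reader. node_id and models are placeholder names for pre-existing fields, and treating a missing node_load as zero wait is an assumption:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Capabilities:
    node_id: str                                   # placeholder for an existing field
    models: List[str] = field(default_factory=list)
    node_load: Optional[Dict[str, Any]] = None     # new in v0.3, optional
    runtime_load: Optional[Dict[str, Any]] = None  # new in v0.3, optional


def parse_capabilities(payload: Dict[str, Any]) -> Capabilities:
    # .get() keeps payloads from pre-v0.3 nodes parseable
    return Capabilities(
        node_id=payload["node_id"],
        models=list(payload.get("models", [])),
        node_load=payload.get("node_load"),
        runtime_load=payload.get("runtime_load"),
    )


def estimated_wait_ms(caps: Capabilities) -> float:
    """Assumed default: a node that reports no load is treated as having no wait."""
    return float((caps.node_load or {}).get("estimated_wait_ms", 0.0))
```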
v0.2 — P2.2+P2.3 NATS Offload (2026-02-26)
- Node Worker service (NATS offload executor)
- offload_client.py (circuit breaker, retries, deadline)
- model_select with exclude_nodes + force_local
- Router /infer remote offload path
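offload_client.py is listed with a circuit breaker, retries and a deadline. The following is a generic sketch of how those three usually combine, not the client's actual code. The thresholds, the backoff policy and the call_with_retries signature are assumptions:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open; the caller should fall back (e.g. run locally)."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures and stays open for `cooldown_s`."""

    def __init__(self, threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        # Half-open after the cooldown: calls go through again, and the next
        # failure re-opens the breaker because the failure count was not reset.
        return self._clock() - self._opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.threshold:
            self._opened_at = self._clock()


def call_with_retries(fn: Callable[[float], T], breaker: CircuitBreaker, retries: int = 2,
                      deadline_s: float = 5.0, backoff_s: float = 0.1,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(remaining_budget_s) up to retries + 1 times within one overall deadline."""
    deadline = clock() + deadline_s
    last_exc: Optional[BaseException] = None
    for attempt in range(retries + 1):
        if not breaker.allow():
            raise CircuitOpenError("offload circuit is open")
        remaining = deadline - clock()
        if remaining <= 0:
            break
        try:
            result = fn(remaining)
        except Exception as exc:  # transport or remote error
            breaker.record_failure()
            last_exc = exc
            if attempt < retries:
                # exponential backoff, never sleeping past the deadline
                sleep(min(backoff_s * 2 ** attempt, max(0.0, deadline - clock())))
            continue
        breaker.record_success()
        return result
    if last_exc is not None:
        raise last_exc
    raise TimeoutError("offload deadline exceeded before any attempt")
```

Passing the remaining budget into fn lets each attempt set its own request timeout, so retries never extend past the overall deadline. Injecting clock and sleep keeps the logic testable without real waiting.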
v0.1 — P2 Global Capabilities (2026-02-26)
- Node Capabilities Service (NCS) on each node
- global_capabilities_client.py (NATS scatter-gather discovery)
- model_select v2 (multi-node aware)
- NATS wildcard discovery: node.*.capabilities.get
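Scatter-gather discovery sends one capabilities query to many nodes and merges whatever replies arrive before a timeout. The following sketch simulates this in-process with asyncio. The responders dict stands in for the NCS instances. The wildcard is used only to pick which simulated responders to query, so the sketch does not model how the NATS server itself routes requests. subject_matches and scatter_gather are hypothetical names:

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style match: '*' matches exactly one token, '>' one or more trailing tokens."""
    p_tokens, s_tokens = pattern.split("."), subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(s_tokens) > i
        if i >= len(s_tokens) or (p != "*" and p != s_tokens[i]):
            return False
    return len(p_tokens) == len(s_tokens)


Responder = Callable[[], Awaitable[Any]]


async def scatter_gather(responders: Dict[str, Responder], pattern: str,
                         timeout_s: float) -> Dict[str, Any]:
    """Query every matching responder concurrently; keep only replies that arrive in time."""
    tasks = {asyncio.create_task(handler()): subject
             for subject, handler in responders.items()
             if subject_matches(pattern, subject)}
    if not tasks:
        return {}
    done, pending = await asyncio.wait(list(tasks), timeout=timeout_s)
    for task in pending:
        task.cancel()  # slow nodes are left out of this discovery round
    # failed responders are skipped rather than failing the whole discovery
    return {tasks[t]: t.result() for t in done if t.exception() is None}
```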
v0.0 — P1 NCS-first Selection (2026-02-26)
- capabilities_client.py (single-node HTTP)
- model_select v1 (profile → NCS → static fallback)
- Grok API integration fix
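The following is a sketch of one plausible reading of the v1 chain "profile → NCS → static fallback". select_model and its arguments are hypothetical, and the changelog does not say whether v1 validates a profile model against NCS:

```python
from typing import Callable, List, Optional, Sequence


def select_model(
    profile_model: Optional[str],
    fetch_ncs_models: Callable[[], List[str]],
    static_fallback: Sequence[str],
) -> Optional[str]:
    # 1. A model pinned in the agent profile wins.
    if profile_model:
        return profile_model
    # 2. Otherwise ask the local NCS which models are actually served.
    try:
        served = fetch_ncs_models()
    except Exception:
        served = []  # NCS unreachable: fall through to the static list
    if served:
        return served[0]
    # 3. Last resort: static configuration (None if that is empty too).
    return static_fallback[0] if static_fallback else None
```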