microdao-daarion/ops/CHANGELOG_FABRIC.md

# Agent Fabric Layer — Changelog

## v0.3 — P3.1 GPU/Queue-aware Routing (2026-02-27)

### NCS (Node Capabilities Service)
- **NEW** `metrics.py` module: NodeLoad + RuntimeLoad collection
- Capabilities payload now includes `node_load` and `runtime_load`
- `node_load`: `inflight_jobs`, `queue_depth`, `concurrency_limit`, `estimated_wait_ms`, `cpu_load_1m`, `mem_pressure`
- `runtime_load`: per-runtime healthy status, plus `p50_ms` and `p95_ms` computed over a rolling window
- **NEW** `POST /capabilities/report_latency` — accepts latency reports from node-worker
- NCS fetches worker metrics from the URL set in the `NODE_WORKER_URL` environment variable
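
As a rough illustration of how `p50_ms` and `p95_ms` could be derived from a rolling window, the sketch below applies a nearest-rank percentile to the most recent samples. The names `nearest_rank`, `latency_stats` and `WINDOW_SIZE`, and the choice of nearest-rank, are assumptions made for illustration; they are not taken from the actual `metrics.py`.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional

WINDOW_SIZE = 50  # hypothetical; the changelog does not state the NCS window size


def nearest_rank(sorted_values: list, pct: int) -> float:
    """Nearest-rank percentile of a sorted, non-empty list (pct in 1..100)."""
    rank = max(1, (pct * len(sorted_values) + 99) // 100)  # integer ceiling
    return sorted_values[rank - 1]


def latency_stats(samples: Iterable[float]) -> Dict[str, Optional[float]]:
    """Return p50/p95 in milliseconds, or None values when there are no samples."""
    ordered = sorted(samples)
    if not ordered:
        return {"p50_ms": None, "p95_ms": None}
    return {
        "p50_ms": nearest_rank(ordered, 50),
        "p95_ms": nearest_rank(ordered, 95),
    }


# Rolling window: the oldest samples fall off once maxlen is reached.
window: Deque[float] = deque(maxlen=WINDOW_SIZE)
for ms in range(1, 101):  # 100 reports of 1..100 ms; only 51..100 are retained
    window.append(float(ms))

stats = latency_stats(window)
```

Returning `None` for an empty window keeps the new fields optional, in line with the additive nature of this release.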

### Node Worker
- **NEW** `GET /metrics` endpoint: `inflight_jobs`, `concurrency_limit`, `last_latencies_llm`, `last_latencies_vision`
- Latency tracking: rolling buffer of the last 50 latencies per job type
- Fire-and-forget latency reporting to NCS after each successful job
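
The rolling buffer and the fire-and-forget report can be sketched as follows. This is a minimal illustration, not the node-worker code: `LatencyTracker`, `report_in_background` and the injected `send` coroutine are hypothetical names, and the real worker presumably sends to `POST /capabilities/report_latency` over HTTP instead of calling a stub.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable, Deque, Dict, List

MAX_SAMPLES = 50  # "last 50 latencies per type", per the changelog


class LatencyTracker:
    """Keeps the most recent latencies for each job type (hypothetical helper)."""

    def __init__(self, max_samples: int = MAX_SAMPLES) -> None:
        self._max = max_samples
        self._buffers: Dict[str, Deque[float]] = {}

    def record(self, job_type: str, latency_ms: float) -> None:
        buf = self._buffers.setdefault(job_type, deque(maxlen=self._max))
        buf.append(latency_ms)  # deque drops the oldest entry when full

    def snapshot(self) -> Dict[str, List[float]]:
        return {f"last_latencies_{t}": list(b) for t, b in self._buffers.items()}


def report_in_background(
    send: Callable[[str, float], Awaitable[None]],
    job_type: str,
    latency_ms: float,
) -> "asyncio.Task[None]":
    """Schedule a report without awaiting it; failures never reach the job."""

    async def _run() -> None:
        try:
            await send(job_type, latency_ms)
        except Exception:
            pass  # reporting is best-effort by design

    # Callers should hold the returned task so it is not garbage-collected early.
    return asyncio.create_task(_run())


async def _demo() -> Dict[str, List[float]]:
    tracker = LatencyTracker()
    for i in range(60):  # 60 jobs; only the last 50 latencies survive
        tracker.record("llm", float(i))

    async def unreachable_ncs(job_type: str, latency_ms: float) -> None:
        raise ConnectionError("NCS is down")  # simulated outage

    task = report_in_background(unreachable_ncs, "llm", 59.0)
    await task  # awaited here only so the demo finishes deterministically
    return tracker.snapshot()


snapshot = asyncio.run(_demo())
```

Swallowing the exception inside the task is what makes the report fire-and-forget: an NCS outage costs a data point, not a job.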

### Router (model_select v3)
- **NEW** `score_candidate()` function: wait + model_latency + cross_penalty + prefer_bonus
- Selection uses scoring instead of simple local-first ordering
- `LOCAL_THRESHOLD_MS = 250`: prefer the local candidate when it is within 250 ms of the remote candidate
- `ModelSelection.score` field added
- Structured log format: `[score] agent=X type=Y chosen=LOCAL:node/model score=N`
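
The scoring rule above might look roughly like the sketch below. Only the four score terms, `LOCAL_THRESHOLD_MS = 250` and the log format come from this changelog. The `Candidate` fields, the penalty and bonus values, the "lower score wins" convention, the negative sign of the bonus and the `REMOTE` label are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

LOCAL_THRESHOLD_MS = 250      # from the changelog
CROSS_NODE_PENALTY_MS = 50    # hypothetical value
PREFER_BONUS_MS = -100        # hypothetical value; negative so it lowers the score


@dataclass
class Candidate:
    node: str
    model: str
    local: bool
    estimated_wait_ms: float  # queueing delay on the node
    p50_ms: float             # typical model latency
    preferred: bool = False   # model appears in the agent's prefer list


def score_candidate(c: Candidate) -> float:
    """wait + model_latency + cross_penalty + prefer_bonus (lower is better)."""
    cross_penalty = 0 if c.local else CROSS_NODE_PENALTY_MS
    prefer_bonus = PREFER_BONUS_MS if c.preferred else 0
    return c.estimated_wait_ms + c.p50_ms + cross_penalty + prefer_bonus


def select(candidates: List[Candidate]) -> Optional[Tuple[Candidate, float]]:
    """Pick the lowest score, but keep work local if local is close enough."""
    if not candidates:
        return None
    scored = sorted(((score_candidate(c), c) for c in candidates), key=lambda p: p[0])
    best_score, best = scored[0]
    if not best.local:
        local_scored = [(s, c) for s, c in scored if c.local]
        if local_scored and local_scored[0][0] - best_score <= LOCAL_THRESHOLD_MS:
            best_score, best = local_scored[0]
    return best, best_score


def format_log(agent: str, job_type: str, c: Candidate, score: float) -> str:
    where = "LOCAL" if c.local else "REMOTE"  # REMOTE label is an assumption
    return (
        f"[score] agent={agent} type={job_type} "
        f"chosen={where}:{c.node}/{c.model} score={round(score)}"
    )


local = Candidate("node-a", "m", local=True, estimated_wait_ms=200.0, p50_ms=300.0)
remote = Candidate("node-b", "m", local=False, estimated_wait_ms=0.0, p50_ms=300.0)
chosen, chosen_score = select([local, remote])
log_line = format_log("demo", "llm", chosen, chosen_score)
```

In this example the remote node scores 350 and the local node 500. The gap of 150 ms is inside the threshold, so the local node is kept.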

### Tests
- 12 scoring tests (local wins, remote wins, exclude, breaker, type filter, prefer list, cross penalty, wait, threshold)
- 7 NCS metrics tests (latency stats, cpu load, mem pressure, node load, runtime load)

### No Breaking Changes
- JobRequest/JobResponse envelope unchanged
- Existing capabilities fields preserved
- All new fields are optional/additive

---

## v0.2 — P2.2+P2.3 NATS Offload (2026-02-26)

- Node Worker service (NATS offload executor)
- `offload_client.py` (circuit breaker, retries, deadline)
- `model_select` with `exclude_nodes` + `force_local`
- Router `/infer` remote offload path
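
One plausible way for a circuit breaker to feed `exclude_nodes` is sketched below. The changelog does not describe how the two are wired together, so the `NodeBreaker` class, its threshold and cooldown defaults, and the half-open behaviour are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, Set


class NodeBreaker:
    """Per-node circuit breaker (hypothetical sketch, not offload_client.py)."""

    def __init__(
        self,
        failure_threshold: int = 3,       # hypothetical default
        cooldown_s: float = 30.0,         # hypothetical default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._cooldown = cooldown_s
        self._clock = clock
        self._failures: Dict[str, int] = {}
        self._opened_at: Dict[str, float] = {}

    def record_failure(self, node: str) -> None:
        self._failures[node] = self._failures.get(node, 0) + 1
        if self._failures[node] >= self._threshold:
            self._opened_at[node] = self._clock()  # open (or re-open) the breaker

    def record_success(self, node: str) -> None:
        self._failures.pop(node, None)
        self._opened_at.pop(node, None)

    def open_nodes(self) -> Set[str]:
        """Nodes to skip right now; usable as an exclude_nodes argument."""
        now = self._clock()
        for node, opened in list(self._opened_at.items()):
            if now - opened >= self._cooldown:
                # Cooldown elapsed: allow a trial request, but a single
                # further failure re-opens the breaker immediately.
                del self._opened_at[node]
                self._failures[node] = self._threshold - 1
        return set(self._opened_at)


# Demo with a fake clock so the behaviour is deterministic.
now = [0.0]
breaker = NodeBreaker(failure_threshold=2, cooldown_s=10.0, clock=lambda: now[0])
breaker.record_failure("node-b")
breaker.record_failure("node-b")
excluded_while_open = breaker.open_nodes()
now[0] = 10.0
excluded_after_cooldown = breaker.open_nodes()
```

Injecting the clock keeps the breaker testable without sleeping.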

## v0.1 — P2 Global Capabilities (2026-02-26)

- Node Capabilities Service (NCS) on each node
- `global_capabilities_client.py` (NATS scatter-gather discovery)
- `model_select` v2 (multi-node aware)
- NATS wildcard discovery: `node.*.capabilities.get`
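
To make the wildcard subject concrete: in NATS, `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens. The pure-Python matcher below only illustrates those semantics; it is not part of any NATS client, and the node IDs are made up. Which side subscribes with the wildcard is not stated in this changelog.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style subject pattern matches a concrete subject."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # '>' must be the last pattern token and needs at least one token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # subject is shorter than the pattern
        if tok != "*" and tok != s_tokens[i]:
            return False  # literal token mismatch
    return len(p_tokens) == len(s_tokens)


PATTERN = "node.*.capabilities.get"

matches_single_node = subject_matches(PATTERN, "node.node-a.capabilities.get")
matches_nested_id = subject_matches(PATTERN, "node.eu.node-a.capabilities.get")
```

Because `*` covers a single token, node IDs used in these subjects cannot themselves contain dots.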

## v0.0 — P1 NCS-first Selection (2026-02-26)

- `capabilities_client.py` (single-node HTTP)
- `model_select` v1 (profile → NCS → static fallback)
- Grok API integration fix