microdao-daarion

daarion-admin/microdao-daarion

Commit Graph

Author	SHA1	Message	Date

Apple

a605b8c43e

P3.1: GPU/Queue-aware routing — NCS metrics + scoring-based model selection

NCS (services/node-capabilities/metrics.py):
- NodeLoad: inflight_jobs, queue_depth, concurrency_limit, estimated_wait_ms,
  cpu_load_1m, mem_pressure (macOS + Linux), rtt_ms_to_hub
- RuntimeLoad: per-runtime healthy, p50_ms, p95_ms from rolling 50-sample window
- POST /capabilities/report_latency for node-worker → NCS reporting
- NCS fetches worker metrics via NODE_WORKER_URL
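
As an illustration of the RuntimeLoad bullet above, here is a minimal sketch of how p50/p95 could be derived from a rolling 50-sample window. The class name `RollingLatency` and the nearest-rank percentile method are assumptions made for this example; the commit does not say how metrics.py computes its percentiles.

```python
import math
from collections import deque
from typing import Deque, Optional


class RollingLatency:
    """Hypothetical helper: keeps the most recent `size` latency samples (ms)."""

    def __init__(self, size: int = 50) -> None:
        # deque(maxlen=...) silently drops the oldest sample once full.
        self._samples: Deque[float] = deque(maxlen=size)

    def record(self, latency_ms: float) -> None:
        self._samples.append(latency_ms)

    def percentile(self, pct: float) -> Optional[float]:
        """Nearest-rank percentile over the window; None if no samples yet."""
        if not self._samples:
            return None
        ordered = sorted(self._samples)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]


window = RollingLatency(size=50)
for ms in range(1, 101):  # only the last 50 samples (51..100) are retained
    window.record(float(ms))
p50_ms = window.percentile(50)
p95_ms = window.percentile(95)
```

A bounded deque keeps memory constant per runtime, and sorting 50 samples on each read is cheap enough that no incremental percentile structure is needed at this window size.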

Node Worker:
- GET /metrics endpoint (inflight, concurrency, latency buffers)
- Latency tracking per job type (llm/vision) with rolling buffer
- Fire-and-forget latency reporting to NCS after each successful job

Router (model_select v3):
- score_candidate(): wait + model_latency + cross_node_penalty + prefer_bonus
- LOCAL_THRESHOLD_MS=250: prefer local if within threshold of remote
- ModelSelection.score field for observability
- Structured [score] logs with chosen node, model, and score breakdown
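
The scoring terms listed above might combine roughly as in the sketch below. Only `score_candidate` and `LOCAL_THRESHOLD_MS=250` come from the commit message; the `Candidate` fields, the penalty and bonus values, the "lower is better" convention, and the `select` helper are assumptions for illustration, not the router's actual code.

```python
from dataclasses import dataclass
from typing import List

LOCAL_THRESHOLD_MS = 250        # value stated in the commit message
CROSS_NODE_PENALTY_MS = 50.0    # assumed value, not from the commit
PREFER_BONUS_MS = 25.0          # assumed value, not from the commit


@dataclass
class Candidate:
    """Hypothetical candidate record; field names are illustrative."""
    node: str
    model: str
    is_local: bool
    estimated_wait_ms: float
    p50_ms: float
    preferred: bool = False


def score_candidate(c: Candidate) -> float:
    """wait + model_latency + cross_node_penalty + prefer_bonus (lower is better)."""
    score = c.estimated_wait_ms + c.p50_ms
    if not c.is_local:
        score += CROSS_NODE_PENALTY_MS
    if c.preferred:
        score -= PREFER_BONUS_MS  # the bonus is applied as a reduction
    return score


def select(candidates: List[Candidate]) -> Candidate:
    """Pick the lowest score, but keep work local when it is close enough."""
    best = min(candidates, key=score_candidate)  # raises ValueError if empty
    if best.is_local:
        return best
    local = [c for c in candidates if c.is_local]
    if local:
        best_local = min(local, key=score_candidate)
        if score_candidate(best_local) - score_candidate(best) <= LOCAL_THRESHOLD_MS:
            return best_local
    return best
```

Under these assumed numbers, a local candidate scoring 400 would still be chosen over a remote one scoring 250, because the 150 ms gap is inside the 250 ms threshold.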

Tests: 19 new (12 scoring + 7 NCS metrics), 36 total pass
Docs: ops/runbook_p3_1.md, ops/CHANGELOG_FABRIC.md

No breaking changes to JobRequest/JobResponse or capabilities schema.

Made-with: Cursor

2026-02-27 02:55:44 -08:00

Apple

e2a3ae342a

node2: fix Sofiia routing determinism + Node Capabilities Service

Bug fixes:
- Bug A: GROK_API_KEY env mismatch — router expected GROK_API_KEY but only
  XAI_API_KEY was present. Added GROK_API_KEY=${XAI_API_KEY} alias in compose.
- Bug B: 'grok' profile missing in router-config.node2.yml — added cloud_grok
  profile (provider: grok, model: grok-2-1212). Sofiia now has
  default_llm=cloud_grok with fallback_llm=local_default_coder.
- Bug C: Router silently defaulted to cloud DeepSeek when profile was unknown.
  Now falls back to agent.fallback_llm or local_default_coder with WARNING log.
  Hardcoded Ollama URL (172.18.0.1) replaced with config-driven base_url.
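
The Bug C behavior can be pictured with a short sketch. The function name `resolve_profile`, its signature, and the shape of the `profiles` mapping are hypothetical; only the fallback order (agent.fallback_llm, then local_default_coder) and the WARNING log come from the commit message.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("router")

DEFAULT_LOCAL_PROFILE = "local_default_coder"  # profile name from the commit


def resolve_profile(
    requested: str,
    profiles: Dict[str, Any],
    fallback_llm: Optional[str] = None,
) -> str:
    """Return a usable profile name without silently switching to a cloud default."""
    if requested in profiles:
        return requested
    # Unknown profile: prefer the agent's own fallback, then the local default.
    chosen = fallback_llm if fallback_llm in profiles else DEFAULT_LOCAL_PROFILE
    logger.warning(
        "unknown LLM profile %r; falling back to %r", requested, chosen
    )
    return chosen


profiles = {"cloud_grok": {}, "local_default_coder": {}}
resolved = resolve_profile("grok", profiles, fallback_llm="local_default_coder")
```

Logging at WARNING makes a misconfigured profile visible in the router logs rather than letting it show up only as unexpected cloud traffic.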

New service: Node Capabilities Service (NCS)
- services/node-capabilities/ — FastAPI microservice exposing live model
  inventory from Ollama, Swapper, and llama-server.
- GET /capabilities — canonical JSON with served_models[] and inventory_only[]
- GET /capabilities/models — flat list of served models
- POST /capabilities/refresh — force cache refresh
- Cache TTL 15s, bound to 127.0.0.1:8099
- services/router/capabilities_client.py — async client with TTL cache
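
A client-side TTL cache like the one mentioned for capabilities_client.py could look roughly like this. The class `TTLCachedFetcher` and its injected `fetch` coroutine are assumptions for the example; the real client's interface is not shown in the commit, and the HTTP call to GET /capabilities is deliberately left to the caller.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Optional


class TTLCachedFetcher:
    """Hypothetical async cache: re-fetches only after `ttl_s` seconds."""

    def __init__(
        self,
        fetch: Callable[[], Awaitable[Dict[str, Any]]],
        ttl_s: float = 15.0,  # matches the 15s TTL stated for NCS
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._value: Optional[Dict[str, Any]] = None
        self._fetched_at = 0.0

    async def get(self, force: bool = False) -> Dict[str, Any]:
        """Return the cached payload, refreshing if stale or if force=True."""
        fresh = (
            self._value is not None
            and self._clock() - self._fetched_at < self._ttl_s
        )
        if force or not fresh:
            self._value = await self._fetch()
            self._fetched_at = self._clock()
        return self._value


async def _demo() -> Dict[str, Any]:
    async def fake_fetch() -> Dict[str, Any]:
        # Stand-in for an HTTP GET to /capabilities.
        return {"served_models": [], "inventory_only": []}

    cache = TTLCachedFetcher(fake_fetch)
    await cache.get()          # first call fetches
    return await cache.get()   # second call within the TTL is served from cache


payload = asyncio.run(_demo())
```

Injecting the clock and the fetch coroutine keeps the sketch testable without a running NCS instance; `force=True` plays the same role on the client as POST /capabilities/refresh does on the service.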

Artifacts:
- ops/node2_models_audit.md — 3-layer model view (served/disk/cloud)
- ops/node2_models_audit.yml — machine-readable audit
- ops/node2_capabilities_example.json — sample NCS output (14 served models)

Made-with: Cursor

2026-02-27 02:07:40 -08:00

2 Commits