microdao-daarion

Author	SHA1	Message	Date
Apple	a605b8c43e	P3.1: GPU/Queue-aware routing - NCS metrics + scoring-based model selection	2026-02-27 02:55:44 -08:00
    NCS (services/node-capabilities/metrics.py):
    - NodeLoad: inflight_jobs, queue_depth, concurrency_limit, estimated_wait_ms, cpu_load_1m, mem_pressure (macOS + Linux), rtt_ms_to_hub
    - RuntimeLoad: per-runtime healthy, p50_ms, p95_ms from rolling 50-sample window
    - POST /capabilities/report_latency for node-worker → NCS reporting
    - NCS fetches worker metrics via NODE_WORKER_URL
    Node Worker:
    - GET /metrics endpoint (inflight, concurrency, latency buffers)
    - Latency tracking per job type (llm/vision) with rolling buffer
    - Fire-and-forget latency reporting to NCS after each successful job
    Router (model_select v3):
    - score_candidate(): wait + model_latency + cross_node_penalty + prefer_bonus
    - LOCAL_THRESHOLD_MS=250: prefer local if within threshold of remote
    - ModelSelection.score field for observability
    - Structured [score] logs with chosen node, model, and score breakdown
    Tests: 19 new (12 scoring + 7 NCS metrics), 36 total pass
    Docs: ops/runbook_p3_1.md, ops/CHANGELOG_FABRIC.md
    No breaking changes to JobRequest/JobResponse or capabilities schema.
    Made-with: Cursor
Apple	c4b94a327d	P2.2+P2.3: NATS offload node-worker + router offload integration	2026-02-27 02:44:05 -08:00
    Node Worker (services/node-worker/):
    - NATS subscriber for node.{NODE_ID}.llm.request / vision.request
    - Canonical JobRequest/JobResponse envelope (Pydantic)
    - Idempotency cache (TTL 10min) with inflight dedup
    - Deadline enforcement (DEADLINE_EXCEEDED on expired jobs)
    - Concurrency limiter (semaphore, returns busy)
    - Ollama + Swapper vision providers
    Router offload (services/router/offload_client.py):
    - NATS req/reply with configurable retries
    - Circuit breaker per node+type (3 fails/60s → open 120s)
    - Concurrency semaphore for remote requests
    Model selection (services/router/model_select.py):
    - exclude_nodes parameter for circuit-broken nodes
    - force_local flag for fallback re-selection
    - Integrated circuit breaker state awareness
    Router /infer pipeline:
    - Remote offload path when NCS selects remote node
    - Automatic fallback: exclude failed node → force_local re-select
    - Deadline propagation from router to node-worker
    Tests: 17 unit tests (idempotency, deadline, circuit breaker)
    Docs: ops/offload_routing.md (subjects, envelope, verification)
    Made-with: Cursor
Apple	a92c424845	P2: Global multi-node model selection + NCS on NODA1	2026-02-27 02:26:12 -08:00
    Architecture for 150+ nodes:
    - global_capabilities_client.py: NATS scatter-gather discovery using wildcard subject node.*.capabilities.get, with zero static node lists. New nodes auto-register by deploying NCS and subscribing to NATS. Dead nodes expire from cache after 3x TTL automatically.
    Multi-node model_select.py:
    - ModelSelection now includes node, local, via_nats fields
    - select_best_model prefers local candidates, then remote
    - Prefer list resolution: local first, remote second
    - All logged per request: node, runtime, model, local/remote
    NODA1 compose:
    - Added node-capabilities service (NCS) to docker-compose.node1.yml
    - NATS subscription: node.noda1.capabilities.get
    - Router env: NODE_CAPABILITIES_URL + ENABLE_GLOBAL_CAPS_NATS=true
    NODA2 compose:
    - Router env: ENABLE_GLOBAL_CAPS_NATS=true
    Router main.py:
    - Startup: initializes global_capabilities_client (NATS connect + first discovery). Falls back to local-only capabilities_client if unavailable.
    - /infer: uses get_global_capabilities() for cross-node model pool
    - Offload support: send_offload_request(node_id, type, payload) via NATS
    Verified on NODA2:
    - Global caps: 1 node, 14 models (NODA1 not yet deployed)
    - Sofiia: cloud_grok → grok-4-1-fast-reasoning (OK)
    - Helion: NCS → qwen3:14b local (OK)
    - When NODA1 deploys NCS, its models appear automatically via NATS discovery
    Made-with: Cursor
Apple	89c3f2ac66	P1: NCS-first model selection + NATS capabilities + Grok 4.1	2026-02-27 02:17:34 -08:00
    Router model selection:
    - New model_select.py: resolve_effective_profile → profile_requirements → select_best_model pipeline. NCS-first with graceful static fallback.
    - selection_policies in router-config.node2.yml define prefer order per profile without hardcoding models (e.g. local_default_coder prefers qwen3:14b then qwen3.5:35b-a3b).
    - Cloud profiles (cloud_grok, cloud_deepseek) skip NCS; on cloud failure use fallback_profile via NCS for local selection.
    - Structured logs: selected_profile, required_type, runtime, model, caps_age_s, fallback_reason on every infer request.
    Grok model fix:
    - grok-2-1212 no longer exists on xAI API → updated to grok-4-1-fast-reasoning across all 3 hardcoded locations in main.py and router-config.node2.yml.
    NCS NATS request/reply:
    - node-capabilities subscribes to node.noda2.capabilities.get (NATS request/reply). Enabled via ENABLE_NATS_CAPS=true in compose.
    - NODA1 router can query NODA2 capabilities over NATS leafnode without HTTP connectivity.
    Verified:
    - NCS: 14 served models from Ollama+Swapper+llama-server
    - NATS: request/reply returns full capabilities JSON
    - Sofiia: cloud_grok → grok-4-1-fast-reasoning (tested, 200 OK)
    - Helion: NCS → qwen3:14b via Ollama (caps_age=23.7s cache hit)
    - Router health: ok
    Made-with: Cursor
Apple	e2a3ae342a	node2: fix Sofiia routing determinism + Node Capabilities Service	2026-02-27 02:07:40 -08:00
    Bug fixes:
    - Bug A: GROK_API_KEY env mismatch: router expected GROK_API_KEY but only XAI_API_KEY was present. Added GROK_API_KEY=${XAI_API_KEY} alias in compose.
    - Bug B: 'grok' profile missing in router-config.node2.yml: added cloud_grok profile (provider: grok, model: grok-2-1212). Sofiia now has default_llm=cloud_grok with fallback_llm=local_default_coder.
    - Bug C: Router silently defaulted to cloud DeepSeek when profile was unknown. Now falls back to agent.fallback_llm or local_default_coder with WARNING log. Hardcoded Ollama URL (172.18.0.1) replaced with config-driven base_url.
    New service: Node Capabilities Service (NCS)
    - services/node-capabilities/: FastAPI microservice exposing live model inventory from Ollama, Swapper, and llama-server.
    - GET /capabilities: canonical JSON with served_models[] and inventory_only[]
    - GET /capabilities/models: flat list of served models
    - POST /capabilities/refresh: force cache refresh
    - Cache TTL 15s, bound to 127.0.0.1:8099
    - services/router/capabilities_client.py: async client with TTL cache
    Artifacts:
    - ops/node2_models_audit.md: 3-layer model view (served/disk/cloud)
    - ops/node2_models_audit.yml: machine-readable audit
    - ops/node2_capabilities_example.json: sample NCS output (14 served models)
    Made-with: Cursor
Apple	3965f68fac	node2: full model inventory audit 2026-02-27	2026-02-27 01:44:26 -08:00
    Read-only audit of all installed models on NODA2 (MacBook M4 Max):
    - 12 Ollama models, 1 llama-server duplicate, 16 HF cache models
    - ComfyUI stack (200+ GB): FLUX.2-dev, LTX-2 video, SDXL
    - Whisper-large-v3-turbo (MLX, 1.5GB) + Kokoro TTS (MLX, 0.35GB) installed but unused
    - MiniCPM-V-4_5 (16GB) installed but not in Swapper (better than llava:13b)
    - Key finding: 149GB cleanup potential; llama-server duplicates Ollama (P1, 20GB)
    Artifacts:
    - ops/node2_models_inventory_20260227.json
    - ops/node2_models_inventory_20260227.md
    - ops/node2_model_capabilities.yml
    - ops/node2_model_gaps.yml
    Made-with: Cursor
Apple	7b8499dd8a	node2: P0 vision restore + P1 security hardening + node-specific router config	2026-02-27 01:27:38 -08:00
    P0 - Vision:
    - swapper_config_node2.yaml: add llava-13b as vision model (vision:true)
      /vision/models now returns non-empty list; inference verified ~3.5s
    - ollama.url fixed to host.docker.internal:11434 (was localhost, broken in Docker)
    P1 - Security:
    - Remove NODES_NODA1_SSH_PASSWORD from .env and docker-compose.node2-sofiia.yml
    - SSH ED25519 key generated, authorized on NODA1, mounted as /run/secrets/noda1_ssh_key
    - sofiia-console reads key via NODES_NODA1_SSH_PRIVATE_KEY env var
    - secrets/noda1_id_ed25519 added to .gitignore
    P1 - Router:
    - services/router/router-config.node2.yml: new node2-specific config replaces all 172.17.0.1:11434 → host.docker.internal:11434
    - docker-compose.node2-sofiia.yml: mount router-config.node2.yml (not root config)
    P1 - Ports:
    - router (9102), swapper (8890), sofiia-console (8002): bind to 127.0.0.1
    - gateway (9300): keep 0.0.0.0 (Telegram webhook requires public access)
    Artifacts:
    - ops/patch_node2_P0P1_20260227.md: change log
    - ops/validation_node2_P0P1_20260227.md: all checks PASS
    - ops/node2.env.example: safe env template (no secrets)
    - ops/security_hardening_node2.md: SSH key migration guide + firewall
    - ops/node2_models_pull.sh: model pull script for P0/P1
    Made-with: Cursor
Apple	46d7dea88a	docs(audit): NODA2 full audit 2026-02-27	2026-02-27 01:14:38 -08:00
    - ops/audit_node2_20260227.md: readable report (hardware, containers, models, Sofiia, findings)
    - ops/audit_node2_20260227.json: structured machine-readable inventory
    - ops/audit_node2_findings.yml: 10 PASS + 5 PARTIAL + 3 FAIL + 3 SECURITY gaps
    - ops/node2_capabilities.yml: router-ready capabilities (vision/text/code/stt/tts models)
    Key findings:
      P0: vision pipeline broken (/vision/models=empty, qwen3-vl:8b not installed)
      P1: node-ops-worker missing, SSH root password in sofiia-console env
      P1: router-config.yml uses 172.17.0.1 (Linux bridge) not host.docker.internal
    Made-with: Cursor
Apple	974522f12b	feat(noda2): enable NATS leafnode remote to NODA1:7422	2026-02-26 23:36:25 -08:00
    - nats-server.conf: added leafnodes.remotes to nats://144.76.224.179:7422
    - NODA2 now a spoke leaf node; NODA1 is hub
    - Cross-node pub/sub verified: NODA1 pub → NODA2 sub (node.test.>)
    - Leafnode connection confirmed: 144.76.224.179:7422 lid:5
    Made-with: Cursor
Apple	e00e7af1e7	agromatrix: harden correction learning and invalidate wrong labels	2026-02-21 02:25:40 -08:00
Apple	2b0b142f95	gateway: fix greeting UX and reduce false photo-intent fallbacks	2026-02-21 00:05:09 -08:00
Apple	0a87eadb8d	gateway: auto-handle unresolved user questions in chat context	2026-02-20 23:54:52 -08:00
Apple	7b5357228f	doc-service: add shared deterministic excel answer contract	2026-02-20 14:16:16 -08:00
Apple	e6c083a000	gateway: enforce source-lock, pii guard, style profile, and intent retry	2026-02-20 14:16:07 -08:00
Apple	195eb9b7ac	agents: add planned AISTALK orchestrator and crew profile	2026-02-20 10:24:59 -08:00
Apple	e01ed7be75	router: remove qwen2.5 profile and pin monitor to local qwen3	2026-02-19 00:25:55 -08:00
Apple	e82d70553d	chore: ignore local rollback backup snapshots	2026-02-19 00:14:51 -08:00
Apple	544874d952	docs: add node1 runbooks, consolidation artifacts, and maintenance scripts	2026-02-19 00:14:27 -08:00
Apple	c57e6ed96b	services: update comfy agent, senpai md consumer, and swapper deps	2026-02-19 00:14:18 -08:00
Apple	c201d105f6	services: add clan consent/visibility and oneok adapter stack	2026-02-19 00:14:12 -08:00
Apple	dfc0ef1ceb	runtime: sync router/gateway/config policy and clan role registry	2026-02-19 00:14:06 -08:00
Apple	675b25953b	chore: ignore backup/temp artifacts and local worktree scratch	2026-02-18 10:47:26 -08:00
Apple	de8bb36462	docs+router: formalize runtime policy and remove temporary cloud-first code override	2026-02-18 10:40:40 -08:00
Apple	05435e7fad	router: bypass local routing rules for cloud-first agents	2026-02-18 10:28:53 -08:00
Apple	ef59cb0950	router: enforce cloud-first direct path for top-level and monitor agents	2026-02-18 10:26:29 -08:00
Apple	5bca7fb79d	router: unify top-level DeepSeek-first + on-demand CrewAI policy	2026-02-18 10:20:10 -08:00
Apple	a23cde217f	clan: route simple requests to fast crew profile; keep zhos_mvp for complex	2026-02-18 09:59:53 -08:00
Apple	7c3bc68ac2	clan: restore zhos_mvp profile in crewai-service and re-enable clan zhos routing	2026-02-18 09:56:06 -08:00
Apple	b65ed7cdf2	clan: stop forcing missing zhos_mvp crew profile; use available default	2026-02-18 09:43:33 -08:00
Apple	13aa0c79f0	router: bundle CLAN runtime registry in router image path	2026-02-18 09:42:00 -08:00
Apple	63fec84734	clan: map runtime-guard manager alias so agent_id=clan is recognized	2026-02-18 09:40:54 -08:00
Apple	bfd0e05bc9	doc-service: parse fact_value_json string in doc context lookup	2026-02-18 09:37:54 -08:00
Apple	30ea12e0f8	doc-service: persist doc_context by stable session key	2026-02-18 09:37:12 -08:00
Apple	d42bb09912	helion: stabilize doc context, remove legacy webhook path, add stack smoke canary	2026-02-18 09:36:16 -08:00
Apple	760022d7f5	helion: ignore keyword complexity hints; trigger CrewAI only by explicit detailed/complex flags	2026-02-18 09:25:52 -08:00
Apple	635f2d7e37	helion: deepseek-first, on-demand CrewAI, local subagent profiles, concise post-synthesis	2026-02-18 09:21:47 -08:00
Apple	343bdc2d11	prompts: add DAARWIZZ awareness to legacy nutra prompt	2026-02-18 08:44:04 -08:00
Apple	6b5e462c85	prompts: enforce DAARWIZZ awareness across top-level agents	2026-02-18 08:43:29 -08:00
Apple	e5a6e310b7	ops: make DAARWIZZ awareness canary static by default with optional runtime mode	2026-02-18 08:29:02 -08:00
Apple	00b77066b0	ops: add DAARWIZZ awareness canary for all top-level agents	2026-02-18 08:22:50 -08:00
Apple	2c03632f67	senpai: enforce DAARWIZZ network awareness; sync daarwizz delegation roster	2026-02-18 08:12:03 -08:00
Apple	71b248de23	gitignore: ignore runtime canary status artifacts	2026-02-18 06:14:11 -08:00
Apple	249b2e1e94	ops: restore canary_all and harden monitor summary script invocation	2026-02-18 06:13:15 -08:00
Apple	77ab034744	Sync NODE1 crewai-service runtime files and monitor summary script	2026-02-18 06:00:19 -08:00
Apple	963813607b	Docs sync: align OPENAPI contracts with NODE1 runtime	2026-02-18 05:58:54 -08:00
Apple	b9f83a5006	Sync NODE1 runtime config for Sofiia monitor + Clan canary fixes	2026-02-18 05:56:21 -08:00
Apple	7df8cd5882	docs: sync consolidation and session starter	2026-02-16 02:25:54 -08:00
Apple	798c6f88c7	docs: sync consolidation and session starter	2026-02-16 02:21:49 -08:00
Apple	b962d4a288	docs: sync consolidation and session starter	2026-02-16 02:15:59 -08:00
Apple	de3bd8c13f	docs: sync consolidation and session starter	2026-02-16 02:15:20 -08:00


494 Commits
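The P3.1 commit (a605b8c43e) describes the router's candidate score as wait + model_latency + cross_node_penalty + prefer_bonus. It also sets LOCAL_THRESHOLD_MS=250 so that work stays local when it is within that threshold of the best remote option. The sketch below shows one way that rule could work. It assumes that lower scores are better, that model_latency is the runtime's p50, and that prefer_bonus is a negative number. `CROSS_NODE_PENALTY_MS`, `PREFER_BONUS_MS`, `Candidate` and `select` are hypothetical names and values, not the router's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional

LOCAL_THRESHOLD_MS = 250.0     # value from the P3.1 commit message
CROSS_NODE_PENALTY_MS = 100.0  # hypothetical penalty for running on another node
PREFER_BONUS_MS = -200.0       # hypothetical; negative, so it lowers the score


@dataclass
class Candidate:
    node: str
    model: str
    local: bool
    estimated_wait_ms: float  # queue wait reported by the node (NodeLoad)
    p50_ms: float             # model latency from the runtime's rolling window
    preferred: bool = False   # True if the model is in the profile's prefer list


def score_candidate(c: Candidate) -> float:
    """Lower is better: wait + model_latency + cross_node_penalty + prefer_bonus."""
    cross_node_penalty = 0.0 if c.local else CROSS_NODE_PENALTY_MS
    prefer_bonus = PREFER_BONUS_MS if c.preferred else 0.0
    return c.estimated_wait_ms + c.p50_ms + cross_node_penalty + prefer_bonus


def select(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the best local candidate unless a remote one beats it by more than the threshold."""
    best_local = min((c for c in candidates if c.local), key=score_candidate, default=None)
    best_remote = min((c for c in candidates if not c.local), key=score_candidate, default=None)
    if best_local is None or best_remote is None:
        return best_local or best_remote
    # Stay local while the local score is within LOCAL_THRESHOLD_MS of the best remote one.
    if score_candidate(best_local) <= score_candidate(best_remote) + LOCAL_THRESHOLD_MS:
        return best_local
    return best_remote
```

Under these assumptions, a local candidate scoring 500 ms still wins over a remote one scoring 400 ms, because the gap is under 250 ms. A local node whose queue wait pushes it past that margin loses to the remote node.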
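In the same commit, RuntimeLoad reports p50_ms and p95_ms from a rolling 50-sample window. The sketch below shows one way to build such a window. It assumes nearest-rank percentiles, because the commit does not say which percentile method NCS uses. `RollingLatency` is a hypothetical name.

```python
import math
from collections import deque
from typing import Deque, Optional


class RollingLatency:
    """Keeps the most recent `size` latency samples and reports percentiles."""

    def __init__(self, size: int = 50) -> None:
        # deque(maxlen=...) silently drops the oldest sample once full.
        self._samples: Deque[float] = deque(maxlen=size)

    def add(self, latency_ms: float) -> None:
        self._samples.append(float(latency_ms))

    def __len__(self) -> int:
        return len(self._samples)

    def percentile(self, p: int) -> Optional[float]:
        """Nearest-rank percentile; None while the window is empty."""
        if not self._samples:
            return None
        ordered = sorted(self._samples)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]
```

A node worker could keep one such window per job type (llm, vision) and read `percentile(50)` and `percentile(95)` when reporting.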
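Commit c4b94a327d adds a circuit breaker per node+type in offload_client.py. Three failures within 60 s open the breaker for 120 s, and circuit-broken nodes are passed to model selection as exclude_nodes. The sketch below uses those numbers. It assumes a simple closed/open breaker with no half-open probe, and it assumes that a success clears the failure history. The class and method names are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set, Tuple

FAIL_THRESHOLD = 3       # failures...
FAIL_WINDOW_S = 60.0     # ...within this many seconds open the breaker
OPEN_DURATION_S = 120.0  # how long the breaker stays open

Key = Tuple[str, str]  # (node_id, job_type)


class CircuitBreaker:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._failures: Dict[Key, Deque[float]] = defaultdict(deque)
        self._open_until: Dict[Key, float] = {}

    def is_open(self, node_id: str, job_type: str) -> bool:
        key = (node_id, job_type)
        until = self._open_until.get(key)
        if until is None:
            return False
        if self._clock() >= until:
            # Open period is over: close the breaker and start counting afresh.
            del self._open_until[key]
            self._failures[key].clear()
            return False
        return True

    def record_failure(self, node_id: str, job_type: str) -> None:
        key = (node_id, job_type)
        now = self._clock()
        window = self._failures[key]
        window.append(now)
        # Forget failures that fell out of the sliding window.
        while window and now - window[0] > FAIL_WINDOW_S:
            window.popleft()
        if len(window) >= FAIL_THRESHOLD:
            self._open_until[key] = now + OPEN_DURATION_S

    def record_success(self, node_id: str, job_type: str) -> None:
        self._failures[(node_id, job_type)].clear()

    def open_nodes(self, job_type: str) -> Set[str]:
        """Nodes that could be passed to model selection as exclude_nodes."""
        return {n for (n, t) in list(self._open_until) if t == job_type and self.is_open(n, t)}
```

The clock is injectable so that the 60 s and 120 s windows can be tested without sleeping.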
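The node worker in c4b94a327d keeps an idempotency cache (TTL 10 min) with in-flight deduplication. The asyncio sketch below shows one way to implement that. It assumes jobs are keyed by a string job ID and that only successful results are cached. `IdempotencyCache` and `run` are hypothetical names.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Tuple

IDEMPOTENCY_TTL_S = 600.0  # 10 minutes, as stated in the commit message


class IdempotencyCache:
    def __init__(self, ttl_s: float = IDEMPOTENCY_TTL_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_s
        self._clock = clock
        self._done: Dict[str, Tuple[float, Any]] = {}        # job_id -> (finished_at, result)
        self._inflight: Dict[str, "asyncio.Task[Any]"] = {}  # job_id -> running task

    async def run(self, job_id: str, handler: Callable[[], Awaitable[Any]]) -> Any:
        hit = self._done.get(job_id)
        if hit is not None:
            if self._clock() - hit[0] < self._ttl:
                return hit[1]  # replay: return the cached result without re-running
            del self._done[job_id]  # expired entry
        task = self._inflight.get(job_id)
        if task is None:
            # First request for this job_id: start it and let duplicates share the task.
            task = asyncio.ensure_future(handler())
            self._inflight[job_id] = task
            task.add_done_callback(lambda t, key=job_id: self._finish(key, t))
        # shield() keeps one cancelled waiter from cancelling the shared job.
        return await asyncio.shield(task)

    def _finish(self, job_id: str, task: "asyncio.Task[Any]") -> None:
        self._inflight.pop(job_id, None)
        # Only successful results are cached; failed jobs can be retried.
        if not task.cancelled() and task.exception() is None:
            self._done[job_id] = (self._clock(), task.result())
```

With this design, two identical requests that arrive while the job is running share one execution, and a retry within 10 minutes gets the stored result. A production cache would also sweep expired entries that are never requested again.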
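The same worker enforces deadlines (DEADLINE_EXCEEDED on expired jobs) and caps concurrency with a semaphore that returns busy. The sketch below covers that admission step. It makes three assumptions:

- The deadline is an absolute Unix timestamp in seconds.
- The limiter answers busy immediately instead of queueing.
- The response is a plain dict with hypothetical `status`/`error` keys. The real JobResponse envelope is a Pydantic model whose fields are not listed in the log.

`JobAdmission` is a hypothetical name.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict


class JobAdmission:
    """Deadline check plus a non-queueing concurrency limit (hypothetical helper)."""

    def __init__(self, concurrency_limit: int,
                 clock: Callable[[], float] = time.time) -> None:
        self._sem = asyncio.Semaphore(concurrency_limit)
        self._clock = clock

    async def run(self, deadline_ts: float,
                  handler: Callable[[], Awaitable[Any]]) -> Dict[str, Any]:
        remaining = deadline_ts - self._clock()
        if remaining <= 0:
            # The job expired before it was picked up: refuse it instead of doing wasted work.
            return {"status": "error", "error": "DEADLINE_EXCEEDED"}
        if self._sem.locked():
            # All slots are taken: answer "busy" so the caller can retry or fall back.
            return {"status": "busy"}
        async with self._sem:
            try:
                # Bound the work by whatever time the deadline still allows.
                result = await asyncio.wait_for(handler(), timeout=remaining)
            except asyncio.TimeoutError:
                return {"status": "error", "error": "DEADLINE_EXCEEDED"}
        return {"status": "ok", "result": result}
```

Construct the helper inside a running event loop (or on Python 3.10+, where `asyncio.Semaphore` is not bound to a loop at creation).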
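Commit a92c424845 describes node discovery via scatter-gather on `node.*.capabilities.get`, with dead nodes expiring from the cache after 3x TTL. The cache side can be sketched independently of the NATS transport. The sketch assumes each reply is a dict carrying a `node_id` and a `served_models` list whose entries have a `name` key; `node_id` and `name` are hypothetical field names. It uses an illustrative 15 s TTL borrowed from the NCS cache in e2a3ae342a. `GlobalCapsCache` is a hypothetical name.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

CAPS_TTL_S = 15.0  # illustrative; borrowed from the NCS cache TTL


class GlobalCapsCache:
    """Merges capabilities replies from many nodes and ages out silent ones."""

    def __init__(self, ttl_s: float = CAPS_TTL_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_s
        self._clock = clock
        self._nodes: Dict[str, Tuple[float, dict]] = {}  # node_id -> (last_seen, caps)

    def ingest(self, replies: Iterable[dict]) -> None:
        """Record every reply gathered in one scatter-gather round."""
        now = self._clock()
        for caps in replies:
            self._nodes[caps["node_id"]] = (now, caps)

    def live_nodes(self) -> Dict[str, dict]:
        now = self._clock()
        # A node that has not answered for 3x TTL is treated as dead and dropped.
        dead = [n for n, (seen, _) in self._nodes.items() if now - seen > 3 * self._ttl]
        for node_id in dead:
            del self._nodes[node_id]
        return {n: caps for n, (_, caps) in self._nodes.items()}

    def served_models(self) -> List[Tuple[str, str]]:
        """(node_id, model_name) pairs across all live nodes, for cross-node selection."""
        return sorted(
            (node_id, model["name"])
            for node_id, caps in self.live_nodes().items()
            for model in caps.get("served_models", [])
        )
```

The transport half is left out: publishing on the wildcard subject with a reply inbox and collecting answers until a short timeout. `ingest()` simply takes whatever replies one round returned, so a node that stops answering disappears after three missed TTL periods without any static node list.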