AGENT REGISTRY Decision (NODE1 Runtime)

Date: 2026-02-16
Scope: Decide how to reconcile config/agent_registry.yml with the real NODE1 runtime architecture.
Source policy: Runtime-first (facts from /opt/microdao-daarion on NODE1).

Runtime Facts (Verified)

NODE1 current state

  • Runtime root: /opt/microdao-daarion
  • Branch/HEAD: codex/inventory-audit-20260214 / 6fcd406d36fa04be78073c039bca759baea10e7b
  • Core health:
    • Router 9102 healthy
    • Gateway 9300 healthy
    • Swapper 8890 healthy
  • Canary status:
    • ops/canary_all.sh -> PASS
    • ops/canary_senpai_osr_guard.sh -> PASS

Agents observed in runtime files

  • config/agent_registry.yml:
    • Total agents: 15
    • Internal agents: monitor, devtools
    • comfy: absent
  • config/router_agents.json:
    • Total agents: 16
    • comfy: present
  • gateway /health:
    • agents_count: 13 user-facing agents

Conclusion: runtime has a documented mismatch:

  • registry source (agent_registry.yml) says 15 (without comfy)
  • generated router registry (router_agents.json) says 16 (with comfy)
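The mismatch above can be made explicit by diffing the agent names on each side. The following is a minimal sketch, not the project's actual check: it assumes the agent names have already been extracted from config/agent_registry.yml and config/router_agents.json (the schemas of those files are not described here), and `registry_diff` and the name `helper` are hypothetical.

```python
from typing import Dict, Iterable, List


def registry_diff(source_names: Iterable[str],
                  generated_names: Iterable[str]) -> Dict[str, List[str]]:
    """Return agent names that appear on only one side of the comparison."""
    source, generated = set(source_names), set(generated_names)
    return {
        "only_in_source": sorted(source - generated),
        "only_in_generated": sorted(generated - source),
    }


# Illustrative names only; real lists would be read from the two files.
source = ["monitor", "devtools", "helper"]
generated = ["monitor", "devtools", "helper", "comfy"]
print(registry_diff(source, generated))
# {'only_in_source': [], 'only_in_generated': ['comfy']}
```

An empty result on both sides would indicate that the source registry and the generated registry agree on the agent set.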

Connectivity Facts (NODE3/NODE4)

From this workstation

  • SSH zevs@212.8.58.133:33147 -> Network is unreachable
  • SSH zevss@212.8.58.133:33148 -> Network is unreachable

From NODE1 to NODE3/NODE4 address

  • 212.8.58.133:8880 (expected Comfy API) -> timeout
  • 212.8.58.133:33147 -> No route to host
  • 212.8.58.133:33148 -> No route to host

Conclusion: NODE1 currently has no reliable network path to NODE3/NODE4 services.
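The three failure modes recorded above (timeout, no route to host, network unreachable) can be told apart programmatically. This is a sketch using only the Python standard library; the function name `probe` and the result strings are assumptions chosen for illustration, and the checks in this runbook were actually performed with nc and curl.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within the timeout window.
        return "timeout"
    except ConnectionRefusedError:
        # Host answered, but nothing is listening on the port.
        return "refused"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host"
        if exc.errno == errno.ENETUNREACH:
            return "network unreachable"
        return f"error: {exc}"
```

Running `probe("212.8.58.133", port)` for ports 8880, 33147 and 33148 from NODE1 would repeat the checks listed above; the result depends on network state at the time of the call.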

Decision

Decision ID: ADR-NODE1-REGISTRY-2026-02-16-A

  1. For NODE1 production, treat comfy as disabled/unavailable until connectivity to NODE3 is restored.
  2. Align registry artifacts so they describe actual runtime, not aspirational topology:
    • config/agent_registry.yml and generated outputs must be consistent on NODE1.
    • Do not keep comfy in generated runtime registries while NODE1 cannot reach the Comfy endpoint.
  3. Keep the existing media-delivery code paths in gateway/router (safe and already validated), but mark external generation as conditional on a reachable endpoint.

Rationale:

  • Prevents hidden routing to unreachable services.
  • Removes ambiguity between source registry and generated files.
  • Matches observed healthy production behavior (13 user-facing + 2 internal).

Operational Rules Until NODE3/NODE4 Access Is Restored

  1. Do not advertise comfy as active in NODE1 runtime registries.
  2. Keep COMFY_AGENT_URL as an optional environment variable only (it is not authoritative for agent availability).
  3. Before enabling comfy on NODE1, require:
    • successful TCP check to 212.8.58.133:8880
    • successful API health call
    • post-enable canary pass
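The three requirements in rule 3 form an ordered sequence of gates. One possible way to express them is sketched below; `first_failed_gate` and the gate names are hypothetical, and the lambdas are placeholders standing in for a real TCP check, API health call and canary run.

```python
from typing import Callable, Optional, Sequence, Tuple

Gate = Tuple[str, Callable[[], bool]]


def first_failed_gate(gates: Sequence[Gate]) -> Optional[str]:
    """Run gates in order; return the name of the first failure, else None."""
    for name, check in gates:
        if not check():
            return name  # Stop early: later gates are not evaluated.
    return None


# Placeholder results for illustration; real checks would replace the lambdas.
demo_gates = [
    ("tcp_check", lambda: True),
    ("api_health", lambda: False),
    ("post_enable_canary", lambda: True),
]
print(first_failed_gate(demo_gates))  # api_health
```

Because the canary gate runs after enabling, it belongs last in the sequence; a failure there would mean comfy has to be disabled again.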

Required Follow-up Actions

  1. Reconcile registry source/generation pipeline on canonical repo:
    • ensure one deterministic generated set from config/agent_registry.yml
    • remove stale generated artifacts that conflict with source
  2. Add explicit status fields for external agents (for example: enabled, reachable) so that an agent's state is not reduced to present/absent.
  3. Add pre-deploy guard:
    • if an external agent's endpoint is unreachable, block publishing of that agent to NODE1 runtime registries.
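Follow-up actions 2 and 3 could be combined roughly as follows. This is a sketch under stated assumptions: the `ExternalAgent` type, its field names and the two helper functions are hypothetical and do not exist in the repository; the field names simply follow the example given in action 2.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ExternalAgent:
    name: str
    enabled: bool    # Operator intent: should this agent be offered at all?
    reachable: bool  # Observed fact: did the endpoint check succeed?

    @property
    def publishable(self) -> bool:
        """An agent is published only when it is both enabled and reachable."""
        return self.enabled and self.reachable


def publishable_agents(agents: Iterable[ExternalAgent]) -> List[str]:
    """Names that may be written to the runtime registries."""
    return [a.name for a in agents if a.publishable]


def blocked_agents(agents: Iterable[ExternalAgent]) -> List[str]:
    """Names the pre-deploy guard would hold back."""
    return [a.name for a in agents if not a.publishable]


# comfy is enabled in intent but unreachable from NODE1, so it is blocked.
agents = [ExternalAgent("comfy", enabled=True, reachable=False)]
print(publishable_agents(agents))  # []
print(blocked_agents(agents))      # ['comfy']
```

Keeping intent (`enabled`) separate from observation (`reachable`) lets the registry record that comfy is wanted but currently unavailable, instead of dropping it silently.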

Verification Commands (Used)

On NODE1:

  • python3 check of config/agent_registry.yml and config/router_agents.json counts
  • curl http://127.0.0.1:9300/health
  • ops/canary_all.sh
  • ops/canary_senpai_osr_guard.sh
  • nc and curl checks to 212.8.58.133:{8880,33147,33148}
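The count check can also be cross-validated against the gateway: 15 registry agents minus 2 internal agents should equal the 13 user-facing agents reported by /health. The sketch below assumes that the /health response is JSON with a top-level integer field `agents_count`; that response shape is an assumption, and `user_facing_count_matches` is a hypothetical helper.

```python
import json


def user_facing_count_matches(health_body: str,
                              registry_total: int,
                              internal: int) -> bool:
    """Compare the gateway's reported count with registry total minus internal."""
    # Assumed shape: {"agents_count": <int>, ...}
    reported = json.loads(health_body)["agents_count"]
    return reported == registry_total - internal


# Sample body for illustration; a real one would come from the curl call above.
print(user_facing_count_matches('{"agents_count": 13}', 15, 2))  # True
```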