# AGENT REGISTRY Decision (NODE1 Runtime)

- Date: 2026-02-16
- Scope: Decide how to reconcile `config/agent_registry.yml` with the real NODE1 runtime architecture.
- Source policy: runtime-first (facts taken from `/opt/microdao-daarion` on NODE1).

## Runtime Facts (Verified)

### NODE1 current state

- Runtime root: `/opt/microdao-daarion`
- Branch/HEAD: `codex/inventory-audit-20260214` / `6fcd406d36fa04be78073c039bca759baea10e7b`
- Core health:
  - Router `9102`: healthy
  - Gateway `9300`: healthy
  - Swapper `8890`: healthy
- Canary status:
  - `ops/canary_all.sh` -> PASS
  - `ops/canary_senpai_osr_guard.sh` -> PASS

### Agents observed in runtime files

- `config/agent_registry.yml`:
  - Total agents: 15
  - Internal agents: `monitor`, `devtools`
  - `comfy`: absent
- `config/router_agents.json`:
  - Total agents: 16
  - `comfy`: present
- Gateway `/health`:
  - `agents_count`: 13 (user-facing agents only)

Conclusion: the runtime has a confirmed mismatch:

- the registry source (`agent_registry.yml`) lists 15 agents (without `comfy`)
- the generated router registry (`router_agents.json`) lists 16 agents (with `comfy`)

The gateway count agrees with the source registry: 15 agents minus the 2 internal ones leaves 13 user-facing agents.
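
This mismatch can be detected mechanically. The sketch below is a minimal Python illustration, assuming both files have already been loaded into lists of agent ID strings; the real schemas of `agent_registry.yml` and `router_agents.json` are not described in this runbook, so loading is left out, and `registry_diff` and `is_consistent` are hypothetical helper names.

```python
"""Compare agent IDs in the source registry with the generated router registry.

Assumption (hypothetical): both files were already parsed into lists of
agent ID strings; the actual file schemas are not covered here.
"""
from typing import Dict, Iterable, List


def registry_diff(source_ids: Iterable[str], generated_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Return agents that appear in only one of the two registries."""
    source = set(source_ids)
    generated = set(generated_ids)
    return {
        # Emitted by the generator but not declared in the source registry.
        "only_in_generated": sorted(generated - source),
        # Declared in the source registry but missing from the generated output.
        "only_in_source": sorted(source - generated),
    }


def is_consistent(source_ids: Iterable[str], generated_ids: Iterable[str]) -> bool:
    """True when both registries contain exactly the same agent IDs."""
    diff = registry_diff(source_ids, generated_ids)
    return not diff["only_in_generated"] and not diff["only_in_source"]


if __name__ == "__main__":
    # Abbreviated example: only agent IDs named in this runbook are used.
    src = ["monitor", "devtools"]
    gen = ["monitor", "devtools", "comfy"]
    print(registry_diff(src, gen))
    # -> {'only_in_generated': ['comfy'], 'only_in_source': []}
```

On NODE1 as recorded above, the full ID lists would be expected to report `comfy` under `only_in_generated`.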

## Connectivity Facts (NODE3/NODE4)

### From this workstation

- SSH `zevs@212.8.58.133:33147` -> `Network is unreachable`
- SSH `zevss@212.8.58.133:33148` -> `Network is unreachable`

### From NODE1 to the NODE3/NODE4 address

- `212.8.58.133:8880` (expected Comfy API) -> timeout
- `212.8.58.133:33147` -> `No route to host`
- `212.8.58.133:33148` -> `No route to host`

Conclusion: NODE1 currently has no reliable network path to NODE3/NODE4 services.
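
The `nc` reachability checks can also be expressed as a small Python probe, for example inside a pre-deploy script. This is a sketch, not the command that produced the results above; `tcp_reachable` is a hypothetical helper and its 3-second default timeout is an assumption.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, "No route to host", "Network is unreachable" and
        # "Connection refused" are all raised as OSError subclasses.
        return False
```

Run from NODE1, `tcp_reachable("212.8.58.133", 8880)` would be expected to return `False` for as long as the timeout reported above persists.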

## Decision

Decision ID: `ADR-NODE1-REGISTRY-2026-02-16-A`

1. For NODE1 production, treat `comfy` as disabled/unavailable until connectivity to NODE3 is restored.
2. Align registry artifacts so that they describe the actual runtime, not an aspirational topology:
   - `config/agent_registry.yml` and its generated outputs must be consistent on NODE1.
   - Do not keep `comfy` in generated runtime registries while NODE1 cannot reach the Comfy endpoint.
3. Keep the existing media-delivery code paths in gateway/router (safe and already validated), but mark external generation as conditional on a reachable endpoint.

Rationale:

- Prevents hidden routing to unreachable services.
- Removes ambiguity between the source registry and generated files.
- Matches the observed healthy production behavior (13 user-facing + 2 internal agents).

## Operational Rules Until NODE3/NODE4 Access Is Restored

1. Do not advertise `comfy` as active in NODE1 runtime registries.
2. Keep `COMFY_AGENT_URL` as an optional environment variable only (not authoritative for agent availability).
3. Before enabling `comfy` on NODE1, require:
   - a successful TCP check to `212.8.58.133:8880`
   - a successful API health call
   - a passing canary run after enabling
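
These requirements can be read as ordered gates: stop at the first failure, and run the canary only after `comfy` has been switched on. Below is a minimal Python sketch, assuming it runs from the runtime root (`/opt/microdao-daarion`) so the relative canary path resolves. `COMFY_HEALTH_URL` (including its `/health` path), the gate names, and all helper functions are hypothetical, and `ops/canary_all.sh` is assumed to be the canary meant by "post-enable".

```python
import http.client
import subprocess
import urllib.request
from typing import Callable, List, Optional, Tuple

# Hypothetical: this runbook does not state the Comfy API's health path.
COMFY_HEALTH_URL = "http://212.8.58.133:8880/health"


def command_succeeds(cmd: List[str]) -> bool:
    """Return True if cmd runs and exits with status 0."""
    try:
        return subprocess.run(cmd, check=False).returncode == 0
    except OSError:
        # Command missing, not executable, or wrong working directory.
        return False


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if GET url answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and timeouts derive from OSError;
        # malformed responses raise HTTPException.
        return False


Gate = Tuple[str, Callable[[], bool]]

# Checks that must pass before `comfy` is enabled on NODE1.
PRE_ENABLE_GATES: List[Gate] = [
    ("tcp 212.8.58.133:8880",
     lambda: command_succeeds(["nc", "-z", "-w", "3", "212.8.58.133", "8880"])),
    ("api health", lambda: http_ok(COMFY_HEALTH_URL)),
]

# Check that must pass after `comfy` has been enabled.
POST_ENABLE_GATES: List[Gate] = [
    ("canary", lambda: command_succeeds(["ops/canary_all.sh"])),
]


def first_failed_gate(gates: List[Gate]) -> Optional[str]:
    """Run gates in order and stop at the first failure; None means all passed."""
    for name, check in gates:
        if not check():
            return name
    return None
```

A caller would enable `comfy` only if `first_failed_gate(PRE_ENABLE_GATES)` returns `None`, and keep it enabled only if `first_failed_gate(POST_ENABLE_GATES)` also returns `None`.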

## Required Follow-up Actions

1. Reconcile the registry source/generation pipeline in the canonical repo:
   - ensure one deterministic generated set from `config/agent_registry.yml`
   - remove stale generated artifacts that conflict with the source
2. Add explicit status fields for external agents (for example, `enabled` and `reachable`) to avoid binary present/absent confusion.
3. Add a pre-deploy guard:
   - if an external agent's endpoint is unreachable, block publishing that agent to NODE1 runtime registries.
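
These follow-ups fit together: with explicit `enabled`/`reachable` fields, the pre-deploy guard becomes a filter over source entries, and generation can be made deterministic. The Python sketch below assumes a hypothetical `AgentEntry` record (only the `enabled` and `reachable` field names come from this runbook) and a JSON output keyed by agent ID; it is not the existing generator.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class AgentEntry:
    # Hypothetical schema: only `enabled` and `reachable` come from this runbook.
    agent_id: str
    external: bool
    enabled: bool = True
    reachable: bool = True


def publishable(entry: AgentEntry) -> bool:
    """Enabled agents publish; external ones also need a reachable endpoint."""
    return entry.enabled and (entry.reachable or not entry.external)


def blocked_agents(entries: List[AgentEntry]) -> List[str]:
    """Enabled external agents the pre-deploy guard refuses to publish."""
    return sorted(
        e.agent_id for e in entries if e.enabled and e.external and not e.reachable
    )


def render_runtime_registry(entries: List[AgentEntry]) -> str:
    """Render publishable agents as JSON that is stable across runs."""
    published = {e.agent_id: asdict(e) for e in entries if publishable(e)}
    # sort_keys applies at every nesting level, so input order does not matter.
    return json.dumps(published, sort_keys=True, indent=2) + "\n"
```

Because keys are sorted and indentation is fixed, regenerating from an unchanged source should produce byte-identical output, so a plain diff against the committed file is enough to spot stale artifacts.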

## Verification Commands (Used)

On NODE1:

- a `python3` check of agent counts in `config/agent_registry.yml` and `config/router_agents.json`
- `curl http://127.0.0.1:9300/health`
- `ops/canary_all.sh`
- `ops/canary_senpai_osr_guard.sh`
- `nc` and `curl` checks to `212.8.58.133:{8880,33147,33148}`
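
The `python3` count check and the `curl` health call can be combined into a single consistency assertion. A minimal sketch, assuming `agents_count` is a top-level key in the gateway's `/health` JSON and that agent IDs have already been read from `config/agent_registry.yml`; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable

# Gateway health endpoint, as queried with curl in the commands above.
GATEWAY_HEALTH_URL = "http://127.0.0.1:9300/health"


def fetch_health(url: str = GATEWAY_HEALTH_URL, timeout: float = 5.0) -> bytes:
    """Return the raw /health response body (same request as the curl check)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def parse_agents_count(body: bytes) -> int:
    """Assumption: `agents_count` is a top-level key of the JSON body."""
    return int(json.loads(body)["agents_count"])


def counts_consistent(
    registry_ids: Iterable[str], internal_ids: Iterable[str], gateway_count: int
) -> bool:
    """Registry agents minus internal ones should equal the gateway's count."""
    user_facing = set(registry_ids) - set(internal_ids)
    return len(user_facing) == gateway_count
```

With the figures recorded above, 15 registry agents minus the internal `monitor` and `devtools` leaves 13, which matches the gateway's `agents_count`.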