docs: add node1 runbooks, consolidation artifacts, and maintenance scripts
This commit is contained in:
89
docs/runbooks/AGENT_REGISTRY_NODE1_DECISION_2026-02-16.md
Normal file
89
docs/runbooks/AGENT_REGISTRY_NODE1_DECISION_2026-02-16.md
Normal file
@@ -0,0 +1,89 @@
|
||||
# AGENT REGISTRY Decision (NODE1 Runtime)
|
||||
|
||||
Date: 2026-02-16
|
||||
Scope: Decide how to reconcile `config/agent_registry.yml` with the real NODE1 runtime architecture.
|
||||
Source policy: Runtime-first (facts from `/opt/microdao-daarion` on NODE1).
|
||||
|
||||
## Runtime Facts (Verified)
|
||||
|
||||
### NODE1 current state
|
||||
- Runtime root: `/opt/microdao-daarion`
|
||||
- Branch/HEAD: `codex/inventory-audit-20260214` / `6fcd406d36fa04be78073c039bca759baea10e7b`
|
||||
- Core health:
|
||||
- Router `9102` healthy
|
||||
- Gateway `9300` healthy
|
||||
- Swapper `8890` healthy
|
||||
- Canary status:
|
||||
- `ops/canary_all.sh` -> PASS
|
||||
- `ops/canary_senpai_osr_guard.sh` -> PASS
|
||||
|
||||
### Agents observed in runtime files
|
||||
- `config/agent_registry.yml`:
|
||||
- Total agents: 15
|
||||
- Internal agents: `monitor`, `devtools`
|
||||
- `comfy`: absent
|
||||
- `config/router_agents.json`:
|
||||
- Total agents: 16
|
||||
- `comfy`: present
|
||||
- `gateway /health`:
|
||||
- `agents_count`: 13 user-facing agents
|
||||
|
||||
Conclusion: runtime has a documented mismatch:
|
||||
- registry source (`agent_registry.yml`) says 15 (without `comfy`)
|
||||
- generated router registry (`router_agents.json`) says 16 (with `comfy`)
|
||||
|
||||
## Connectivity Facts (NODE3/NODE4)
|
||||
|
||||
### From this workstation
|
||||
- SSH `zevs@212.8.58.133:33147` -> `Network is unreachable`
|
||||
- SSH `zevss@212.8.58.133:33148` -> `Network is unreachable`
|
||||
|
||||
### From NODE1 to NODE3/NODE4 address
|
||||
- `212.8.58.133:8880` (expected Comfy API) -> timeout
|
||||
- `212.8.58.133:33147` -> `No route to host`
|
||||
- `212.8.58.133:33148` -> `No route to host`
|
||||
|
||||
Conclusion: NODE1 currently has no reliable network path to NODE3/NODE4 services.
|
||||
|
||||
## Decision
|
||||
|
||||
Decision ID: `ADR-NODE1-REGISTRY-2026-02-16-A`
|
||||
|
||||
1. For NODE1 production, treat `comfy` as disabled/unavailable until connectivity to NODE3 is restored.
|
||||
2. Align registry artifacts so they describe actual runtime, not aspirational topology:
|
||||
- `config/agent_registry.yml` and generated outputs must be consistent on NODE1.
|
||||
- Do not keep `comfy` in generated runtime registries while NODE1 cannot reach Comfy endpoint.
|
||||
3. Keep existing media-delivery code paths in gateway/router (safe and already validated), but mark external generation as conditional on reachable endpoint.
|
||||
|
||||
Rationale:
|
||||
- Prevents hidden routing to unreachable services.
|
||||
- Removes ambiguity between source registry and generated files.
|
||||
- Matches observed healthy production behavior (13 user-facing + 2 internal).
|
||||
|
||||
## Operational Rules Until NODE3/NODE4 Access Is Restored
|
||||
|
||||
1. Do not advertise `comfy` as active in NODE1 runtime registries.
|
||||
2. Keep `COMFY_AGENT_URL` as optional env only (non-authoritative for agent availability).
|
||||
3. Before enabling `comfy` on NODE1, require:
|
||||
- successful TCP check to `212.8.58.133:8880`
|
||||
- successful API health call
|
||||
- post-enable canary pass
|
||||
|
||||
## Required Follow-up Actions
|
||||
|
||||
1. Reconcile registry source/generation pipeline on canonical repo:
|
||||
- ensure one deterministic generated set from `config/agent_registry.yml`
|
||||
- remove stale generated artifacts that conflict with source
|
||||
2. Add explicit status field for external agents (example: `enabled`, `reachable`) to avoid binary present/absent confusion.
|
||||
3. Add pre-deploy guard:
|
||||
- if external agent endpoint unreachable, block publish of that agent to NODE1 runtime registries.
|
||||
|
||||
## Verification Commands (Used)
|
||||
|
||||
On NODE1:
|
||||
- `python3` check of `config/agent_registry.yml` and `config/router_agents.json` counts
|
||||
- `curl http://127.0.0.1:9300/health`
|
||||
- `ops/canary_all.sh`
|
||||
- `ops/canary_senpai_osr_guard.sh`
|
||||
- `nc` and `curl` checks to `212.8.58.133:{8880,33147,33148}`
|
||||
|
||||
80
docs/runbooks/CLAN_AGENT_INTERACTION_PROTOCOL_V1.md
Normal file
80
docs/runbooks/CLAN_AGENT_INTERACTION_PROTOCOL_V1.md
Normal file
@@ -0,0 +1,80 @@
|
||||
# CLAN Agent Interaction Protocol v1
|
||||
|
||||
## Ролі
|
||||
- `JOS-BASE`: спільна конституція для всіх агентів.
|
||||
- `Spirit-Orchestrator`: єдина точка входу та диспетчер.
|
||||
- Worker-сабагенти: Process, Privacy-Sentinel, Gate-Policy, Identity, Core-Guardian, Bridge, Gifts, Sync, Audit-Log, Infra-Health, Research-Scout, Ritual-Field, Memory.
|
||||
|
||||
## Prompt Assembly
|
||||
- Єдиний дозволений спосіб: `final_system_prompt = JOS_BASE + "\n\n---\n\n" + SUBAGENT_PROMPT`.
|
||||
- Ручна збірка промптів у рантаймі заборонена.
|
||||
- У runtime-записі мають бути: `CONSTITUTION_VERSION` і `SUBAGENT_PROMPT_VERSION`.
|
||||
|
||||
## Єдина точка входу
|
||||
1. Будь-який запит входить через `Spirit-Orchestrator`.
|
||||
2. Оркестратор робить triage: intent, sensitivity, visibility, consent.
|
||||
3. Оркестратор запускає мінімально потрібний набір воркерів.
|
||||
4. Оркестратор повертає зведений пакет тільки у статусах draft/needs_confirmation/waiting_for_consent.
|
||||
|
||||
## Source of Truth
|
||||
- Реєстр агентів: `config/roles/clan/zhos/agents_registry.yaml`
|
||||
- Конституція: `config/roles/clan/zhos/JOS_BASE.md`
|
||||
- Manager: `config/roles/clan/zhos/orchestrator.md`
|
||||
|
||||
## Envelope-контракт
|
||||
- JSON Schema: `docs/contracts/clan-envelope.schema.json`
|
||||
- Поля: `request_id`, `circle_context`, `visibility_level_target`, `sensitivity_flags`, `consent_status`, `allowed_actions`, `expected_output`, `input_text`.
|
||||
|
||||
## Artifact-контракт
|
||||
- JSON Schema: `docs/contracts/clan-artifact.schema.json`
|
||||
- Обовʼязково: `type`, `visibility_level`, `status`, `content`, `provenance`.
|
||||
|
||||
## Правило економії
|
||||
- Не запускати всіх агентів одночасно.
|
||||
- Ліміт за замовчуванням: 1 heavy worker + 1 supporting worker.
|
||||
|
||||
Heavy: Core, Gate, Identity, Bridge, Gifts, Sync, Process.
|
||||
Supporting: Privacy, Audit, Research, Ritual-Field, Memory, Infra-Health.
|
||||
|
||||
## Пріоритет тригерів
|
||||
1. Privacy-Sentinel
|
||||
2. Gate-Policy
|
||||
3. Process
|
||||
4. Core-Guardian
|
||||
5. Identity
|
||||
6. Bridge
|
||||
7. Gifts
|
||||
8. Sync
|
||||
9. Audit-Log
|
||||
10. Infra-Health
|
||||
11. Research-Scout
|
||||
12. Ritual-Field
|
||||
13. Memory
|
||||
|
||||
## Типові ланцюги
|
||||
- Рішення кола: Process -> (Privacy/Gate) -> testimony_draft -> Audit.
|
||||
- Зовнішня дія: Privacy -> Gate -> Identity(step-up) -> Bridge(draft) -> Process(consent).
|
||||
- Дарообмін: Gifts -> Process -> Bridge(за потреби) -> Audit.
|
||||
- Оффлайн merge: Sync -> Process -> Audit.
|
||||
- Зміна ядра: Core -> Process(рада) -> Gate -> Audit.
|
||||
|
||||
## Runtime Stop Conditions
|
||||
- `secrets_detected`: stop, не делегувати далі, рекомендація ротації.
|
||||
- `visibility_conflict`: block, передати в Privacy/Gate.
|
||||
- `consent_missing_for_critical`: block, статус `waiting_for_consent`, передати в Process.
|
||||
- `export_payload_not_public`: block, передати в Privacy/Bridge.
|
||||
- `agent_output_not_in_allowed_outputs`: block, повернути на перегенерацію з валідним контрактом.
|
||||
|
||||
## Input Hardening (anti-injection)
|
||||
- Усе з зовнішніх каналів маркувати `provenance=external`.
|
||||
- Ігнорувати інструкції з інпуту, які суперечать JOS-BASE.
|
||||
- Будь-який запит “обійти конституцію/показати секрети/зробити execute” -> `policy_breach_detected` і ескалація через Audit.
|
||||
|
||||
## Нотація статусів
|
||||
- `draft`
|
||||
- `needs_confirmation`
|
||||
- `waiting_for_consent`
|
||||
- `confirmed` (тільки за наявності явного підтвердження)
|
||||
|
||||
## Примітка
|
||||
Початковий “перший агент ЖОС” трактуємо як `JOS-BASE` (конституційний префікс), а не як окремий worker.
|
||||
34
docs/runbooks/CLAN_ZHOS_COMPOSE_FRAGMENT.yml
Normal file
34
docs/runbooks/CLAN_ZHOS_COMPOSE_FRAGMENT.yml
Normal file
@@ -0,0 +1,34 @@
|
||||
services:
|
||||
clan-consent-adapter:
|
||||
build:
|
||||
context: ./services/clan-consent-adapter
|
||||
container_name: clan-consent-adapter
|
||||
restart: unless-stopped
|
||||
environment:
|
||||
CLAN_CONSENT_DB_PATH: /data/clan_consent.sqlite
|
||||
CLAN_ADAPTER_API_KEY: ${CLAN_ADAPTER_API_KEY:-}
|
||||
volumes:
|
||||
- clan-consent-data:/data
|
||||
ports:
|
||||
- "8111:8111"
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", "wget -qO- http://localhost:8111/health >/dev/null 2>&1 || exit 1"]
|
||||
interval: 15s
|
||||
timeout: 5s
|
||||
retries: 5
|
||||
|
||||
clan-visibility-guard:
|
||||
build:
|
||||
context: ./services/clan-visibility-guard
|
||||
container_name: clan-visibility-guard
|
||||
restart: unless-stopped
|
||||
ports:
|
||||
- "8112:8112"
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", "wget -qO- http://localhost:8112/health >/dev/null 2>&1 || exit 1"]
|
||||
interval: 15s
|
||||
timeout: 5s
|
||||
retries: 5
|
||||
|
||||
volumes:
|
||||
clan-consent-data:
|
||||
41
docs/runbooks/CLAN_ZHOS_STACK.md
Normal file
41
docs/runbooks/CLAN_ZHOS_STACK.md
Normal file
@@ -0,0 +1,41 @@
|
||||
# CLAN ZHOS Stack (Existing Architecture + Add-ons)
|
||||
|
||||
## Що вже інтегровано
|
||||
|
||||
1. Оновлено системний промт CLAN:
|
||||
- `gateway-bot/clan_prompt.txt`
|
||||
2. Додано ZHOS-профіль сабагентів у CrewAI:
|
||||
- `config/crewai_teams.yml` -> `clan.profiles.zhos_mvp`
|
||||
3. Додано промти ролей:
|
||||
- `config/roles/clan/zhos/*.md`
|
||||
|
||||
## Мінімальні додаткові сервіси (без ламання ядра)
|
||||
|
||||
1. `clan-consent-adapter` (порт `8111`)
|
||||
- Consent Event, Testimony Draft
|
||||
- Локальна SQLite, опційний Bearer API key
|
||||
2. `clan-visibility-guard` (порт `8112`)
|
||||
- перевірка downgrade видимості
|
||||
- класифікація чутливості
|
||||
- безпечна редукція тексту для нижчих рівнів
|
||||
|
||||
## OpenAPI контракти
|
||||
|
||||
1. `docs/contracts/clan-consent-adapter.openapi.yaml`
|
||||
2. `docs/contracts/clan-visibility-guard.openapi.yaml`
|
||||
|
||||
## Compose-фрагмент
|
||||
|
||||
1. `docs/runbooks/CLAN_ZHOS_COMPOSE_FRAGMENT.yml`
|
||||
|
||||
Локальний запуск:
|
||||
|
||||
```bash
|
||||
docker compose -f docker-compose.node1.yml -f docs/runbooks/CLAN_ZHOS_COMPOSE_FRAGMENT.yml up -d --build clan-consent-adapter clan-visibility-guard
|
||||
```
|
||||
|
||||
## Рекомендація активації
|
||||
|
||||
1. Залишити `clan.default_profile=default` для безпечного rollout.
|
||||
2. Увімкнути `zhos_mvp` тільки для CLAN-трафіку, який пройшов smoke tests.
|
||||
3. Після перевірки можна переключити default_profile на `zhos_mvp`.
|
||||
@@ -60,6 +60,34 @@ cd /Users/apple/github-projects/microdao-daarion
|
||||
bash scripts/docs/docs_sync.sh --apply --targets github,gitea
|
||||
```
|
||||
|
||||
## 5) Local scheduled run (no auto-push)
|
||||
|
||||
Install/update daily cron job (local timezone):
|
||||
|
||||
```bash
|
||||
cd /Users/apple/github-projects/microdao-daarion
|
||||
bash scripts/docs/install_local_cron.sh --schedule "17 9 * * *"
|
||||
```
|
||||
|
||||
Dry-run preview:
|
||||
|
||||
```bash
|
||||
cd /Users/apple/github-projects/microdao-daarion
|
||||
bash scripts/docs/install_local_cron.sh --dry-run
|
||||
```
|
||||
|
||||
Uninstall:
|
||||
|
||||
```bash
|
||||
cd /Users/apple/github-projects/microdao-daarion
|
||||
bash scripts/docs/install_local_cron.sh --uninstall
|
||||
```
|
||||
|
||||
Runner command:
|
||||
- `scripts/docs/run_docs_maintenance.sh`
|
||||
- executes `services_sync --apply` and `docs_lint`
|
||||
- does not perform any `git push`
|
||||
|
||||
## Safety gates
|
||||
|
||||
- `docs_sync.sh` is dry-run by default.
|
||||
|
||||
83
docs/runbooks/NODE_ARCH_RECONCILIATION_PLAN_2026-02-16.md
Normal file
83
docs/runbooks/NODE_ARCH_RECONCILIATION_PLAN_2026-02-16.md
Normal file
@@ -0,0 +1,83 @@
|
||||
# NODE Architecture Reconciliation Plan (NODE1 + NODE3 + NODE4)
|
||||
|
||||
Date: 2026-02-16
|
||||
Policy: Runtime-first for current state, roadmap-preserving for NODE3/NODE4.
|
||||
|
||||
## 1) Documents Confirmed (Legacy/Planning Set)
|
||||
|
||||
Found in worktrees (not in current main tree root):
|
||||
- `.worktrees/origin-main/IMPLEMENTATION-STATUS.md`
|
||||
- `.worktrees/origin-main/ARCHITECTURE-150-NODES.md`
|
||||
- `.worktrees/origin-main/infrastructure/auth/AUTH-IMPLEMENTATION-PLAN.md`
|
||||
- `.worktrees/origin-main/infrastructure/matrix-gateway/README.md`
|
||||
|
||||
Same copies found in:
|
||||
- `.worktrees/docs-node1-sync/...`
|
||||
|
||||
These files are valid architecture/program documents (dated 2026-01-10), but they are not an exact reflection of current NODE1 runtime code state on 2026-02-16.
|
||||
|
||||
## 2) Current Runtime Truth (NODE1)
|
||||
|
||||
- Runtime root: `/opt/microdao-daarion`
|
||||
- Router/Gateway/Swapper healthy.
|
||||
- Canary suite passing:
|
||||
- `ops/canary_all.sh`
|
||||
- `ops/canary_senpai_osr_guard.sh`
|
||||
- Router endpoint contract in runtime:
|
||||
- active: `POST /v1/agents/{agent_id}/infer`
|
||||
- not active: `POST /route`
|
||||
|
||||
## 3) NODE3/NODE4 Policy (Do NOT remove from architecture)
|
||||
|
||||
NODE3/NODE4 remain part of target architecture and deployment plan.
|
||||
|
||||
Current status (observed now):
|
||||
- From laptop: `212.8.58.133:33147` and `:33148` unreachable.
|
||||
- From NODE1: `212.8.58.133:8880` timeout, `:33147/:33148` no route.
|
||||
|
||||
Interpretation:
|
||||
- This is a connectivity/runtime availability issue, not an architecture removal decision.
|
||||
- Keep NODE3/NODE4 in docs and topology as `planned/temporarily_unreachable`.
|
||||
|
||||
## 4) Operating Model Until Connectivity Restored
|
||||
|
||||
Use explicit mode labeling:
|
||||
- `ACTIVE`: reachable and health-checked.
|
||||
- `DEGRADED`: included in architecture but currently unreachable.
|
||||
- `DISABLED`: intentionally turned off (not the case for NODE3/NODE4 now).
|
||||
|
||||
Current recommendation:
|
||||
- NODE1: `ACTIVE`
|
||||
- NODE3: `DEGRADED`
|
||||
- NODE4: `DEGRADED`
|
||||
|
||||
## 5) Reconciliation Rules
|
||||
|
||||
1. Do not delete NODE3/NODE4 docs, routes, or architecture references.
|
||||
2. Mark external generation dependencies as conditional by reachability checks.
|
||||
3. Runtime registries/config must not advertise unavailable external agents as locally active.
|
||||
4. Keep roadmap docs (150 nodes, auth, matrix gateway) as strategic references; do not treat them as runtime contract files.
|
||||
|
||||
## 6) Action Plan (No Risk to Production)
|
||||
|
||||
1. Create a single "Architecture Status Board" document that maps:
|
||||
- planned topology (NODE1/2/3/4...)
|
||||
- current health/reachability per node
|
||||
- last verified timestamp.
|
||||
2. Add preflight checks for external node dependencies in deployment scripts:
|
||||
- TCP check
|
||||
- service health check
|
||||
- fallback behavior logging.
|
||||
3. Resolve registry drift:
|
||||
- align `config/agent_registry.yml` and generated registry artifacts on NODE1 runtime.
|
||||
4. After NODE3/NODE4 connectivity returns:
|
||||
- run connectivity proof
|
||||
- run media generation smoke
|
||||
- switch node status from `DEGRADED` to `ACTIVE`.
|
||||
|
||||
## 7) Decision Summary
|
||||
|
||||
- Keep NODE3/NODE4 in architecture and planning.
|
||||
- Use runtime-first truth for what is currently active.
|
||||
- Maintain explicit degraded-mode status instead of silent exclusion.
|
||||
|
||||
63
docs/runbooks/ONEOK_OPENSOURCE_STACK.md
Normal file
63
docs/runbooks/ONEOK_OPENSOURCE_STACK.md
Normal file
@@ -0,0 +1,63 @@
|
||||
# ONEOK Open-Source Stack (Clean Variant)
|
||||
|
||||
## Фактичний clean-стек (впроваджено)
|
||||
|
||||
Поточна платформа працює як `Python + Docker + Router ToolManager`.
|
||||
Для `1OK` у clean-варіанті зафіксовано:
|
||||
|
||||
1. `EspoCRM + MariaDB` як окремий CRM-контур агента.
|
||||
2. `oneok-crm-adapter` як інструментальний API-шар (`crm_*`), ізоляція даних від інших агентів.
|
||||
3. `oneok-calc-adapter` як доменний калькулятор (`calc_window_quote`).
|
||||
4. `Gotenberg + oneok-docs-adapter` для PDF (`docs_render_quote_pdf`, `docs_render_invoice_pdf`).
|
||||
5. `oneok-schedule-adapter` для слотів заміру/монтажу (`Europe/Kyiv`).
|
||||
6. `Qdrant` використовується через наявний router-memory контур.
|
||||
|
||||
## Що впроваджено в router
|
||||
|
||||
1. Додано спеціалізовані 1OK tools у `router`:
|
||||
- `crm_search_client`, `crm_upsert_client`, `crm_upsert_site`, `crm_upsert_window_unit`
|
||||
- `crm_create_quote`, `crm_update_quote`, `crm_create_job`
|
||||
- `calc_window_quote`
|
||||
- `docs_render_quote_pdf`, `docs_render_invoice_pdf`
|
||||
- `schedule_propose_slots`, `schedule_confirm_slot`
|
||||
2. Додано HTTP adapter layer у `ToolManager`.
|
||||
3. Додано env endpoint-и:
|
||||
- `ONEOK_CRM_BASE_URL`
|
||||
- `ONEOK_CALC_BASE_URL`
|
||||
- `ONEOK_DOCS_BASE_URL`
|
||||
- `ONEOK_SCHEDULE_BASE_URL`
|
||||
|
||||
4. Додано ітеративний multi-step tool-calling у `router` (до `ROUTER_TOOL_MAX_ROUNDS`, дефолт `10`).
|
||||
5. Додано auto-repair аргументів для `oneok` (наприклад, `windows -> window_units`, auto-fill `quote_payload`).
|
||||
|
||||
## Download links (reference)
|
||||
|
||||
### CRM
|
||||
- https://github.com/espocrm/espocrm
|
||||
- https://github.com/espocrm/espocrm-installer
|
||||
- https://github.com/SuiteCRM/SuiteCRM
|
||||
|
||||
### Калькулятори/оцінка
|
||||
- https://github.com/ath31st/window_calculator
|
||||
- https://github.com/ErdincAltuntas/window-estimator
|
||||
|
||||
### PDF/документи
|
||||
- https://github.com/gotenberg/gotenberg
|
||||
- https://github.com/InvoicePlane/InvoicePlane
|
||||
|
||||
### Пам'ять/RAG
|
||||
- https://github.com/qdrant/qdrant
|
||||
|
||||
## Рекомендований rollout (безпечний)
|
||||
|
||||
1. `Phase 1`: підняти `EspoCRM + Gotenberg + oneok-* adapters`.
|
||||
2. `Phase 2`: увімкнути `router` tools для `oneok` і перевірити E2E (`crm -> calc -> quote -> pdf -> slots`).
|
||||
3. `Phase 3`: стабілізувати політики Telegram/webhook та доступи whitelist.
|
||||
4. `Phase 4`: за потреби замінити `oneok-schedule-adapter` на `Cal.com` без зміни tool-контрактів.
|
||||
|
||||
## Security baseline
|
||||
|
||||
1. Мінімізація PII: не зберігати зайві поля.
|
||||
2. Маскування секретів у логах.
|
||||
3. Service-to-service auth для adapter endpoint-ів.
|
||||
4. Явна позначка `ОЦІНКА` для непідтверджених замірів.
|
||||
Reference in New Issue
Block a user