snapshot: NODE1 production state 2026-02-09

Complete snapshot of /opt/microdao-daarion/ from NODE1 (144.76.224.179).
This represents the actual running production code that has diverged
significantly from the previous main branch.

Key changes from old main:
- Gateway (http_api.py): expanded from ~40KB to 164KB with full agent support
- Router: new /v1/agents/{id}/infer endpoint with vision + DeepSeek routing
- Behavior Policy: SOWA v2.2 (3-level: FULL/ACK/SILENT)
- Agent Registry: config/agent_registry.yml as single source of truth
- 13 agents configured (was 3)
- Memory service integration
- CrewAI teams and roles
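
A minimal sketch of how a caller might act on the three SOWA levels. It assumes FULL means a complete reply, ACK means a short acknowledgement, and SILENT means no reply at all. Only the level names come from this commit. `ResponseLevel`, `render_reply` and the default `"OK"` acknowledgement are hypothetical, and the policy logic that actually picks a level is not shown.

```python
from enum import Enum
from typing import Optional


class ResponseLevel(Enum):
    """The three SOWA v2.2 levels; the meaning of each is assumed."""
    FULL = "full"      # assumed: send a complete answer
    ACK = "ack"        # assumed: send only a short acknowledgement
    SILENT = "silent"  # assumed: send nothing


def render_reply(level: ResponseLevel, full_text: str,
                 ack_text: str = "OK") -> Optional[str]:
    """Hypothetical helper: turn a policy decision into the outgoing text.

    Returns None when the agent should stay silent.
    """
    if level is ResponseLevel.FULL:
        return full_text
    if level is ResponseLevel.ACK:
        return ack_text
    return None
```

Returning `None` for SILENT keeps the "send nothing" case explicit, so the caller cannot mistake it for an empty reply.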

Excluded from snapshot: venv/, .env, data/, backups, .tgz archives
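
The exclusion list can also be written as a path filter. This is a hypothetical sketch, not the script that produced this snapshot. `is_excluded` and the constant names are invented. Directory rules match at any depth, the way `data/` would in a `.gitignore`. Reading "backups" as a directory named `backups` is an assumption.

```python
from pathlib import PurePosixPath

# Exclusions from the commit message. Treating "backups" as a directory
# name is an assumption; the message does not give an exact pattern.
EXCLUDED_DIRS = {"venv", "data", "backups"}
EXCLUDED_NAMES = {".env"}
EXCLUDED_SUFFIXES = (".tgz",)


def is_excluded(rel_path: str) -> bool:
    """Return True if a path relative to the snapshot root should be skipped."""
    path = PurePosixPath(rel_path)
    # Skip anything inside a directory named venv/, data/ or backups/ at any depth.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return True
    # .env is matched by exact file name, so .env.example is kept.
    if path.name in EXCLUDED_NAMES:
        return True
    return path.name.endswith(EXCLUDED_SUFFIXES)
```

With these rules, `is_excluded("config/agent_registry.yml")` is False, while `is_excluded("venv/bin/python")` and `is_excluded(".env")` are True.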

Co-authored-by: Cursor <cursoragent@cursor.com>
Apple
2026-02-09 08:46:46 -08:00
parent 134c044c21
commit ef3473db21
9473 changed files with 408933 additions and 2769877 deletions

docs/CHAOS_TEST_REPORT.md Normal file

@@ -0,0 +1,29 @@
# Chaos Test Report

## Baseline (staging)
- Time (UTC): 2026-01-19 15:49:15
- Streams: reported lag = 0 during tests (`jsz` OK)

| Test | Start → End (UTC) | Max Lag | DLQ Peak | p95 Latency (ms) | Unique Success | Notes |
|---|---|---|---|---|---|---|
| A Kill Worker | 2026-01-19 15:49:12 → 15:51:20 | 0 | 0 | 39374.01 | 100% (50/60 unique) | restart crewai-worker |
| B Kill Router | 2026-01-19 15:51:21 → 15:53:29 | 0 | 0 | 6174.56 | 100% (50/60 unique) | restart router |
| C Block Postgres | 2026-01-19 15:53:30 → 15:55:38 | 0 | 0 | 6207.92 | 100% (50/60 unique) | stop/start postgres 60s |
| D DLQ Replay | 2026-01-19 16:08:30 → 16:09:10 | n/a | 1→0 | n/a | completed | forced fail + replay, job_id=dlq-test-1768839356 |
| A Kill Worker | 2026-01-19 16:56:53 → 16:59:01 | 0 | 0 | 57974.01 | 100.00% | restart crewai-worker |
| B Kill Router | 2026-01-19 16:59:01 → 17:01:10 | 0 | 0 | 6183.63 | 100.00% | restart router |
| C Block Postgres | 2026-01-19 17:01:10 → 17:03:19 | 0 | 0 | 6206.32 | 100.00% | stop/start postgres 60s |
| D DLQ Replay | 2026-01-19 17:03:19 → 17:03:32 | n/a | see log | n/a | n/a | dlq_replay.py |
| A Kill Worker | 2026-01-19 17:04:15 → 17:06:24 | 0 | 0 | 76807.84 | 100.00% | restart crewai-worker |
| B Kill Router | 2026-01-19 17:06:24 → 17:08:33 | 0 | 0 | 6171.86 | 100.00% | restart router |
| C Block Postgres | 2026-01-19 17:08:33 → 17:10:41 | 0 | 0 | 6210.77 | 100.00% | stop/start postgres 60s |
| D DLQ Replay | 2026-01-19 17:10:41 → 17:10:54 | n/a | see log | n/a | n/a | dlq_replay.py |
| A Kill Worker | 2026-01-19 17:13:25 → 17:15:34 | 0 | 0 | 96020.54 | 100.00% | restart crewai-worker |
| B Kill Router | 2026-01-19 17:15:34 → 17:17:43 | 0 | 0 | 6169.57 | 100.00% | restart router |
| C Block Postgres | 2026-01-19 17:17:43 → 17:19:51 | 0 | 0 | 6212.49 | 100.00% | stop/start postgres 60s |
| D DLQ Replay | 2026-01-19 17:19:51 → 17:20:08 | n/a | see log | n/a | completed | forced fail + replay, job_id=dlq-test-1768838617, subject=agent.run.completed.helion, replay_count=n/a |
| A Kill Worker | 2026-01-19 17:20:51 → 17:23:00 | 0 | 0 | 115620.04 | 100.00% | restart crewai-worker |
| B Kill Router | 2026-01-19 17:23:00 → 17:25:08 | 0 | 0 | 6175.69 | 100.00% | restart router |
| C Block Postgres | 2026-01-19 17:25:08 → 17:27:17 | 0 | 0 | 5950.39 | 100.00% | stop/start postgres 60s |
| D DLQ Replay | 2026-01-19 17:27:17 → 17:27:34 | n/a | see log | n/a | completed | forced fail + replay, job_id=dlq-test-1768838617, subject=agent.run.completed.helion, replay_count=n/a |
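
The report does not say how its p95 and Unique Success columns are computed. The following sketch shows one plausible approach. It assumes a nearest-rank percentile over per-request latencies and counts success once per job ID. `RequestResult`, `p95` and `unique_success` are hypothetical names, not part of the test harness.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class RequestResult:
    job_id: str        # hypothetical: ID of the job this request submitted
    latency_ms: float  # end-to-end latency seen by the load generator
    ok: bool           # True if this attempt completed successfully


def p95(latencies_ms: List[float]) -> float:
    """Nearest-rank 95th percentile (one of several common definitions)."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def unique_success(results: Iterable[RequestResult]) -> Tuple[int, int]:
    """Return (succeeded, total), counting each job_id once.

    A job counts as succeeded if any attempt for it succeeded, so retries
    and duplicate submissions are not double-counted.
    """
    status: Dict[str, bool] = {}
    for r in results:
        status[r.job_id] = status.get(r.job_id, False) or r.ok
    return sum(status.values()), len(status)
```

Under this definition, an entry like `100% (50/60 unique)` could mean 60 submissions covering 50 unique jobs, all of which succeeded. This is one possible reading; the report does not state it.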