Complete snapshot of /opt/microdao-daarion/ from NODE1 (144.76.224.179).
This represents the actual running production code that has diverged
significantly from the previous main branch.
Key changes from old main:
- Gateway (http_api.py): expanded from ~40KB to 164KB with full agent support
- Router: new /v1/agents/{id}/infer endpoint with vision + DeepSeek routing
- Behavior Policy: SOWA v2.2 (3-level: FULL/ACK/SILENT)
- Agent Registry: config/agent_registry.yml as single source of truth
- 13 agents configured (was 3)
- Memory service integration
- CrewAI teams and roles
Excluded from snapshot: venv/, .env, data/, backups, .tgz archives
Co-authored-by: Cursor <cursoragent@cursor.com>
NATS Subject Map — Event Bus Architecture
Cross-Cutting Bus Design
┌────────────────────────────────────────────────────────────────────────┐
│                          NATS JetStream :4222                          │
│                                                                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐│
│  │   MESSAGES   │  │ ATTACHMENTS  │  │  AGENT_RUNS  │  │    AUDIT     ││
│  │    Stream    │  │    Stream    │  │    Stream    │  │    Stream    ││
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘│
│         │                 │                 │                 │        │
└─────────┼─────────────────┼─────────────────┼─────────────────┼────────┘
          │                 │                 │                 │
     ┌────┴────┐       ┌────┴────┐       ┌────┴────┐       ┌────┴────┐
     │ Gateway │       │ Ingest  │       │ Router  │       │   All   │
     │ Parser  │       │ Parser  │       │ CrewAI  │       │Services │
     └─────────┘       └─────────┘       └─────────┘       └─────────┘
Subject Hierarchy
1. Messages (chat/conversation)
message.received.{agent_id} # Gateway publishes
message.processed.{agent_id} # Router publishes after LLM
message.sent.{agent_id} # Gateway confirms delivery
2. Attachments (files/media)
attachment.created.{type} # Ingest publishes (image/audio/document)
attachment.parsed.{type} # Parser publishes after extraction
attachment.indexed.{agent_id} # Memory Service confirms RAG indexing
attachment.failed.{type} # DLQ for failed processing
3. Agent Runs (workflows/tasks)
agent.run.requested # Router/Gateway requests task
agent.run.started.{agent_id} # Worker acknowledges
agent.run.progress.{task_id} # Worker reports progress
agent.run.completed.{agent_id} # Worker finished successfully
agent.run.failed.{agent_id} # Worker failed (→ DLQ)
4. Memory Operations
memory.store.{agent_id} # Store new memory item
memory.retrieve.{agent_id} # Retrieve request
memory.indexed.{agent_id} # Confirmed in vector DB
memory.graph.updated.{agent_id} # Neo4j graph updated
5. Audit & Ops
audit.action.{service} # All services log actions
audit.error.{service} # Error events
ops.health.{service} # Health heartbeats
ops.alert.{severity} # critical/warning/info
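The subject patterns above can be assembled with a small helper so that a stray dot or wildcard in an identifier never changes the shape of a subject. This is a minimal sketch, not code from the repository: `build_subject`, `message_received`, and `attachment_created` are hypothetical names, and the token check reflects the general NATS rule that tokens contain no whitespace, `.`, `*`, or `>`.

```python
import re

# A token must be non-empty and free of whitespace, '.', '*' and '>'.
_TOKEN_RE = re.compile(r"[^\s.*>]+")


def build_subject(*tokens: str) -> str:
    """Join tokens into a dot-separated subject, rejecting unsafe tokens."""
    for token in tokens:
        if not _TOKEN_RE.fullmatch(token):
            raise ValueError(f"invalid subject token: {token!r}")
    return ".".join(tokens)


def message_received(agent_id: str) -> str:
    """Subject the Gateway publishes to: message.received.{agent_id}."""
    return build_subject("message", "received", agent_id)


def attachment_created(kind: str) -> str:
    """Subject for new attachments: attachment.created.{type}."""
    return build_subject("attachment", "created", kind)
```

For example, `message_received("helion")` yields `message.received.helion`, while an identifier containing a dot raises `ValueError` instead of silently adding a subject level.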
Stream Configuration
| Stream | Subjects | Retention | MaxAge | Replicas |
|---|---|---|---|---|
| MESSAGES | message.> | limits | 7d | 1 |
| ATTACHMENTS | attachment.> | limits | 30d | 1 |
| AGENT_RUNS | agent.run.> | limits | 7d | 1 |
| MEMORY | memory.> | limits | 30d | 1 |
| AUDIT | audit.>, ops.> | limits | 90d | 1 |
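To check which stream would capture a given subject, the table can be mirrored in a lookup that applies NATS wildcard rules (`*` matches exactly one token, `>` matches one or more trailing tokens). The sketch below is illustrative only: `STREAM_SUBJECTS`, `subject_matches`, and `stream_for` are hypothetical names, and the matching is done locally rather than by asking the server.

```python
from typing import Optional

# Stream name -> subject filters, copied from the Stream Configuration table.
STREAM_SUBJECTS = {
    "MESSAGES": ["message.>"],
    "ATTACHMENTS": ["attachment.>"],
    "AGENT_RUNS": ["agent.run.>"],
    "MEMORY": ["memory.>"],
    "AUDIT": ["audit.>", "ops.>"],
}


def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if subject matches pattern under NATS wildcard rules."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def stream_for(subject: str) -> Optional[str]:
    """Return the first stream whose filters match subject, or None."""
    for stream, patterns in STREAM_SUBJECTS.items():
        if any(subject_matches(p, subject) for p in patterns):
            return stream
    return None
```

With the table as written, `stream_for("agent.run.failed.dlq")` resolves to `AGENT_RUNS`, and a subject such as `config.updated.some_key` resolves to `None` because no listed stream binds `config.>`.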
Dead Letter Queue (DLQ)
Failed messages go to {subject}.dlq:
attachment.failed.dlq # Failed file processing
agent.run.failed.dlq # Failed workflow tasks
The DLQ consumer should:
- Log error details
- Alert if count > threshold
- Retry with backoff or discard
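The three responsibilities listed above could be combined in a handler along these lines. It is a sketch under stated assumptions: `DlqConsumer`, the `dlq_attempts` bookkeeping field, the exponential backoff formula, and the threshold values are all hypothetical, and the handler only decides what to do next, leaving the actual republish to the caller.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Tuple

log = logging.getLogger("dlq")


@dataclass
class DlqConsumer:
    alert_threshold: int            # assumed: alert once more than this many messages were seen
    max_retries: int                # assumed: retries allowed before discarding
    alert: Callable[[str], None]    # callback that raises an ops alert
    base_delay: float = 1.0         # assumed: first backoff delay in seconds
    seen: int = 0

    def handle(self, subject: str, event: dict, error: str) -> Tuple[str, float]:
        """Log the failure, alert if needed, and return (action, delay_seconds)."""
        # 1. Log error details.
        log.error("DLQ %s event_id=%s error=%s", subject, event.get("event_id"), error)

        # 2. Alert if count > threshold.
        self.seen += 1
        if self.seen > self.alert_threshold:
            self.alert(f"{self.seen} messages in DLQ (last subject: {subject})")

        # 3. Retry with exponential backoff, or discard once retries are used up.
        attempts = event.get("dlq_attempts", 0)
        if attempts < self.max_retries:
            event["dlq_attempts"] = attempts + 1
            return ("retry", self.base_delay * (2 ** attempts))
        return ("discard", 0.0)
```

Returning the decision instead of sleeping inside the handler keeps the consumer loop free to acknowledge the message and schedule the retry however the deployment prefers.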
Consumer Groups
| Consumer | Stream | Filter | Purpose |
|---|---|---|---|
| parser-pipeline | ATTACHMENTS | attachment.created.> | Async parsing |
| crewai-worker | AGENT_RUNS | agent.run.requested | Workflow execution |
| memory-indexer | MEMORY | memory.store.> | RAG indexing |
| audit-logger | AUDIT | audit.> | Persistent logging |
Event Payload Schema
{
"event_id": "uuid",
"event_type": "attachment.created.image",
"timestamp": "2026-01-19T12:00:00Z",
"trace_id": "correlation-id",
"source": "gateway",
"agent_id": "helion",
"user_id": "tg:123456",
"payload": { ... }
}
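A typed envelope makes it harder to publish an event with a missing field. The following is a minimal sketch of the schema above as a dataclass; the class name `Event`, the `to_bytes` method, and the choice to default `trace_id` to a fresh UUID are assumptions, not part of the documented design.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


def _utc_now() -> str:
    """Current time as an ISO 8601 UTC string, e.g. 2026-01-19T12:00:00Z."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass
class Event:
    event_type: str                 # e.g. "attachment.created.image"
    source: str                     # publishing service, e.g. "gateway"
    agent_id: str
    user_id: str
    payload: Dict[str, Any]
    # Assumed defaults: callers would normally pass an existing trace_id.
    trace_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=_utc_now)

    def to_bytes(self) -> bytes:
        """Serialize the envelope to UTF-8 JSON, ready to publish."""
        return json.dumps(asdict(self)).encode("utf-8")
```

A publisher would build `Event("attachment.created.image", "gateway", "helion", "tg:123456", {...})` and pass `to_bytes()` as the message body.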
Publishers & Subscribers Matrix
| Service | Publishes | Subscribes |
|---|---|---|
| Gateway | message.received, attachment.created | message.sent |
| Router | message.processed, agent.run.requested | message.received |
| Ingest | attachment.created | - |
| Parser | attachment.parsed, attachment.indexed | attachment.created |
| CrewAI Worker | agent.run.completed, agent.run.failed | agent.run.requested |
| Memory Service | memory.indexed | memory.store |
| All Services | audit.action, ops.health | - |
Idempotency & Replay
Idempotency Key Required
Every NATS message MUST include an idempotency_key field:
{
"event_id": "uuid",
"idempotency_key": "{source}:{entity_id}:{action}:{timestamp_hour}",
...
}
Example: gateway:msg-123:received:2026011912
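The key format can be generated and checked as sketched below. The function name `idempotency_key` and the in-memory `Deduplicator` are hypothetical; a real consumer would more likely keep seen keys in a shared store with an expiry. The hour bucket is computed in UTC, which is an assumption consistent with the `Z` timestamps in the payload schema.

```python
from datetime import datetime, timezone


def idempotency_key(source: str, entity_id: str, action: str, when: datetime) -> str:
    """Build {source}:{entity_id}:{action}:{timestamp_hour} with a UTC hour bucket.

    `when` should be timezone-aware; naive datetimes are treated as local time.
    """
    hour = when.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"{source}:{entity_id}:{action}:{hour}"


class Deduplicator:
    """In-memory record of keys that were already processed (illustrative only)."""

    def __init__(self) -> None:
        self._seen = set()

    def first_time(self, key: str) -> bool:
        """Return True the first time a key is seen, False on every repeat."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Because the last component is an hour bucket, two deliveries of the same entity and action within one UTC hour share a key, while a redelivery after the hour rolls over produces a new key and would not be deduplicated by this check alone.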
DLQ Replay Policy
| Step | Delay | Action |
|---|---|---|
| Retry 1 | 1s | Auto |
| Retry 2 | 5s | Auto |
| Retry 3 | 30s | Auto |
| DLQ | - | Alert + Manual review |
| DLQ + 24h | - | Auto-retry once |
| DLQ + 7d | - | Archive to cold storage |
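The replay table can be encoded as two small decision functions. This is a sketch of one possible reading of the policy: the names `next_retry_delay` and `dlq_action` are hypothetical, and it assumes "DLQ + 24h" and "DLQ + 7d" are measured from the moment the message entered the DLQ.

```python
from datetime import timedelta
from typing import Optional

# Delays before automatic retries 1, 2 and 3, from the table.
RETRY_DELAYS = [timedelta(seconds=1), timedelta(seconds=5), timedelta(seconds=30)]
AUTO_RETRY_AFTER = timedelta(hours=24)
ARCHIVE_AFTER = timedelta(days=7)


def next_retry_delay(retries_done: int) -> Optional[timedelta]:
    """Delay before the next automatic retry, or None when the message goes to the DLQ."""
    if 0 <= retries_done < len(RETRY_DELAYS):
        return RETRY_DELAYS[retries_done]
    return None


def dlq_action(age_in_dlq: timedelta, auto_retried: bool) -> str:
    """Decide what to do with a message that is already in the DLQ."""
    if age_in_dlq >= ARCHIVE_AFTER:
        return "archive"          # DLQ + 7d: move to cold storage
    if age_in_dlq >= AUTO_RETRY_AFTER and not auto_retried:
        return "auto_retry"       # DLQ + 24h: one automatic retry
    return "manual_review"        # otherwise: alert and wait for a human
```

Tracking `auto_retried` per message is what keeps the 24-hour retry to a single attempt, as the table requires.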
Config/Policy Update Events
config.updated.{key} # Control Plane publishes
policy.updated.{agent_id} # Control Plane publishes
prompt.updated.{agent_id} # Control Plane publishes
Consumers should invalidate any cached config, policy, or prompt when they receive these events.
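One way a consumer might react to these subjects is to key its cache by the subject's first and last parts and drop the matching entry on each update event. The `ConfigCache` class and its method names below are hypothetical, and the sketch assumes the update event carries no new value, so the entry is simply evicted and reloaded on next use.

```python
from typing import Any, Dict, Optional, Tuple

UPDATE_KINDS = {"config", "policy", "prompt"}


class ConfigCache:
    """Cache keyed by (kind, key), e.g. ("prompt", "helion")."""

    def __init__(self) -> None:
        self._items: Dict[Tuple[str, str], Any] = {}

    def put(self, kind: str, key: str, value: Any) -> None:
        self._items[(kind, key)] = value

    def get(self, kind: str, key: str) -> Optional[Any]:
        return self._items.get((kind, key))

    def on_update(self, subject: str) -> bool:
        """Evict the entry named by a {kind}.updated.{key} subject.

        Returns True if the subject was a recognized update event.
        """
        parts = subject.split(".", 2)
        if len(parts) != 3 or parts[1] != "updated" or parts[0] not in UPDATE_KINDS:
            return False
        self._items.pop((parts[0], parts[2]), None)
        return True
```

A subscriber on `config.updated.>`, `policy.updated.>`, and `prompt.updated.>` would call `on_update(msg_subject)` for each message it receives.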