Complete snapshot of /opt/microdao-daarion/ from NODE1 (144.76.224.179).
This represents the actual running production code that has diverged
significantly from the previous main branch.
Key changes from old main:
- Gateway (http_api.py): expanded from ~40KB to 164KB with full agent support
- Router: new /v1/agents/{id}/infer endpoint with vision + DeepSeek routing
- Behavior Policy: SOWA v2.2 (3-level: FULL/ACK/SILENT)
- Agent Registry: config/agent_registry.yml as single source of truth
- 13 agents configured (was 3)
- Memory service integration
- CrewAI teams and roles
Excluded from snapshot: venv/, .env, data/, backups, .tgz archives
Co-authored-by: Cursor <cursoragent@cursor.com>
# NATS Subject Map — Event Bus Architecture

## Cross-Cutting Bus Design

```
┌────────────────────────────────────────────────────────────────────────┐
│                          NATS JetStream :4222                          │
│                                                                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐│
│  │   MESSAGES   │  │ ATTACHMENTS  │  │  AGENT_RUNS  │  │    AUDIT     ││
│  │    Stream    │  │    Stream    │  │    Stream    │  │    Stream    ││
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘│
│         │                 │                 │                 │        │
└─────────┼─────────────────┼─────────────────┼─────────────────┼────────┘
          │                 │                 │                 │
     ┌────┴────┐       ┌────┴────┐       ┌────┴────┐       ┌────┴────┐
     │ Gateway │       │ Ingest  │       │ Router  │       │   All   │
     │ Parser  │       │ Parser  │       │ CrewAI  │       │Services │
     └─────────┘       └─────────┘       └─────────┘       └─────────┘
```

## Subject Hierarchy

### 1. Messages (chat/conversation)
```
message.received.{agent_id}     # Gateway publishes
message.processed.{agent_id}    # Router publishes after LLM
message.sent.{agent_id}         # Gateway confirms delivery
```

### 2. Attachments (files/media)
```
attachment.created.{type}       # Ingest publishes (image/audio/document)
attachment.parsed.{type}        # Parser publishes after extraction
attachment.indexed.{agent_id}   # Memory Service confirms RAG indexing
attachment.failed.{type}        # DLQ for failed processing
```

### 3. Agent Runs (workflows/tasks)
```
agent.run.requested             # Router/Gateway requests task
agent.run.started.{agent_id}    # Worker acknowledges
agent.run.progress.{task_id}    # Worker reports progress
agent.run.completed.{agent_id}  # Worker finished successfully
agent.run.failed.{agent_id}     # Worker failed (→ DLQ)
```

### 4. Memory Operations
```
memory.store.{agent_id}         # Store new memory item
memory.retrieve.{agent_id}      # Retrieve request
memory.indexed.{agent_id}       # Confirmed in vector DB
memory.graph.updated.{agent_id} # Neo4j graph updated
```

### 5. Audit & Ops
```
audit.action.{service}          # All services log actions
audit.error.{service}           # Error events
ops.health.{service}            # Health heartbeats
ops.alert.{severity}            # critical/warning/info
```

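The subject patterns above can be exercised without a running server. The following is a minimal sketch, not the project's own code: `build_subject` and `subject_matches` are hypothetical helper names, and the matcher follows the standard NATS wildcard rules (`*` matches exactly one token, `>` matches one or more trailing tokens).

```python
def build_subject(*tokens: str) -> str:
    """Join tokens with dots, rejecting empty tokens and the characters
    space, '.', '*' and '>' inside a token."""
    for token in tokens:
        if not token or any(ch in token for ch in " .*>"):
            raise ValueError(f"invalid subject token: {token!r}")
    return ".".join(tokens)


def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` matches `pattern` under NATS wildcard rules."""
    pattern_tokens = pattern.split(".")
    subject_tokens = subject.split(".")
    for index, token in enumerate(pattern_tokens):
        if token == ">":
            # `>` must be the last pattern token and needs at least one
            # subject token left to consume.
            return index == len(pattern_tokens) - 1 and len(subject_tokens) > index
        if index >= len(subject_tokens):
            return False
        if token != "*" and token != subject_tokens[index]:
            return False
    # Without a trailing `>`, both sides must have the same token count.
    return len(pattern_tokens) == len(subject_tokens)


# Example: a Gateway-style subject checked against a stream filter.
subject = build_subject("message", "received", "helion")
print(subject, subject_matches("message.>", subject))
```

A check like this is handy in unit tests to confirm that every subject a service publishes is captured by some stream filter.
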
## Stream Configuration

| Stream | Subjects | Retention | MaxAge | Replicas |
|--------|----------|-----------|--------|----------|
| MESSAGES | message.> | limits | 7d | 1 |
| ATTACHMENTS | attachment.> | limits | 30d | 1 |
| AGENT_RUNS | agent.run.> | limits | 7d | 1 |
| MEMORY | memory.> | limits | 30d | 1 |
| AUDIT | audit.>, ops.> | limits | 90d | 1 |

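
For provisioning scripts or tests, the table can be mirrored as plain data. This is a sketch under assumptions: `StreamSpec`, `STREAMS`, and `stream_by_name` are illustrative names, and the conversion of `MaxAge` to seconds is only a convenience, since the unit a given NATS client expects is not stated in this document.

```python
from dataclasses import dataclass
from typing import Tuple

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass(frozen=True)
class StreamSpec:
    """One row of the stream configuration table (hypothetical type)."""
    name: str
    subjects: Tuple[str, ...]
    retention: str
    max_age_days: int
    replicas: int

    @property
    def max_age_seconds(self) -> int:
        return self.max_age_days * SECONDS_PER_DAY


# Values copied from the table above.
STREAMS = (
    StreamSpec("MESSAGES", ("message.>",), "limits", 7, 1),
    StreamSpec("ATTACHMENTS", ("attachment.>",), "limits", 30, 1),
    StreamSpec("AGENT_RUNS", ("agent.run.>",), "limits", 7, 1),
    StreamSpec("MEMORY", ("memory.>",), "limits", 30, 1),
    StreamSpec("AUDIT", ("audit.>", "ops.>"), "limits", 90, 1),
)


def stream_by_name(name: str) -> StreamSpec:
    """Look up a stream spec by name, raising KeyError if it is unknown."""
    for spec in STREAMS:
        if spec.name == name:
            return spec
    raise KeyError(name)


print(stream_by_name("AUDIT").max_age_seconds)
```
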
## Dead Letter Queue (DLQ)

Failed messages go to `{subject}.dlq`:
```
attachment.failed.dlq           # Failed file processing
agent.run.failed.dlq            # Failed workflow tasks
```

DLQ consumer should:
1. Log error details
2. Alert if count > threshold
3. Retry with backoff or discard

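The first two consumer duties (logging and threshold alerting) might look like the sketch below. It is illustrative only: `DlqMonitor`, `dlq_subject`, and the threshold value of 10 are assumptions, since the document does not specify a threshold or how alerts are delivered. The sketch covers steps 1 and 2 and leaves the retry-or-discard decision to the caller.

```python
import logging
from collections import Counter

logger = logging.getLogger("dlq")

ALERT_THRESHOLD = 10  # hypothetical value; the document does not define one


def dlq_subject(subject: str) -> str:
    """Derive the DLQ subject using the `{subject}.dlq` convention."""
    return f"{subject}.dlq"


class DlqMonitor:
    """Counts DLQ messages per subject and signals when to alert."""

    def __init__(self, threshold: int = ALERT_THRESHOLD) -> None:
        self.threshold = threshold
        self.counts: Counter = Counter()

    def record(self, subject: str, error: str) -> bool:
        """Log the failure and return True once the count exceeds the threshold."""
        logger.error("DLQ message on %s: %s", subject, error)
        self.counts[subject] += 1
        return self.counts[subject] > self.threshold


monitor = DlqMonitor(threshold=2)
for attempt in range(3):
    should_alert = monitor.record(dlq_subject("agent.run.failed"), "worker timeout")
print(should_alert)
```
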
## Consumer Groups

| Consumer | Stream | Filter | Purpose |
|----------|--------|--------|---------|
| parser-pipeline | ATTACHMENTS | attachment.created.> | Async parsing |
| crewai-worker | AGENT_RUNS | agent.run.requested | Workflow execution |
| memory-indexer | MEMORY | memory.store.> | RAG indexing |
| audit-logger | AUDIT | audit.> | Persistent logging |

## Event Payload Schema

```json
{
  "event_id": "uuid",
  "event_type": "attachment.created.image",
  "timestamp": "2026-01-19T12:00:00Z",
  "trace_id": "correlation-id",
  "source": "gateway",
  "agent_id": "helion",
  "user_id": "tg:123456",
  "payload": { ... }
}
```

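A publisher could build this envelope with a small helper. The sketch below is an assumption about how one might do it, not the services' actual code: `EventEnvelope` is a hypothetical class, the `payload` contents are made up for illustration, and the timestamp is assumed to be UTC with second precision, matching the example above.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now_iso() -> str:
    """Current UTC time formatted like 2026-01-19T12:00:00Z."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass
class EventEnvelope:
    """Envelope with the fields listed in the schema above."""
    event_type: str
    source: str
    agent_id: str
    user_id: str
    trace_id: str
    payload: dict
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=_utc_now_iso)

    def to_bytes(self) -> bytes:
        """Serialize to UTF-8 JSON, ready to hand to a NATS publish call."""
        return json.dumps(asdict(self)).encode("utf-8")


event = EventEnvelope(
    event_type="attachment.created.image",
    source="gateway",
    agent_id="helion",
    user_id="tg:123456",
    trace_id="correlation-id",
    payload={"file_id": "example"},  # hypothetical payload content
)
print(event.to_bytes().decode("utf-8"))
```
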
## Publishers & Subscribers Matrix

| Service | Publishes | Subscribes |
|---------|-----------|------------|
| Gateway | message.received, attachment.created | message.sent |
| Router | message.processed, agent.run.requested | message.received |
| Ingest | attachment.created | - |
| Parser | attachment.parsed, attachment.indexed | attachment.created |
| CrewAI Worker | agent.run.completed/failed | agent.run.requested |
| Memory Service | memory.indexed | memory.store |
| All Services | audit.action, ops.health | - |

---

## Idempotency & Replay

### Idempotency Key Required
Every NATS message MUST include `idempotency_key`:
```json
{
  "event_id": "uuid",
  "idempotency_key": "{source}:{entity_id}:{action}:{timestamp_hour}",
  ...
}
```

Example: `gateway:msg-123:received:2026011912`

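The key format can be generated and checked as in the following sketch. The function and class names are hypothetical, the hour bucket is assumed to be in UTC (the document shows only the `YYYYMMDDHH` shape), and the in-memory set stands in for whatever shared store a real deployment would use.

```python
from datetime import datetime, timezone


def idempotency_key(source: str, entity_id: str, action: str, when: datetime) -> str:
    """Build `{source}:{entity_id}:{action}:{timestamp_hour}`.

    `when` must be timezone-aware; it is converted to UTC (an assumption)
    before being truncated to the hour.
    """
    if when.tzinfo is None:
        raise ValueError("`when` must be timezone-aware")
    hour_bucket = when.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"{source}:{entity_id}:{action}:{hour_bucket}"


class Deduplicator:
    """Remembers keys already processed (in memory, for illustration only)."""

    def __init__(self) -> None:
        self._seen = set()

    def first_time(self, key: str) -> bool:
        """Return True the first time a key is seen, False on repeats."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


key = idempotency_key(
    "gateway", "msg-123", "received",
    datetime(2026, 1, 19, 12, 30, tzinfo=timezone.utc),
)
dedup = Deduplicator()
print(key, dedup.first_time(key), dedup.first_time(key))
```

One consequence of the hourly bucket: the same entity and action re-sent after the hour rolls over produces a different key, so deduplication with this format only holds within one clock hour.
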
### DLQ Replay Policy
| Step | Delay | Action |
|------|-------|--------|
| Retry 1 | 1s | Auto |
| Retry 2 | 5s | Auto |
| Retry 3 | 30s | Auto |
| DLQ | - | Alert + Manual review |
| DLQ + 24h | - | Auto-retry once |
| DLQ + 7d | - | Archive to cold storage |

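
The policy table can be encoded as two small decision functions. This sketch reflects one reading of the table: `failed_attempts` counts failures including the initial delivery, so the fourth failure (after Retry 3) moves the message to the DLQ. The function names and the returned labels are illustrative, not part of the source.

```python
from datetime import timedelta
from typing import Optional, Tuple

RETRY_DELAYS_SECONDS = (1, 5, 30)  # Retry 1, Retry 2, Retry 3 from the table


def next_action(failed_attempts: int) -> Tuple[str, Optional[int]]:
    """Decide what to do after the given number of failed attempts.

    Returns ("retry", delay_seconds) while retries remain, else ("dlq", None).
    """
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    if failed_attempts <= len(RETRY_DELAYS_SECONDS):
        return ("retry", RETRY_DELAYS_SECONDS[failed_attempts - 1])
    return ("dlq", None)


def dlq_action(age_in_dlq: timedelta, auto_retried: bool) -> str:
    """Decide how to treat a message that already sits in the DLQ."""
    if age_in_dlq >= timedelta(days=7):
        return "archive"
    if age_in_dlq >= timedelta(hours=24) and not auto_retried:
        return "auto_retry"
    return "manual_review"


for attempt in range(1, 5):
    print(attempt, next_action(attempt))
print(dlq_action(timedelta(hours=25), auto_retried=False))
```
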
### Config/Policy Update Events
```
config.updated.{key}            # Control Plane publishes
policy.updated.{agent_id}       # Control Plane publishes
prompt.updated.{agent_id}       # Control Plane publishes
```

Consumers should invalidate cache on these events.

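Cache invalidation driven by these subjects could be sketched as follows. `LocalCache` and its methods are hypothetical, and the sketch assumes the cache is keyed by the first subject token (`config`, `policy`, or `prompt`) plus everything after `updated.`.

```python
from typing import Any, Dict, Optional, Tuple

INVALIDATING_KINDS = ("config", "policy", "prompt")


class LocalCache:
    """Per-process cache that drops entries when an update event arrives."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Any] = {}

    def put(self, kind: str, key: str, value: Any) -> None:
        self._entries[(kind, key)] = value

    def get(self, kind: str, key: str) -> Optional[Any]:
        return self._entries.get((kind, key))

    def handle_event(self, subject: str) -> bool:
        """Invalidate the entry named by `<kind>.updated.<key>`.

        Returns True if the subject was an update event, False otherwise.
        """
        tokens = subject.split(".", 2)  # keep any dots inside the key
        if len(tokens) != 3 or tokens[1] != "updated":
            return False
        kind, _, key = tokens
        if kind not in INVALIDATING_KINDS:
            return False
        self._entries.pop((kind, key), None)
        return True


cache = LocalCache()
cache.put("prompt", "helion", "cached system prompt")
cache.handle_event("prompt.updated.helion")
print(cache.get("prompt", "helion"))
```

After the event is handled, the next read misses and the consumer reloads the value from its source.
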