microdao-daarion/docs/NATS_SUBJECT_MAP.md
Apple ef3473db21 snapshot: NODE1 production state 2026-02-09
Complete snapshot of /opt/microdao-daarion/ from NODE1 (144.76.224.179).
This represents the actual running production code that has diverged
significantly from the previous main branch.

Key changes from old main:
- Gateway (http_api.py): expanded from ~40KB to 164KB with full agent support
- Router: new /v1/agents/{id}/infer endpoint with vision + DeepSeek routing
- Behavior Policy: SOWA v2.2 (3-level: FULL/ACK/SILENT)
- Agent Registry: config/agent_registry.yml as single source of truth
- 13 agents configured (was 3)
- Memory service integration
- CrewAI teams and roles

Excluded from snapshot: venv/, .env, data/, backups, .tgz archives

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-09 08:46:46 -08:00


# NATS Subject Map — Event Bus Architecture
## Cross-Cutting Bus Design
```
┌─────────────────────────────────────────────────────────────────────────┐
│                          NATS JetStream :4222                          │
│                                                                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐│
│  │   MESSAGES   │  │ ATTACHMENTS  │  │  AGENT_RUNS  │  │    AUDIT     ││
│  │    Stream    │  │    Stream    │  │    Stream    │  │    Stream    ││
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘│
│         │                 │                 │                 │        │
└─────────┼─────────────────┼─────────────────┼─────────────────┼────────┘
          │                 │                 │                 │
     ┌────┴────┐       ┌────┴────┐       ┌────┴────┐       ┌────┴────┐
     │ Gateway │       │ Ingest  │       │ Router  │       │   All   │
     │ Parser  │       │ Parser  │       │ CrewAI  │       │Services │
     └─────────┘       └─────────┘       └─────────┘       └─────────┘
```
## Subject Hierarchy
### 1. Messages (chat/conversation)
```
message.received.{agent_id} # Gateway publishes
message.processed.{agent_id} # Router publishes after LLM
message.sent.{agent_id} # Gateway confirms delivery
```
### 2. Attachments (files/media)
```
attachment.created.{type} # Ingest publishes (image/audio/document)
attachment.parsed.{type} # Parser publishes after extraction
attachment.indexed.{agent_id} # Memory Service confirms RAG indexing
attachment.failed.{type} # DLQ for failed processing
```
### 3. Agent Runs (workflows/tasks)
```
agent.run.requested # Router/Gateway requests task
agent.run.started.{agent_id} # Worker acknowledges
agent.run.progress.{task_id} # Worker reports progress
agent.run.completed.{agent_id} # Worker finished successfully
agent.run.failed.{agent_id} # Worker failed (→ DLQ)
```
### 4. Memory Operations
```
memory.store.{agent_id} # Store new memory item
memory.retrieve.{agent_id} # Retrieve request
memory.indexed.{agent_id} # Confirmed in vector DB
memory.graph.updated.{agent_id} # Neo4j graph updated
```
### 5. Audit & Ops
```
audit.action.{service} # All services log actions
audit.error.{service} # Error events
ops.health.{service} # Health heartbeats
ops.alert.{severity} # critical/warning/info
```
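The subject patterns above are dot-separated tokens, so an `agent_id` or `type` containing a dot, a wildcard, or whitespace would silently change the subject's shape. A minimal sketch of a builder that rejects such tokens; the `subject` helper is hypothetical and not part of the repository:

```python
def subject(*tokens: str) -> str:
    """Join tokens into a NATS subject, rejecting tokens NATS would misread."""
    if not tokens:
        raise ValueError("at least one token is required")
    for token in tokens:
        # '.' separates tokens, '*' and '>' are wildcards, whitespace is not allowed.
        if (
            not token
            or any(ch in token for ch in ".*>")
            or any(ch.isspace() for ch in token)
        ):
            raise ValueError(f"invalid subject token: {token!r}")
    return ".".join(tokens)


# Usage: build a per-agent subject from the hierarchy above.
received = subject("message", "received", "helion")
```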
## Stream Configuration
| Stream | Subjects | Retention | MaxAge | Replicas |
|--------|----------|-----------|--------|----------|
| MESSAGES | message.> | limits | 7d | 1 |
| ATTACHMENTS | attachment.> | limits | 30d | 1 |
| AGENT_RUNS | agent.run.> | limits | 7d | 1 |
| MEMORY | memory.> | limits | 30d | 1 |
| AUDIT | audit.>, ops.> | limits | 90d | 1 |
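
To check which stream captures a given subject, the table can be expressed as data and matched with NATS wildcard rules (`*` matches exactly one token, `>` matches one or more trailing tokens). The sketch below is illustrative only; `STREAM_SUBJECTS`, `subject_matches`, and `stream_for` are hypothetical names, and the matching is done locally without a NATS connection:

```python
from typing import Optional

# Stream -> subject filters, copied from the table above.
STREAM_SUBJECTS = {
    "MESSAGES": ["message.>"],
    "ATTACHMENTS": ["attachment.>"],
    "AGENT_RUNS": ["agent.run.>"],
    "MEMORY": ["memory.>"],
    "AUDIT": ["audit.>", "ops.>"],
}


def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` matches a NATS-style `pattern`."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def stream_for(subject: str) -> Optional[str]:
    """Return the first stream whose filters match `subject`, else None."""
    for stream, patterns in STREAM_SUBJECTS.items():
        if any(subject_matches(p, subject) for p in patterns):
            return stream
    return None
```

With this table, `stream_for("ops.health.gateway")` resolves to `AUDIT`, while a subject outside the listed prefixes returns `None`, meaning no stream in the table would persist it.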
## Dead Letter Queue (DLQ)
Failed messages are routed to `{subject}.dlq`, where `{subject}` is the failure subject prefix (for example `attachment.failed`):
```
attachment.failed.dlq # Failed file processing
agent.run.failed.dlq # Failed workflow tasks
```
A DLQ consumer should:
1. Log the error details
2. Alert if the number of DLQ messages exceeds a threshold
3. Retry with backoff, or discard the message
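
The first two responsibilities can be sketched as a small in-memory monitor. This is an assumption-laden illustration, not the project's consumer: `DlqMonitor`, its default threshold of 10, and the per-subject counting are all hypothetical, and the retry/discard decision is left out.

```python
import logging
from collections import Counter

logger = logging.getLogger("dlq")


class DlqMonitor:
    """Logs DLQ messages and flags subjects whose count exceeds a threshold."""

    def __init__(self, alert_threshold: int = 10):
        self.alert_threshold = alert_threshold  # hypothetical default
        self.counts = Counter()
        self.alerts = []

    def handle(self, subject: str, error: str) -> bool:
        """Record one DLQ message; return True if it triggered an alert."""
        # Step 1: log the error details.
        logger.error("DLQ message on %s: %s", subject, error)
        self.counts[subject] += 1
        # Step 2: alert once the count exceeds the threshold.
        if self.counts[subject] > self.alert_threshold:
            self.alerts.append(subject)
            return True
        return False
```

A real consumer would likely reset or window the counters over time; that is omitted here to keep the sketch short.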
## Consumer Groups
| Consumer | Stream | Filter | Purpose |
|----------|--------|--------|---------|
| parser-pipeline | ATTACHMENTS | attachment.created.> | Async parsing |
| crewai-worker | AGENT_RUNS | agent.run.requested | Workflow execution |
| memory-indexer | MEMORY | memory.store.> | RAG indexing |
| audit-logger | AUDIT | audit.> | Persistent logging |
## Event Payload Schema
```json
{
"event_id": "uuid",
"event_type": "attachment.created.image",
"timestamp": "2026-01-19T12:00:00Z",
"trace_id": "correlation-id",
"source": "gateway",
"agent_id": "helion",
"user_id": "tg:123456",
"payload": { ... }
}
```
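As a rough illustration of how a publisher might fill in this envelope, the sketch below builds the same fields with the standard library. `make_event` is a hypothetical helper, generating a fresh `trace_id` when none is passed is an assumption, and the contents of `payload` are up to the caller.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_event(
    event_type: str,
    source: str,
    agent_id: str,
    user_id: str,
    payload: dict,
    trace_id: Optional[str] = None,
) -> dict:
    """Build an event envelope with the fields from the schema above."""
    return {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,
        # UTC timestamp in the same format as the schema example.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Assumption: start a new trace when the caller has none to propagate.
        "trace_id": trace_id or str(uuid.uuid4()),
        "source": source,
        "agent_id": agent_id,
        "user_id": user_id,
        "payload": payload,
    }


# Usage: serialize to bytes before publishing on the event's subject.
event = make_event(
    "attachment.created.image", "gateway", "helion", "tg:123456", {"file_id": "abc"}
)
body = json.dumps(event).encode("utf-8")
```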
## Publishers & Subscribers Matrix
| Service | Publishes | Subscribes |
|---------|-----------|------------|
| Gateway | message.received, attachment.created | message.sent |
| Router | message.processed, agent.run.requested | message.received |
| Ingest | attachment.created | - |
| Parser | attachment.parsed, attachment.indexed | attachment.created |
| CrewAI Worker | agent.run.completed/failed | agent.run.requested |
| Memory Service | memory.indexed | memory.store |
| All Services | audit.action, ops.health | - |
---
## Idempotency & Replay
### Idempotency Key Required
Every NATS message MUST include an `idempotency_key` field:
```json
{
"event_id": "uuid",
"idempotency_key": "{source}:{entity_id}:{action}:{timestamp_hour}",
...
}
```
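Example: `gateway:msg-123:received:2026011912` (source `gateway`, entity `msg-123`, action `received`, hour bucket `2026011912`, i.e. 2026-01-19 12:00).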
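
A minimal sketch of how such a key could be derived. The `idempotency_key` function is hypothetical, and computing the hour bucket in UTC is an assumption, since the format above does not name a time zone.

```python
from datetime import datetime, timezone


def idempotency_key(source: str, entity_id: str, action: str, ts: datetime) -> str:
    """Build '{source}:{entity_id}:{action}:{timestamp_hour}' from an aware datetime."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Assumption: the hour bucket is expressed in UTC as YYYYMMDDHH.
    hour = ts.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"{source}:{entity_id}:{action}:{hour}"


# Usage: a consumer can skip events whose key it has already seen.
seen = set()
key = idempotency_key(
    "gateway", "msg-123", "received",
    datetime(2026, 1, 19, 12, 30, tzinfo=timezone.utc),
)
is_duplicate = key in seen
seen.add(key)
```

Because the key is bucketed by hour, the same entity and action reported in two different hours produce two different keys, so deduplication only holds within one hour bucket.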
### DLQ Replay Policy
| Step | Delay | Action |
|------|-------|--------|
| Retry 1 | 1s | Auto |
| Retry 2 | 5s | Auto |
| Retry 3 | 30s | Auto |
| DLQ | - | Alert + Manual review |
| DLQ + 24h | - | Auto-retry once |
| DLQ + 7d | - | Archive to cold storage |
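
The table can be read as two decisions: what to do after each failed delivery, and what to do with a message that is already in the DLQ. The sketch below encodes one interpretation, assuming the first failure triggers Retry 1 and the fourth failure sends the message to the DLQ; `next_action` and `dlq_action` are hypothetical names.

```python
from datetime import timedelta
from typing import Optional, Tuple

# Delays for Retry 1..3, taken from the table above.
RETRY_DELAYS = [timedelta(seconds=1), timedelta(seconds=5), timedelta(seconds=30)]


def next_action(failed_attempts: int) -> Tuple[str, Optional[timedelta]]:
    """Return ('retry', delay) or ('dlq', None) after `failed_attempts` failures."""
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be >= 1")
    if failed_attempts <= len(RETRY_DELAYS):
        return ("retry", RETRY_DELAYS[failed_attempts - 1])
    # All automatic retries are exhausted: alert and wait for manual review.
    return ("dlq", None)


def dlq_action(age: timedelta, auto_retried: bool) -> str:
    """Decide what to do with a message that has sat in the DLQ for `age`."""
    if age >= timedelta(days=7):
        return "archive"
    if age >= timedelta(hours=24) and not auto_retried:
        return "auto_retry"
    return "manual_review"
```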
### Config/Policy Update Events
```
config.updated.{key} # Control Plane publishes
policy.updated.{agent_id} # Control Plane publishes
prompt.updated.{agent_id} # Control Plane publishes
```
Consumers should invalidate any cached config, policy, or prompt when they receive the corresponding event.
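
One way a consumer might apply this is to key its cache by the first and last parts of the subject and drop the matching entry when an update event arrives. This is a sketch under that assumption; `UpdateAwareCache` and its methods are hypothetical and not taken from the codebase.

```python
class UpdateAwareCache:
    """In-memory cache keyed by (kind, key), invalidated by *.updated.* subjects."""

    KINDS = ("config", "policy", "prompt")

    def __init__(self):
        self._entries = {}

    def put(self, kind: str, key: str, value) -> None:
        self._entries[(kind, key)] = value

    def get(self, kind: str, key: str):
        return self._entries.get((kind, key))

    def on_event(self, subject: str) -> bool:
        """Drop the entry named by the subject; return True if it was an update event."""
        # Split into kind, 'updated', and the remaining key or agent_id.
        parts = subject.split(".", 2)
        if len(parts) != 3 or parts[0] not in self.KINDS or parts[1] != "updated":
            return False
        self._entries.pop((parts[0], parts[2]), None)
        return True


# Usage: the next get() after an update event misses and forces a reload.
cache = UpdateAwareCache()
cache.put("prompt", "helion", "cached system prompt")
cache.on_event("prompt.updated.helion")
```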