NATS Subject Map — Event Bus Architecture

Cross-Cutting Bus Design

┌─────────────────────────────────────────────────────────────────────────┐
│                          NATS JetStream :4222                           │
│                                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐│
│  │   MESSAGES   │  │ ATTACHMENTS  │  │  AGENT_RUNS  │  │    AUDIT     ││
│  │   Stream     │  │   Stream     │  │   Stream     │  │   Stream     ││
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘│
│         │                 │                 │                 │        │
└─────────┼─────────────────┼─────────────────┼─────────────────┼────────┘
          │                 │                 │                 │
     ┌────┴────┐       ┌────┴────┐       ┌────┴────┐       ┌────┴────┐
     │ Gateway │       │ Ingest  │       │ Router  │       │   All   │
     │ Parser  │       │ Parser  │       │ CrewAI  │       │Services │
     └─────────┘       └─────────┘       └─────────┘       └─────────┘

Subject Hierarchy

1. Messages (chat/conversation)

message.received.{agent_id}       # Gateway publishes
message.processed.{agent_id}      # Router publishes after LLM
message.sent.{agent_id}           # Gateway confirms delivery

2. Attachments (files/media)

attachment.created.{type}         # Ingest publishes (image/audio/document)
attachment.parsed.{type}          # Parser publishes after extraction
attachment.indexed.{agent_id}     # Memory Service confirms RAG indexing
attachment.failed.{type}          # DLQ for failed processing

3. Agent Runs (workflows/tasks)

agent.run.requested               # Router/Gateway requests task
agent.run.started.{agent_id}      # Worker acknowledges
agent.run.progress.{task_id}      # Worker reports progress
agent.run.completed.{agent_id}    # Worker finished successfully
agent.run.failed.{agent_id}       # Worker failed (→ DLQ)

4. Memory Operations

memory.store.{agent_id}           # Store new memory item
memory.retrieve.{agent_id}        # Retrieve request
memory.indexed.{agent_id}         # Confirmed in vector DB
memory.graph.updated.{agent_id}   # Neo4j graph updated

5. Audit & Ops

audit.action.{service}            # All services log actions
audit.error.{service}             # Error events
ops.health.{service}              # Health heartbeats
ops.alert.{severity}              # critical/warning/info
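
The hierarchy above can be kept consistent across services with a small subject builder. The following is a minimal sketch, not code from the repository: the helper names (`build_subject`, `message_received`, `attachment_created`) are hypothetical, and the token check simply rejects the characters that have special meaning in NATS subjects (`.`, `*`, `>`, whitespace).

```python
import re

# A token must be non-empty and free of '.', whitespace, and the NATS wildcards '*' and '>'.
_TOKEN_RE = re.compile(r"^[^\s.*>]+$")

# Attachment types listed in the hierarchy above.
ATTACHMENT_TYPES = {"image", "audio", "document"}


def build_subject(*tokens: str) -> str:
    """Join tokens into a dot-separated subject, rejecting unsafe tokens."""
    for token in tokens:
        if not _TOKEN_RE.match(token):
            raise ValueError(f"invalid subject token: {token!r}")
    return ".".join(tokens)


def message_received(agent_id: str) -> str:
    """Subject the Gateway publishes to for an incoming chat message."""
    return build_subject("message", "received", agent_id)


def attachment_created(kind: str) -> str:
    """Subject Ingest publishes to for a new attachment of the given type."""
    if kind not in ATTACHMENT_TYPES:
        raise ValueError(f"unknown attachment type: {kind!r}")
    return build_subject("attachment", "created", kind)
```

For example, `message_received("helion")` yields `message.received.helion`. An agent ID containing a dot raises `ValueError`, so it cannot silently create an extra subject level.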

Stream Configuration

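| Stream | Subjects | Retention | MaxAge | Replicas |
|--------|----------|-----------|--------|----------|
| MESSAGES | message.> | limits | 7d | 1 |
| ATTACHMENTS | attachment.> | limits | 30d | 1 |
| AGENT_RUNS | agent.run.> | limits | 7d | 1 |
| MEMORY | memory.> | limits | 30d | 1 |
| AUDIT | audit.>, ops.> | limits | 90d | 1 |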
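
To check which stream captures a given subject, the wildcard rules can be reproduced in plain Python. This is an illustrative sketch: `subject_matches` and `stream_for` are hypothetical helpers, and the matcher follows the standard NATS semantics in which `*` matches exactly one token and `>` matches one or more trailing tokens.

```python
from typing import Optional

# Stream name -> subject filters, copied from the stream configuration above.
STREAM_SUBJECTS = {
    "MESSAGES": ["message.>"],
    "ATTACHMENTS": ["attachment.>"],
    "AGENT_RUNS": ["agent.run.>"],
    "MEMORY": ["memory.>"],
    "AUDIT": ["audit.>", "ops.>"],
}


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style match: '*' is exactly one token, '>' is one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one token left to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def stream_for(subject: str) -> Optional[str]:
    """Return the first stream whose filters match the subject, or None."""
    for stream, patterns in STREAM_SUBJECTS.items():
        if any(subject_matches(p, subject) for p in patterns):
            return stream
    return None
```

With this mapping, `stream_for("ops.alert.critical")` returns `"AUDIT"`, while a subject outside the five listed prefixes returns `None`, meaning no stream in the configuration above would persist it.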

Dead Letter Queue (DLQ)

Failed messages go to {subject}.dlq:

attachment.failed.dlq             # Failed file processing
agent.run.failed.dlq              # Failed workflow tasks

The DLQ consumer should:

  1. Log error details
  2. Alert if count > threshold
  3. Retry with backoff or discard
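
The three steps above can be sketched as a pure decision function that a DLQ consumer calls for each message. Everything specific here is an assumption made for illustration: the function name, the alert threshold of 10, the limit of 3 attempts, and the choice of exponential backoff are not taken from the running system.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("dlq")


@dataclass
class DlqDecision:
    action: str      # "retry" or "discard"
    delay_s: float   # backoff before the retry; 0.0 when discarding
    alert: bool      # True once the DLQ count exceeds the threshold


def handle_dlq_message(subject, error, attempt, dlq_count,
                       alert_threshold=10, max_attempts=3, base_delay_s=1.0):
    """Apply the three steps: log, alert on threshold, retry with backoff or discard."""
    # 1. Log error details.
    log.error("DLQ message on %s (attempt %d): %s", subject, attempt, error)
    # 2. Alert if count > threshold (thresholds are assumed values).
    alert = dlq_count > alert_threshold
    # 3. Retry with exponential backoff, or discard once attempts run out.
    if attempt < max_attempts:
        return DlqDecision("retry", base_delay_s * (2 ** attempt), alert)
    return DlqDecision("discard", 0.0, alert)
```

Keeping the decision separate from the NATS client code makes the policy easy to unit-test; the consumer itself only has to act on the returned `DlqDecision`.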

Consumer Groups

| Consumer | Stream | Filter | Purpose |
|----------|--------|--------|---------|
| parser-pipeline | ATTACHMENTS | attachment.created.> | Async parsing |
| crewai-worker | AGENT_RUNS | agent.run.requested | Workflow execution |
| memory-indexer | MEMORY | memory.store.> | RAG indexing |
| audit-logger | AUDIT | audit.> | Persistent logging |

Event Payload Schema

{
  "event_id": "uuid",
  "event_type": "attachment.created.image",
  "timestamp": "2026-01-19T12:00:00Z",
  "trace_id": "correlation-id",
  "source": "gateway",
  "agent_id": "helion",
  "user_id": "tg:123456",
  "payload": { ... }
}
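
A publisher could assemble and check this envelope with a helper along the following lines. This is a sketch under stated assumptions: `make_event`, `validate_event`, and `encode_event` are hypothetical names, the `file_id` payload key in the usage note is invented for illustration, and generating a fresh `trace_id` when none is supplied is one possible policy rather than documented behavior.

```python
import json
import uuid
from datetime import datetime, timezone

# Top-level fields of the envelope shown above.
REQUIRED_FIELDS = ("event_id", "event_type", "timestamp", "trace_id",
                   "source", "agent_id", "user_id", "payload")


def make_event(event_type, source, agent_id, user_id, payload, trace_id=None):
    """Build an event envelope with a fresh event_id and a UTC timestamp."""
    now = datetime.now(timezone.utc)
    return {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Assumption: start a new trace when the caller has none to propagate.
        "trace_id": trace_id or str(uuid.uuid4()),
        "source": source,
        "agent_id": agent_id,
        "user_id": user_id,
        "payload": payload,
    }


def validate_event(event: dict) -> None:
    """Raise ValueError if any envelope field is missing."""
    missing = [name for name in REQUIRED_FIELDS if name not in event]
    if missing:
        raise ValueError(f"event is missing fields: {missing}")


def encode_event(event: dict) -> bytes:
    """Serialize the envelope to compact UTF-8 JSON for publishing."""
    validate_event(event)
    return json.dumps(event, separators=(",", ":")).encode("utf-8")
```

For instance, `encode_event(make_event("attachment.created.image", "gateway", "helion", "tg:123456", {"file_id": "abc"}))` produces the bytes to publish on `attachment.created.image`.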

Publishers & Subscribers Matrix

| Service | Publishes | Subscribes |
|---------|-----------|------------|
| Gateway | message.received, attachment.created | message.sent |
| Router | message.processed, agent.run.requested | message.received |
| Ingest | attachment.created | - |
| Parser | attachment.parsed, attachment.indexed | attachment.created |
| CrewAI Worker | agent.run.completed/failed | agent.run.requested |
| Memory Service | memory.indexed | memory.store |
| All Services | audit.action, ops.health | - |

Idempotency & Replay

Idempotency Key Required

Every NATS message MUST include an idempotency_key field:

{
  "event_id": "uuid",
  "idempotency_key": "{source}:{entity_id}:{action}:{timestamp_hour}",
  ...
}

Example: gateway:msg-123:received:2026011912
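
The key format can be generated and checked as follows. This is a minimal sketch: `idempotency_key` and `SeenKeys` are hypothetical names, the hour bucket is assumed to be in UTC, and the in-memory set stands in for whatever shared, expiring store a real consumer would use.

```python
from datetime import datetime, timezone


def idempotency_key(source: str, entity_id: str, action: str, when: datetime) -> str:
    """Format {source}:{entity_id}:{action}:{timestamp_hour}, hour bucket in UTC (assumed)."""
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    hour = when.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"{source}:{entity_id}:{action}:{hour}"


class SeenKeys:
    """In-memory dedup set, for illustration only."""

    def __init__(self):
        self._seen = set()

    def first_time(self, key: str) -> bool:
        """Return True the first time a key is seen, False on every repeat."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Calling `idempotency_key("gateway", "msg-123", "received", datetime(2026, 1, 19, 12, 0, tzinfo=timezone.utc))` reproduces the example key above. Because the key is bucketed by hour, it is probably safest to derive it from the original event's timestamp rather than the publish time, so that a redelivery crossing an hour boundary still maps to the same key.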

DLQ Replay Policy

| Step | Delay | Action |
|------|-------|--------|
| Retry 1 | 1s | Auto |
| Retry 2 | 5s | Auto |
| Retry 3 | 30s | Auto |
| DLQ | - | Alert + Manual review |
| DLQ + 24h | - | Auto-retry once |
| DLQ + 7d | - | Archive to cold storage |
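
The replay policy can be expressed as two small lookup functions. This sketch reads the policy one particular way, which is an assumption: the first failure triggers Retry 1, the fourth failure sends the message to the DLQ, and a message that has already used its single auto-retry falls back to manual review until it is archived. The function names are hypothetical.

```python
from datetime import timedelta

# Delays for Retry 1..3, taken from the replay policy above.
RETRY_DELAYS = [timedelta(seconds=1), timedelta(seconds=5), timedelta(seconds=30)]
DLQ_AUTO_RETRY_AFTER = timedelta(hours=24)
DLQ_ARCHIVE_AFTER = timedelta(days=7)


def next_action(failed_attempts: int):
    """Return (action, delay) for a message that has failed `failed_attempts` times."""
    if failed_attempts <= 0:
        raise ValueError("failed_attempts must be >= 1")
    if failed_attempts <= len(RETRY_DELAYS):
        return ("retry", RETRY_DELAYS[failed_attempts - 1])
    # All automatic retries are used up: hand the message to the DLQ.
    return ("dlq", None)


def dlq_action(age: timedelta, auto_retried: bool) -> str:
    """Decide what to do with a message that has sat in the DLQ for `age`."""
    if age >= DLQ_ARCHIVE_AFTER:
        return "archive"
    if age >= DLQ_AUTO_RETRY_AFTER and not auto_retried:
        return "auto_retry"
    return "manual_review"
```

A periodic job could call `dlq_action` for each DLQ entry, while the consumer's error path calls `next_action` with the running failure count.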

Config/Policy Update Events

config.updated.{key}           # Control Plane publishes
policy.updated.{agent_id}      # Control Plane publishes
prompt.updated.{agent_id}      # Control Plane publishes

Consumers should invalidate the corresponding cached config, policy, or prompt when they receive one of these events.
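
One way a consumer might handle these events is to key its cache by the kind of entry and the trailing part of the subject. The class below is an illustrative sketch, not the services' actual cache: `UpdateAwareCache` and its methods are hypothetical names, and the subject is assumed to have already been delivered by a subscription to the three update subjects.

```python
class UpdateAwareCache:
    """Tiny cache that drops an entry when the matching update event arrives."""

    KINDS = {"config", "policy", "prompt"}

    def __init__(self):
        self._entries = {}  # (kind, key) -> cached value

    def put(self, kind: str, key: str, value) -> None:
        self._entries[(kind, key)] = value

    def get(self, kind: str, key: str):
        return self._entries.get((kind, key))

    def on_event(self, subject: str) -> bool:
        """Invalidate the entry named by an update subject; return True if handled."""
        kind, _, key = subject.partition(".updated.")
        if kind not in self.KINDS or not key:
            return False  # not one of the config/policy/prompt update subjects
        self._entries.pop((kind, key), None)
        return True
```

After `cache.put("policy", "helion", {...})`, receiving `policy.updated.helion` removes that entry, so the next lookup misses and the consumer reloads the policy from its source.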