# NATS Subject Map — Event Bus Architecture

## Cross-Cutting Bus Design

```
┌────────────────────────────────────────────────────────────────────────┐
│                          NATS JetStream :4222                          │
│                                                                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐│
│  │   MESSAGES   │  │ ATTACHMENTS  │  │  AGENT_RUNS  │  │    AUDIT     ││
│  │    Stream    │  │    Stream    │  │    Stream    │  │    Stream    ││
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘│
│         │                 │                 │                 │        │
└─────────┼─────────────────┼─────────────────┼─────────────────┼────────┘
          │                 │                 │                 │
     ┌────┴────┐       ┌────┴────┐       ┌────┴────┐       ┌────┴────┐
     │ Gateway │       │ Ingest  │       │ Router  │       │   All   │
     │ Parser  │       │ Parser  │       │ CrewAI  │       │Services │
     └─────────┘       └─────────┘       └─────────┘       └─────────┘
```

## Subject Hierarchy

### 1. Messages (chat/conversation)

```
message.received.{agent_id}     # Gateway publishes
message.processed.{agent_id}    # Router publishes after LLM
message.sent.{agent_id}         # Gateway confirms delivery
```

### 2. Attachments (files/media)

```
attachment.created.{type}       # Ingest publishes (image/audio/document)
attachment.parsed.{type}        # Parser publishes after extraction
attachment.indexed.{agent_id}   # Memory Service confirms RAG indexing
attachment.failed.{type}        # DLQ for failed processing
```

### 3. Agent Runs (workflows/tasks)

```
agent.run.requested             # Router/Gateway requests task
agent.run.started.{agent_id}    # Worker acknowledges
agent.run.progress.{task_id}    # Worker reports progress
agent.run.completed.{agent_id}  # Worker finished successfully
agent.run.failed.{agent_id}     # Worker failed (→ DLQ)
```

### 4. Memory Operations

```
memory.store.{agent_id}         # Store new memory item
memory.retrieve.{agent_id}      # Retrieve request
memory.indexed.{agent_id}       # Confirmed in vector DB
memory.graph.updated.{agent_id} # Neo4j graph updated
```

### 5. Audit & Ops

```
audit.action.{service}          # All services log actions
audit.error.{service}           # Error events
ops.health.{service}            # Health heartbeats
ops.alert.{severity}            # critical/warning/info
```

## Stream Configuration

| Stream | Subjects | Retention | MaxAge | Replicas |
|--------|----------|-----------|--------|----------|
| MESSAGES | message.> | limits | 7d | 1 |
| ATTACHMENTS | attachment.> | limits | 30d | 1 |
| AGENT_RUNS | agent.run.> | limits | 7d | 1 |
| MEMORY | memory.> | limits | 30d | 1 |
| AUDIT | audit.>, ops.> | limits | 90d | 1 |

## Dead Letter Queue (DLQ)

Failed messages go to `{subject}.dlq`:

```
attachment.failed.dlq           # Failed file processing
agent.run.failed.dlq            # Failed workflow tasks
```

DLQ consumer should:

1. Log error details
2. Alert if count > threshold
3. Retry with backoff or discard

## Consumer Groups

| Consumer | Stream | Filter | Purpose |
|----------|--------|--------|---------|
| parser-pipeline | ATTACHMENTS | attachment.created.> | Async parsing |
| crewai-worker | AGENT_RUNS | agent.run.requested | Workflow execution |
| memory-indexer | MEMORY | memory.store.> | RAG indexing |
| audit-logger | AUDIT | audit.> | Persistent logging |

## Event Payload Schema

```json
{
  "event_id": "uuid",
  "event_type": "attachment.created.image",
  "timestamp": "2026-01-19T12:00:00Z",
  "trace_id": "correlation-id",
  "source": "gateway",
  "agent_id": "helion",
  "user_id": "tg:123456",
  "payload": { ... }
}
```

## Publishers & Subscribers Matrix

| Service | Publishes | Subscribes |
|---------|-----------|------------|
| Gateway | message.received, attachment.created | message.sent |
| Router | message.processed, agent.run.requested | message.received |
| Ingest | attachment.created | - |
| Parser | attachment.parsed, attachment.indexed | attachment.created |
| CrewAI Worker | agent.run.completed/failed | agent.run.requested |
| Memory Service | memory.indexed | memory.store |
| All Services | audit.action, ops.health | - |

---

## Idempotency & Replay

### Idempotency Key Required

Every NATS message MUST include `idempotency_key`:

```json
{
  "event_id": "uuid",
  "idempotency_key": "{source}:{entity_id}:{action}:{timestamp_hour}",
  ...
}
```

Example: `gateway:msg-123:received:2026011912`

### DLQ Replay Policy

| Step | Delay | Action |
|------|-------|--------|
| Retry 1 | 1s | Auto |
| Retry 2 | 5s | Auto |
| Retry 3 | 30s | Auto |
| DLQ | - | Alert + Manual review |
| DLQ + 24h | - | Auto-retry once |
| DLQ + 7d | - | Archive to cold storage |

### Config/Policy Update Events

```
config.updated.{key}            # Control Plane publishes
policy.updated.{agent_id}       # Control Plane publishes
prompt.updated.{agent_id}       # Control Plane publishes
```

Consumers should invalidate cache on these events.
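
As a minimal sketch of the key format from "Idempotency Key Required", the helper below builds `{source}:{entity_id}:{action}:{timestamp_hour}` and rejects repeats. The names `idempotency_key` and `Deduplicator` are hypothetical, the hour bucket is assumed to be computed in UTC (matching the `Z` timestamps in the payload schema), and the in-memory set stands in for whatever shared store a real consumer would use.

```python
from datetime import datetime, timezone


def idempotency_key(source: str, entity_id: str, action: str, ts: datetime) -> str:
    """Build '{source}:{entity_id}:{action}:{timestamp_hour}' with the hour as YYYYMMDDHH."""
    # Assumption: the hour bucket is taken in UTC.
    hour = ts.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"{source}:{entity_id}:{action}:{hour}"


class Deduplicator:
    """Hypothetical in-memory guard; keys are never expired in this sketch."""

    def __init__(self) -> None:
        self._seen = set()  # idempotency keys already handled

    def first_time(self, key: str) -> bool:
        """Return True the first time a key is seen and False for every repeat."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


ts = datetime(2026, 1, 19, 12, 0, 0, tzinfo=timezone.utc)
key = idempotency_key("gateway", "msg-123", "received", ts)
dedup = Deduplicator()
first, second = dedup.first_time(key), dedup.first_time(key)
```

With these inputs `key` equals the document's example, `gateway:msg-123:received:2026011912`. Because the key carries an hour bucket, the same entity and action in a later hour produce a different key and are treated as new.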
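
The DLQ Replay Policy table can be read as a small decision function. The sketch below is one interpretation, not a confirmed implementation: it assumes `failed_attempts` counts deliveries that have already failed, that the fourth failure moves the message to the DLQ, and that "auto-retry once" is tracked with a boolean flag. The function names and the action strings are hypothetical.

```python
from typing import Optional

RETRY_DELAYS_S = (1, 5, 30)            # Retry 1, 2, 3 from the table
DLQ_AUTO_RETRY_AFTER_S = 24 * 3600     # DLQ + 24h
DLQ_ARCHIVE_AFTER_S = 7 * 24 * 3600    # DLQ + 7d


def next_retry_delay(failed_attempts: int) -> Optional[int]:
    """Seconds to wait before the next automatic retry, or None when the message goes to the DLQ."""
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be >= 1")
    if failed_attempts <= len(RETRY_DELAYS_S):
        return RETRY_DELAYS_S[failed_attempts - 1]
    return None  # retries exhausted: alert and hand over to manual review


def dlq_action(age_s: float, auto_retried: bool) -> str:
    """Action for a message that has been sitting in the DLQ for age_s seconds."""
    if age_s >= DLQ_ARCHIVE_AFTER_S:
        return "archive"
    if age_s >= DLQ_AUTO_RETRY_AFTER_S and not auto_retried:
        return "auto_retry"
    return "manual_review"
```

For example, `next_retry_delay(2)` gives the 5 s delay of Retry 2, and a message that has already used its single 24 h auto-retry stays in manual review until the 7-day archive threshold.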
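
To check which stream from the Stream Configuration table captures a given subject, the following sketch reimplements NATS wildcard matching in plain Python (`*` matches exactly one token, `>` matches one or more trailing tokens). The helpers `subject_matches` and `stream_for` are hypothetical and only mirror the table; a running JetStream server does this matching itself.

```python
from typing import Optional

# Subjects per stream, copied from the Stream Configuration table.
STREAM_SUBJECTS = {
    "MESSAGES": ["message.>"],
    "ATTACHMENTS": ["attachment.>"],
    "AGENT_RUNS": ["agent.run.>"],
    "MEMORY": ["memory.>"],
    "AUDIT": ["audit.>", "ops.>"],
}


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style match: '*' is one token, '>' is one or more tokens at the tail."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one token to consume.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def stream_for(subject: str) -> Optional[str]:
    """Name of the first stream whose subjects match, or None if no stream captures it."""
    for stream, patterns in STREAM_SUBJECTS.items():
        if any(subject_matches(p, subject) for p in patterns):
            return stream
    return None
```

A lookup such as `stream_for("ops.alert.critical")` resolves to `AUDIT`. With the table as written, `stream_for("config.updated.some_key")` returns `None`, so the config, policy, and prompt update subjects are not bound to any listed stream.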