snapshot: NODE1 production state 2026-02-09

Complete snapshot of /opt/microdao-daarion/ from NODE1 (144.76.224.179).
This represents the actual running production code that has diverged
significantly from the previous main branch.

Key changes from old main:
- Gateway (http_api.py): expanded from ~40KB to 164KB with full agent support
- Router: new /v1/agents/{id}/infer endpoint with vision + DeepSeek routing
- Behavior Policy: SOWA v2.2 (3-level: FULL/ACK/SILENT)
- Agent Registry: config/agent_registry.yml as single source of truth
- 13 agents configured (was 3)
- Memory service integration
- CrewAI teams and roles

Excluded from snapshot: venv/, .env, data/, backups, .tgz archives

Co-authored-by: Cursor <cursoragent@cursor.com>
156
docs/NATS_SUBJECT_MAP.md
Normal file
@@ -0,0 +1,156 @@
# NATS Subject Map — Event Bus Architecture

## Cross-Cutting Bus Design

```
┌────────────────────────────────────────────────────────────────────────┐
│                          NATS JetStream :4222                          │
│                                                                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐│
│  │  MESSAGES    │  │ ATTACHMENTS  │  │  AGENT_RUNS  │  │    AUDIT     ││
│  │   Stream     │  │   Stream     │  │   Stream     │  │   Stream     ││
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘│
│         │                 │                 │                 │        │
└─────────┼─────────────────┼─────────────────┼─────────────────┼────────┘
          │                 │                 │                 │
     ┌────┴────┐       ┌────┴────┐       ┌────┴────┐       ┌────┴────┐
     │ Gateway │       │ Ingest  │       │ Router  │       │   All   │
     │ Parser  │       │ Parser  │       │ CrewAI  │       │Services │
     └─────────┘       └─────────┘       └─────────┘       └─────────┘
```

## Subject Hierarchy

### 1. Messages (chat/conversation)

```
message.received.{agent_id}     # Gateway publishes
message.processed.{agent_id}    # Router publishes after LLM
message.sent.{agent_id}         # Gateway confirms delivery
```

### 2. Attachments (files/media)

```
attachment.created.{type}       # Ingest publishes (image/audio/document)
attachment.parsed.{type}        # Parser publishes after extraction
attachment.indexed.{agent_id}   # Memory Service confirms RAG indexing
attachment.failed.{type}        # DLQ for failed processing
```

### 3. Agent Runs (workflows/tasks)

```
agent.run.requested             # Router/Gateway requests task
agent.run.started.{agent_id}    # Worker acknowledges
agent.run.progress.{task_id}    # Worker reports progress
agent.run.completed.{agent_id}  # Worker finished successfully
agent.run.failed.{agent_id}     # Worker failed (→ DLQ)
```

### 4. Memory Operations

```
memory.store.{agent_id}         # Store new memory item
memory.retrieve.{agent_id}      # Retrieve request
memory.indexed.{agent_id}       # Confirmed in vector DB
memory.graph.updated.{agent_id} # Neo4j graph updated
```

### 5. Audit & Ops

```
audit.action.{service}          # All services log actions
audit.error.{service}           # Error events
ops.health.{service}            # Health heartbeats
ops.alert.{severity}            # critical/warning/info
```

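The hierarchy above fills each `{placeholder}` with a single dot-free token. The following is a minimal sketch of a helper that builds such subjects and rejects unsafe tokens. The function names (`subject`, `message_received`, `agent_run_progress`) are hypothetical and not part of the snapshot; the only NATS fact relied on is that whitespace, `.`, `*` and `>` cannot appear inside a token.

```python
import re

# Characters that must not appear inside a single NATS subject token:
# whitespace, the token separator '.', and the wildcards '*' and '>'.
_INVALID_TOKEN = re.compile(r"[\s.*>]")


def subject(*tokens: str) -> str:
    """Join tokens into a dot-separated subject, rejecting unsafe tokens."""
    for token in tokens:
        if not token or _INVALID_TOKEN.search(token):
            raise ValueError(f"invalid subject token: {token!r}")
    return ".".join(tokens)


def message_received(agent_id: str) -> str:
    """Hypothetical helper for message.received.{agent_id}."""
    return subject("message", "received", agent_id)


def agent_run_progress(task_id: str) -> str:
    """Hypothetical helper for agent.run.progress.{task_id}."""
    return subject("agent", "run", "progress", task_id)
```

Validating tokens at publish time keeps an `agent_id` such as `a.b` from silently adding an extra level to the hierarchy and bypassing consumer filters.
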
## Stream Configuration

| Stream | Subjects | Retention | MaxAge | Replicas |
|--------|----------|-----------|--------|----------|
| MESSAGES | message.> | limits | 7d | 1 |
| ATTACHMENTS | attachment.> | limits | 30d | 1 |
| AGENT_RUNS | agent.run.> | limits | 7d | 1 |
| MEMORY | memory.> | limits | 30d | 1 |
| AUDIT | audit.>, ops.> | limits | 90d | 1 |

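To check which stream would capture a given subject, the table above can be expressed as data and matched with NATS wildcard rules (`*` matches exactly one token, `>` matches one or more trailing tokens). This is a sketch in plain Python, not the server's matcher; `STREAM_SUBJECTS`, `subject_matches` and `stream_for_subject` are illustrative names.

```python
from typing import Optional

# Subject filters per stream, copied from the Stream Configuration table.
STREAM_SUBJECTS = {
    "MESSAGES": ["message.>"],
    "ATTACHMENTS": ["attachment.>"],
    "AGENT_RUNS": ["agent.run.>"],
    "MEMORY": ["memory.>"],
    "AUDIT": ["audit.>", "ops.>"],
}


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style match: '*' is one token, '>' is one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and must consume at least one token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def stream_for_subject(subject: str) -> Optional[str]:
    """Return the first stream whose filters match, or None if no stream captures it."""
    for stream, patterns in STREAM_SUBJECTS.items():
        if any(subject_matches(p, subject) for p in patterns):
            return stream
    return None
```

With the table as written, a subject outside the five listed prefixes is not captured by any stream, so `stream_for_subject` returns `None` for it.
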
## Dead Letter Queue (DLQ)

Failed messages go to `{subject}.dlq`:

```
attachment.failed.dlq           # Failed file processing
agent.run.failed.dlq            # Failed workflow tasks
```

A DLQ consumer should:
1. Log error details
2. Alert if the failure count exceeds a threshold
3. Retry with backoff, or discard

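Steps 1 and 2 of the consumer behaviour above can be sketched without any NATS dependency. The class below is a hypothetical in-memory illustration: `DlqMonitor`, `alert_threshold` and `dlq_subject` are assumed names, and a real consumer would persist its counts and emit alerts (for example on `ops.alert.{severity}`) instead of returning a flag.

```python
from collections import Counter
from typing import List, Tuple

DLQ_SUFFIX = ".dlq"


def dlq_subject(subject: str) -> str:
    """Derive the DLQ subject using the {subject}.dlq convention."""
    return subject + DLQ_SUFFIX


class DlqMonitor:
    """Records DLQ entries and signals when a subject exceeds the alert threshold."""

    def __init__(self, alert_threshold: int) -> None:
        self.alert_threshold = alert_threshold  # assumed to be configured per deployment
        self.counts: Counter = Counter()
        self.log: List[Tuple[str, str]] = []

    def record(self, subject: str, error: str) -> bool:
        """Log the failure; return True when the count for this subject is above the threshold."""
        self.log.append((subject, error))
        self.counts[subject] += 1
        return self.counts[subject] > self.alert_threshold
```
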
## Consumer Groups

| Consumer | Stream | Filter | Purpose |
|----------|--------|--------|---------|
| parser-pipeline | ATTACHMENTS | attachment.created.> | Async parsing |
| crewai-worker | AGENT_RUNS | agent.run.requested | Workflow execution |
| memory-indexer | MEMORY | memory.store.> | RAG indexing |
| audit-logger | AUDIT | audit.> | Persistent logging |

## Event Payload Schema

```json
{
  "event_id": "uuid",
  "event_type": "attachment.created.image",
  "timestamp": "2026-01-19T12:00:00Z",
  "trace_id": "correlation-id",
  "source": "gateway",
  "agent_id": "helion",
  "user_id": "tg:123456",
  "payload": { ... }
}
```

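As an illustration of the envelope above, here is a minimal sketch of a builder that fills in the generated fields. The function names `build_event` and `encode_event` are hypothetical. Using UUIDv4 for `event_id` and generating a fresh `trace_id` when none is passed are assumptions; the schema only says "uuid" and "correlation-id".

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_event(
    event_type: str,
    source: str,
    agent_id: str,
    user_id: str,
    payload: Dict[str, Any],
    trace_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an event envelope with the fields listed in the schema."""
    now = datetime.now(timezone.utc)
    return {
        "event_id": str(uuid.uuid4()),  # assumption: UUIDv4
        "event_type": event_type,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),  # UTC, second precision
        "trace_id": trace_id or str(uuid.uuid4()),  # assumption: new trace if none given
        "source": source,
        "agent_id": agent_id,
        "user_id": user_id,
        "payload": payload,
    }


def encode_event(event: Dict[str, Any]) -> bytes:
    """Serialize the envelope to compact UTF-8 JSON bytes for publishing."""
    return json.dumps(event, separators=(",", ":")).encode("utf-8")
```

Passing the incoming `trace_id` through when one service reacts to another's event keeps the whole chain correlated under a single identifier.
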
## Publishers & Subscribers Matrix

| Service | Publishes | Subscribes |
|---------|-----------|------------|
| Gateway | message.received, attachment.created | message.sent |
| Router | message.processed, agent.run.requested | message.received |
| Ingest | attachment.created | - |
| Parser | attachment.parsed, attachment.indexed | attachment.created |
| CrewAI Worker | agent.run.completed/failed | agent.run.requested |
| Memory Service | memory.indexed | memory.store |
| All Services | audit.action, ops.health | - |

---

## Idempotency & Replay

### Idempotency Key Required

Every NATS message MUST include `idempotency_key`:

```json
{
  "event_id": "uuid",
  "idempotency_key": "{source}:{entity_id}:{action}:{timestamp_hour}",
  ...
}
```

Example: `gateway:msg-123:received:2026011912`

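The key format above can be sketched as follows. The example key suggests `{timestamp_hour}` is `YYYYMMDDHH`; treating it as a UTC hour bucket is an assumption, as are the names `idempotency_key` and `SeenKeys`. The in-memory set is only for illustration; a shared store with expiry would be needed across processes.

```python
from datetime import datetime, timezone


def idempotency_key(source: str, entity_id: str, action: str, at: datetime) -> str:
    """Build {source}:{entity_id}:{action}:{timestamp_hour} from a timezone-aware datetime."""
    if at.tzinfo is None:
        raise ValueError("at must be timezone-aware")
    hour = at.astimezone(timezone.utc).strftime("%Y%m%d%H")  # assumption: UTC hour bucket
    return f"{source}:{entity_id}:{action}:{hour}"


class SeenKeys:
    """Minimal in-memory deduplicator keyed by idempotency key."""

    def __init__(self) -> None:
        self._seen = set()

    def first_time(self, key: str) -> bool:
        """Return True the first time a key is seen, False for duplicates."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

One consequence of the hourly bucket is that the same entity and action replayed in a later hour produces a different key, so deduplication under this scheme only holds within one hour window.
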
### DLQ Replay Policy

| Step | Delay | Action |
|------|-------|--------|
| Retry 1 | 1s | Auto |
| Retry 2 | 5s | Auto |
| Retry 3 | 30s | Auto |
| DLQ | - | Alert + Manual review |
| DLQ + 24h | - | Auto-retry once |
| DLQ + 7d | - | Archive to cold storage |

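The replay table can be read as two small decision functions: one for the automatic retry delays and one for what happens to a message once it sits in the DLQ. The sketch below encodes one possible reading, in which the message moves to the DLQ after the initial delivery and all three retries have failed. The names `retry_delay` and `dlq_action` and the returned labels are hypothetical, and tracking whether the 24h auto-retry has already run is left out.

```python
from datetime import timedelta
from typing import Optional

# Delays for Retry 1..3, taken from the DLQ Replay Policy table.
RETRY_DELAYS = [timedelta(seconds=1), timedelta(seconds=5), timedelta(seconds=30)]


def retry_delay(failed_attempts: int) -> Optional[timedelta]:
    """Delay before the next automatic retry, or None when the message goes to the DLQ.

    failed_attempts counts failures so far (1 means the initial delivery failed).
    """
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be >= 1")
    if failed_attempts <= len(RETRY_DELAYS):
        return RETRY_DELAYS[failed_attempts - 1]
    return None


def dlq_action(age_in_dlq: timedelta) -> str:
    """Map the time a message has spent in the DLQ to the table's action."""
    if age_in_dlq >= timedelta(days=7):
        return "archive"
    if age_in_dlq >= timedelta(hours=24):
        return "auto-retry-once"
    return "alert-and-review"
```
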
### Config/Policy Update Events

```
config.updated.{key}            # Control Plane publishes
policy.updated.{agent_id}       # Control Plane publishes
prompt.updated.{agent_id}       # Control Plane publishes
```

Consumers should invalidate any cached config, policy, or prompt when they receive one of these events.

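Cache invalidation on these events can be sketched as a handler that maps the subject back to a cache entry. `LocalCache` and its methods are hypothetical; the sketch assumes entries are keyed by kind (`config`, `policy`, `prompt`) plus the trailing part of the subject, and that the subscription callback passes the subject string to `handle_event`.

```python
from typing import Any, Dict, Optional, Tuple

# Subject prefixes that should trigger invalidation, from the list above.
INVALIDATION_PREFIXES = ("config.updated.", "policy.updated.", "prompt.updated.")


class LocalCache:
    """Tiny in-process cache keyed by (kind, identifier)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Any] = {}

    def put(self, kind: str, ident: str, value: Any) -> None:
        self._entries[(kind, ident)] = value

    def get(self, kind: str, ident: str) -> Optional[Any]:
        return self._entries.get((kind, ident))

    def handle_event(self, subject: str) -> bool:
        """Drop the matching entry; return True if the subject was an update event."""
        for prefix in INVALIDATION_PREFIXES:
            if subject.startswith(prefix):
                kind = prefix.split(".")[0]
                ident = subject[len(prefix):]
                self._entries.pop((kind, ident), None)
                return True
        return False
```

Dropping the entry instead of writing the new value keeps the event payload small; the next read falls through to the source of truth.
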