
Apple ef3473db21 snapshot: NODE1 production state 2026-02-09

Complete snapshot of /opt/microdao-daarion/ from NODE1 (144.76.224.179).
This represents the actual running production code that has diverged
significantly from the previous main branch.

Key changes from old main:
- Gateway (http_api.py): expanded from ~40KB to 164KB with full agent support
- Router: new /v1/agents/{id}/infer endpoint with vision + DeepSeek routing
- Behavior Policy: SOWA v2.2 (3-level: FULL/ACK/SILENT)
- Agent Registry: config/agent_registry.yml as single source of truth
- 13 agents configured (was 3)
- Memory service integration
- CrewAI teams and roles

Excluded from snapshot: venv/, .env, data/, backups, .tgz archives

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-02-09 08:46:46 -08:00

5.9 KiB


NATS Subject Map — Event Bus Architecture

Cross-Cutting Bus Design

┌─────────────────────────────────────────────────────────────────────────┐
│                          NATS JetStream :4222                           │
│                                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐│
│  │   MESSAGES   │  │ ATTACHMENTS  │  │  AGENT_RUNS  │  │    AUDIT     ││
│  │   Stream     │  │   Stream     │  │   Stream     │  │   Stream     ││
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘│
│         │                 │                 │                 │        │
└─────────┼─────────────────┼─────────────────┼─────────────────┼────────┘
          │                 │                 │                 │
     ┌────┴────┐       ┌────┴────┐       ┌────┴────┐       ┌────┴────┐
     │ Gateway │       │ Ingest  │       │ Router  │       │   All   │
     │ Parser  │       │ Parser  │       │ CrewAI  │       │Services │
     └─────────┘       └─────────┘       └─────────┘       └─────────┘

Subject Hierarchy

1. Messages (chat/conversation)

message.received.{agent_id}       # Gateway publishes
message.processed.{agent_id}      # Router publishes after LLM
message.sent.{agent_id}           # Gateway confirms delivery

2. Attachments (files/media)

attachment.created.{type}         # Ingest publishes (image/audio/document)
attachment.parsed.{type}          # Parser publishes after extraction
attachment.indexed.{agent_id}     # Memory Service confirms RAG indexing
attachment.failed.{type}          # DLQ for failed processing

3. Agent Runs (workflows/tasks)

agent.run.requested               # Router/Gateway requests task
agent.run.started.{agent_id}      # Worker acknowledges
agent.run.progress.{task_id}      # Worker reports progress
agent.run.completed.{agent_id}    # Worker finished successfully
agent.run.failed.{agent_id}       # Worker failed (→ DLQ)

4. Memory Operations

memory.store.{agent_id}           # Store new memory item
memory.retrieve.{agent_id}        # Retrieve request
memory.indexed.{agent_id}         # Confirmed in vector DB
memory.graph.updated.{agent_id}   # Neo4j graph updated

5. Audit & Ops

audit.action.{service}            # All services log actions
audit.error.{service}             # Error events
ops.health.{service}              # Health heartbeats
ops.alert.{severity}              # critical/warning/info

Stream Configuration

Stream	Subjects	Retention	MaxAge	Replicas
MESSAGES	message.>	limits	7d	1
ATTACHMENTS	attachment.>	limits	30d	1
AGENT_RUNS	agent.run.>	limits	7d	1
MEMORY	memory.>	limits	30d	1
AUDIT	audit.>, ops.>	limits	90d	1
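To check which stream a given subject lands in, the table above can be encoded as data and matched with NATS wildcard semantics (`*` matches exactly one token, `>` matches one or more trailing tokens). The following is a minimal sketch in plain Python with no NATS client; `STREAM_SUBJECTS`, `subject_matches`, and `stream_for` are hypothetical names used only for illustration.

```python
from typing import Optional

# Stream name -> subject filters, copied from the Stream Configuration table.
STREAM_SUBJECTS = {
    "MESSAGES": ["message.>"],
    "ATTACHMENTS": ["attachment.>"],
    "AGENT_RUNS": ["agent.run.>"],
    "MEMORY": ["memory.>"],
    "AUDIT": ["audit.>", "ops.>"],
}


def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` matches a NATS-style wildcard `pattern`."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # `>` must be the last token and must consume at least one token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if tok != "*" and tok != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def stream_for(subject: str) -> Optional[str]:
    """Return the first stream whose filters capture `subject`, else None."""
    for stream, patterns in STREAM_SUBJECTS.items():
        if any(subject_matches(p, subject) for p in patterns):
            return stream
    return None
```

For example, `stream_for("ops.health.gateway")` returns `"AUDIT"`. A subject such as `config.updated.some-key` returns `None` with this table, since no stream listed here has a matching filter.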

Dead Letter Queue (DLQ)

Failed messages go to {subject}.dlq:

attachment.failed.dlq             # Failed file processing
agent.run.failed.dlq              # Failed workflow tasks

DLQ consumer should:

- Log error details
- Alert if count > threshold
- Retry with backoff or discard

Consumer Groups

Consumer	Stream	Filter	Purpose
parser-pipeline	ATTACHMENTS	attachment.created.>	Async parsing
crewai-worker	AGENT_RUNS	agent.run.requested	Workflow execution
memory-indexer	MEMORY	memory.store.>	RAG indexing
audit-logger	AUDIT	audit.>	Persistent logging

Event Payload Schema

{
  "event_id": "uuid",
  "event_type": "attachment.created.image",
  "timestamp": "2026-01-19T12:00:00Z",
  "trace_id": "correlation-id",
  "source": "gateway",
  "agent_id": "helion",
  "user_id": "tg:123456",
  "payload": { ... }
}
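A publisher could assemble this envelope with a small helper. The sketch below assumes that `event_id` and `trace_id` are UUID strings and that `timestamp` is UTC in the format shown above; `make_event` and `missing_fields` are hypothetical helper names, not functions from the codebase.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

# Top-level fields taken from the schema above.
REQUIRED_FIELDS = (
    "event_id", "event_type", "timestamp", "trace_id",
    "source", "agent_id", "user_id", "payload",
)


def make_event(
    event_type: str,
    source: str,
    agent_id: str,
    user_id: str,
    payload: Dict[str, Any],
    trace_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an event envelope with the fields listed in the schema."""
    return {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,
        # Assumption: timestamps are UTC with a trailing "Z", as in the example.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Reuse the caller's trace_id so related events stay correlated.
        "trace_id": trace_id or str(uuid.uuid4()),
        "source": source,
        "agent_id": agent_id,
        "user_id": user_id,
        "payload": payload,
    }


def missing_fields(event: Dict[str, Any]) -> List[str]:
    """Return the required envelope fields that are absent from `event`."""
    return [name for name in REQUIRED_FIELDS if name not in event]
```

A consumer can call `missing_fields(event)` before processing and reject the message when the returned list is not empty.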

Publishers & Subscribers Matrix

Service	Publishes	Subscribes
Gateway	message.received, attachment.created	message.sent
Router	message.processed, agent.run.requested	message.received
Ingest	attachment.created	-
Parser	attachment.parsed, attachment.indexed	attachment.created
CrewAI Worker	agent.run.completed/failed	agent.run.requested
Memory Service	memory.indexed	memory.store
All Services	audit.action, ops.health	-

Idempotency & Replay

Idempotency Key Required

Every NATS message MUST include an idempotency_key field alongside event_id:

{
  "event_id": "uuid",
  "idempotency_key": "{source}:{entity_id}:{action}:{timestamp_hour}",
  ...
}

Example: gateway:msg-123:received:2026011912
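The key format can be produced and checked with a few lines of Python. This is a sketch that assumes `timestamp_hour` is the UTC hour formatted as `YYYYMMDDHH`, which is consistent with the example above; `idempotency_key` and `SeenKeys` are hypothetical names, and the in-memory set stands in for whatever shared store a real consumer would use.

```python
from datetime import datetime, timezone
from typing import Set


def idempotency_key(source: str, entity_id: str, action: str, ts: datetime) -> str:
    """Build "{source}:{entity_id}:{action}:{timestamp_hour}".

    `ts` must be timezone-aware; it is converted to UTC before formatting.
    """
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    hour = ts.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"{source}:{entity_id}:{action}:{hour}"


class SeenKeys:
    """In-memory duplicate filter (illustration only, not persistent)."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def first_time(self, key: str) -> bool:
        """Return True the first time `key` is seen, False on repeats."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Because the key embeds the hour, a redelivery of the same entity and action in a later hour produces a different key, so deduplication by key alone only covers repeats within the same hour bucket.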

DLQ Replay Policy

Step	Delay	Action
Retry 1	1s	Auto
Retry 2	5s	Auto
Retry 3	30s	Auto
DLQ	-	Alert + Manual review
DLQ + 24h	-	Auto-retry once
DLQ + 7d	-	Archive to cold storage
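The replay policy in the table can be expressed as two small decision functions. The sketch below is one reading of the table, assuming that the three retries happen before a message reaches the DLQ and that the 24h auto-retry happens at most once per message; `after_failure`, `dlq_action`, and the returned action strings are hypothetical.

```python
from datetime import timedelta
from typing import Optional, Tuple

# Delays before Retry 1, Retry 2, and Retry 3, from the table above.
RETRY_DELAYS_SECONDS = (1, 5, 30)
AUTO_RETRY_AFTER = timedelta(hours=24)
ARCHIVE_AFTER = timedelta(days=7)


def after_failure(retries_done: int) -> Tuple[str, Optional[int]]:
    """Decide what happens after a failed attempt.

    `retries_done` is the number of automatic retries already attempted
    (0 means the original delivery has just failed).
    """
    if retries_done < 0:
        raise ValueError("retries_done must be >= 0")
    if retries_done < len(RETRY_DELAYS_SECONDS):
        return "retry", RETRY_DELAYS_SECONDS[retries_done]
    # All automatic retries are exhausted: move to the DLQ and alert.
    return "dlq", None


def dlq_action(age_in_dlq: timedelta, auto_retried: bool) -> str:
    """Decide what to do with a message that is already in the DLQ."""
    if age_in_dlq >= ARCHIVE_AFTER:
        return "archive"
    if age_in_dlq >= AUTO_RETRY_AFTER and not auto_retried:
        return "auto_retry"
    return "manual_review"
```

With these definitions, a message that fails every attempt waits 1s, 5s, and 30s before its three retries and is then moved to the DLQ.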

Config/Policy Update Events

config.updated.{key}           # Control Plane publishes
policy.updated.{agent_id}      # Control Plane publishes
prompt.updated.{agent_id}      # Control Plane publishes

Consumers should invalidate their cached config, policy, or prompt for the affected key or agent when they receive one of these events.
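One way a consumer might react to these subjects is to map each subject to a cache entry and drop it, so the next read reloads the value. The sketch below is plain Python with no NATS client; `InvalidatingCache`, `parse_update_subject`, and the `loader` callback are hypothetical, and the code assumes the final part of the subject (which may itself contain dots) identifies the entry.

```python
from typing import Callable, Dict, Optional, Tuple

# Subject prefix -> cache namespace, using the three subjects listed above.
PREFIXES = {
    "config.updated.": "config",
    "policy.updated.": "policy",
    "prompt.updated.": "prompt",
}


def parse_update_subject(subject: str) -> Optional[Tuple[str, str]]:
    """Return (kind, key) for an update subject, or None for other subjects."""
    for prefix, kind in PREFIXES.items():
        if subject.startswith(prefix) and len(subject) > len(prefix):
            return kind, subject[len(prefix):]
    return None


class InvalidatingCache:
    """Read-through cache whose entries are dropped on update events."""

    def __init__(self, loader: Callable[[str, str], object]) -> None:
        self._loader = loader
        self._items: Dict[Tuple[str, str], object] = {}

    def get(self, kind: str, key: str) -> object:
        """Return the cached value, loading it on the first access."""
        cache_key = (kind, key)
        if cache_key not in self._items:
            self._items[cache_key] = self._loader(kind, key)
        return self._items[cache_key]

    def on_event(self, subject: str) -> bool:
        """Drop the matching entry; return False for unrelated subjects."""
        parsed = parse_update_subject(subject)
        if parsed is None:
            return False
        self._items.pop(parsed, None)
        return True
```

A subscriber callback would pass each received subject to `on_event`, and the next `get` for that entry calls `loader` again.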
