Complete snapshot of /opt/microdao-daarion/ from NODE1 (144.76.224.179).
This represents the actual running production code that has diverged
significantly from the previous main branch.
Key changes from old main:
- Gateway (http_api.py): expanded from ~40KB to 164KB with full agent support
- Router: new /v1/agents/{id}/infer endpoint with vision + DeepSeek routing
- Behavior Policy: SOWA v2.2 (3-level: FULL/ACK/SILENT)
- Agent Registry: config/agent_registry.yml as single source of truth
- 13 agents configured (was 3)
- Memory service integration
- CrewAI teams and roles
Excluded from snapshot: venv/, .env, data/, backups, .tgz archives
Co-authored-by: Cursor <cursoragent@cursor.com>
ADR: Service Boundaries & Contract Ownership
Status: Accepted
Date: 2026-01-19
Authors: DAARION Platform Team
1. Service Boundaries
Gateway (BFF) :9300
Does: Telegram webhooks, Auth, Rate limiting, Request normalization, Trace ID generation
Does NOT: LLM calls, Direct DB access, Business logic
Router :9102
Does: Agent routing, Tool orchestration, Policy enforcement, LLM provider selection
Does NOT: Session storage, Direct DB access (tech debt: graph_query), File processing
Control Plane :9200
Does: Versioned prompts, Policy/RBAC, Config/flags, Quotas
Does NOT: Request processing, Data storage
Memory API :8000
Does: Vector search, Graph queries, Fact storage, ACL enforcement, Audit
Does NOT: LLM calls, File processing
Swapper :8890
Does: Vision, STT, TTS, Image generation (lazy load)
Does NOT: Data storage, Policy decisions
CrewAI Worker :9011
Does: Async workflows, Multi-agent, NATS consumption
Does NOT: Sync requests, Direct LLM calls
2. NATS Subject Taxonomy
Naming: {domain}.{action}.{entity}[.{subtype}]
| Subject | Publisher | Consumer | Idempotency Key |
|---|---|---|---|
| message.received.{agent_id} | Gateway | Router | request_id |
| message.processed.{agent_id} | Router | Gateway | request_id |
| attachment.created.{type} | Ingest | Parser | file_id |
| attachment.parsed.{type} | Parser | Memory | file_id |
| agent.run.requested | Router | Worker | job_id |
| agent.run.completed | Worker | Gateway | job_id |
| audit.action.{service} | All | Audit | event_id |
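As an illustration of the naming scheme, the following minimal sketch builds a subject from its tokens. The helper name `build_subject` and the token character set are assumptions, not part of the ADR, and `helion` and `pdf` are only example values.

```python
import re

# Assumption: tokens are limited to letters, digits, "_" and "-".
# The ADR does not define the allowed character set.
_TOKEN = re.compile(r"^[A-Za-z0-9_-]+$")


def build_subject(domain, action, entity, subtype=None):
    """Join tokens as {domain}.{action}.{entity}[.{subtype}]."""
    tokens = [domain, action, entity]
    if subtype is not None:
        tokens.append(subtype)
    for token in tokens:
        # Reject empty tokens, dots, and NATS wildcards ("*", ">").
        if not _TOKEN.match(token):
            raise ValueError(f"invalid subject token: {token!r}")
    return ".".join(tokens)


print(build_subject("message", "received", "helion"))
print(build_subject("attachment", "created", "pdf"))
```

Validating each token keeps a stray dot or wildcard in an agent ID from silently changing which consumers match the subject.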
DLQ Policy
| Stream | Max Retries | DLQ | Action |
|---|---|---|---|
| ATTACHMENTS | 3 | attachment.failed.dlq | Manual review |
| AGENT_RUNS | 3 | agent.run.failed.dlq | Alert + retry |
| MEMORY | 5 | memory.failed.dlq | Auto-retry 1h |
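The table above can be expressed as data plus one routing decision. This is a sketch under one stated assumption: a message moves to the DLQ once its failed attempts reach the "Max Retries" value. The names `DlqPolicy`, `POLICIES`, and `next_destination` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DlqPolicy:
    max_retries: int
    dlq_subject: str
    action: str


# Values copied from the DLQ Policy table.
POLICIES = {
    "ATTACHMENTS": DlqPolicy(3, "attachment.failed.dlq", "Manual review"),
    "AGENT_RUNS": DlqPolicy(3, "agent.run.failed.dlq", "Alert + retry"),
    "MEMORY": DlqPolicy(5, "memory.failed.dlq", "Auto-retry 1h"),
}


def next_destination(stream, original_subject, failed_attempts):
    """Return the subject a failed message should be published to next.

    Assumption: the message goes to the DLQ once failed_attempts
    reaches max_retries; below that it is retried on its own subject.
    """
    policy = POLICIES[stream]
    if failed_attempts >= policy.max_retries:
        return policy.dlq_subject
    return original_subject


print(next_destination("AGENT_RUNS", "agent.run.requested", 2))
print(next_destination("AGENT_RUNS", "agent.run.requested", 3))
```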
3. Contract Ownership
| Contract | Owner | Change Process |
|---|---|---|
| Gateway->Router | Router team | PR + staging |
| Router->Memory | Memory team | PR + migration |
| Router->Control | Control team | PR + cache TTL |
| NATS events | Platform team | RFC + version |
4. Versioning
Format: {service}:{type}:{hash}:{timestamp}
Example: helion:prompt:abc123:20260119T120000Z
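A minimal sketch of parsing and re-emitting this format follows. It assumes the timestamp is always UTC in the compact `YYYYMMDDTHHMMSSZ` form shown in the example and that no field contains a colon. `VersionId`, `parse_version`, and `format_version` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumption: compact UTC timestamp, as in the ADR example.
_TS_FORMAT = "%Y%m%dT%H%M%SZ"


@dataclass(frozen=True)
class VersionId:
    service: str
    kind: str        # the {type} field, e.g. "prompt"
    digest: str      # the {hash} field
    timestamp: datetime


def parse_version(value):
    """Split {service}:{type}:{hash}:{timestamp} into typed fields."""
    parts = value.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields: {value!r}")
    service, kind, digest, raw_ts = parts
    timestamp = datetime.strptime(raw_ts, _TS_FORMAT).replace(tzinfo=timezone.utc)
    return VersionId(service, kind, digest, timestamp)


def format_version(version):
    """Inverse of parse_version; the timestamp is normalized to UTC."""
    ts = version.timestamp.astimezone(timezone.utc).strftime(_TS_FORMAT)
    return f"{version.service}:{version.kind}:{version.digest}:{ts}"


v = parse_version("helion:prompt:abc123:20260119T120000Z")
print(v.service, v.kind, v.digest, v.timestamp.isoformat())
```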
Cache TTL
| Type | TTL | Invalidation |
|---|---|---|
| Prompt | 5 min | NATS event |
| Policy | 1 min | NATS event |
| Config | 30 sec | NATS event |
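The TTL table could be enforced by a small client-side cache like the sketch below. The class `TtlCache` and its methods are hypothetical. The `invalidate` method stands in for whatever handler reacts to the NATS invalidation event, and the event's subject and payload are not specified here.

```python
import time

# TTLs in seconds, taken from the Cache TTL table.
TTL_SECONDS = {"prompt": 300, "policy": 60, "config": 30}


class TtlCache:
    """In-process cache keyed by (kind, key) with per-kind expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable for tests
        self._entries = {}       # (kind, key) -> (expires_at, value)

    def put(self, kind, key, value):
        expires_at = self._clock() + TTL_SECONDS[kind]
        self._entries[(kind, key)] = (expires_at, value)

    def get(self, kind, key):
        entry = self._entries.get((kind, key))
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller refetches from the source.
            del self._entries[(kind, key)]
            return None
        return value

    def invalidate(self, kind, key):
        """Call this from the (assumed) NATS invalidation handler."""
        self._entries.pop((kind, key), None)


cache = TtlCache()
cache.put("policy", "helion", {"mode": "team"})
print(cache.get("policy", "helion"))
```

Combining a short TTL with event-driven invalidation means a missed event delays a change by at most one TTL rather than indefinitely.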
5. Privacy Modes
| Mode | Restrictions |
|---|---|
| public | None |
| team | No cross-team |
| private | User-only |
| confidential | No logging, no indexing |
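One way to read the table as code is sketched below. The names `Principal`, `can_read`, and `may_log_or_index` are hypothetical. The table does not say who may read confidential data, so this sketch assumes it is at least as strict as private, which is an assumption rather than part of the ADR.

```python
from dataclasses import dataclass

MODES = ("public", "team", "private", "confidential")


@dataclass(frozen=True)
class Principal:
    user_id: str
    team_id: str


def can_read(mode, owner, requester):
    """Return True if requester may read data owned by owner."""
    if mode == "public":
        return True                      # no restrictions
    if mode == "team":
        return owner.team_id == requester.team_id   # no cross-team
    if mode in ("private", "confidential"):
        # Assumption: confidential inherits the user-only rule.
        return owner.user_id == requester.user_id
    raise ValueError(f"unknown privacy mode: {mode!r}")


def may_log_or_index(mode):
    """Only confidential forbids logging and indexing."""
    if mode not in MODES:
        raise ValueError(f"unknown privacy mode: {mode!r}")
    return mode != "confidential"


alice = Principal("alice", "core")
bob = Principal("bob", "core")
print(can_read("team", alice, bob), can_read("private", alice, bob))
```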
6. Trace Correlation
HTTP Headers
- X-Trace-ID, X-Request-ID, X-Job-ID
- X-User-ID, X-Agent-ID, X-Mode
NATS Headers
- Nats-Trace-ID, Nats-Job-ID
- Nats-User-ID, Nats-Agent-ID
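A sketch of how a service might carry correlation headers from an HTTP request onto a NATS message is shown below. The function name `to_nats_headers` is hypothetical. Only the four headers with a listed NATS counterpart are forwarded, and generating a missing trace ID with `uuid4` is an assumption about the ID format.

```python
import uuid

# HTTP header -> NATS header, for the pairs listed in the ADR.
# X-Request-ID and X-Mode have no listed NATS counterpart and are
# therefore not forwarded in this sketch.
HTTP_TO_NATS = {
    "X-Trace-ID": "Nats-Trace-ID",
    "X-Job-ID": "Nats-Job-ID",
    "X-User-ID": "Nats-User-ID",
    "X-Agent-ID": "Nats-Agent-ID",
}


def to_nats_headers(http_headers):
    """Copy correlation headers, matching HTTP names case-insensitively."""
    lowered = {name.lower(): value for name, value in http_headers.items()}
    nats_headers = {}
    for http_name, nats_name in HTTP_TO_NATS.items():
        value = lowered.get(http_name.lower())
        if value:
            nats_headers[nats_name] = value
    # Assumption: a missing trace ID is replaced with a fresh random one.
    nats_headers.setdefault("Nats-Trace-ID", uuid.uuid4().hex)
    return nats_headers


print(to_nats_headers({"x-trace-id": "t-1", "X-Agent-ID": "helion"}))
```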
7. Acceptance Checklist
- Router stateless: restart doesn't lose jobs
- Idempotency: duplicate job_id = 1 execution
- DLQ works: 3 failures -> DLQ + alert
- Policy enforcement: a change takes effect within the cache TTL
- Consumer lag alert: fires after a consumer has been stopped for 5 min
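The idempotency item in the checklist can be illustrated with the sketch below, in which a repeated `job_id` returns the stored result instead of running the handler again. `IdempotentRunner` is a hypothetical name. The in-memory dict is only for illustration, since a restart-safe deployment would need a shared store, which the ADR does not specify.

```python
class IdempotentRunner:
    """Run a handler at most once per job_id."""

    def __init__(self, handler):
        self._handler = handler
        # Assumption: in-memory store, for illustration only.
        self._results = {}

    def run(self, job_id, payload):
        if job_id in self._results:
            # Duplicate delivery: return the earlier result, do not re-run.
            return self._results[job_id]
        result = self._handler(payload)
        self._results[job_id] = result
        return result


executions = []


def handle(payload):
    executions.append(payload)
    return payload.upper()


runner = IdempotentRunner(handle)
runner.run("job-1", "hello")
runner.run("job-1", "hello")   # duplicate job_id
print(len(executions))
```

Note that this sketch records the result only after the handler succeeds, so a job whose handler raises will be attempted again on redelivery.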
Decision
- All services MUST use trace middleware
- NATS messages MUST have idempotency key
- Control Plane = source of truth
- Memory API = ONLY data access path
- DLQ processing automated
- Privacy mode checked in Router