Complete snapshot of /opt/microdao-daarion/ from NODE1 (144.76.224.179).
This represents the actual running production code that has diverged
significantly from the previous main branch.
Key changes from old main:
- Gateway (http_api.py): expanded from ~40KB to 164KB with full agent support
- Router: new /v1/agents/{id}/infer endpoint with vision + DeepSeek routing
- Behavior Policy: SOWA v2.2 (3-level: FULL/ACK/SILENT)
- Agent Registry: config/agent_registry.yml as single source of truth
- 13 agents configured (was 3)
- Memory service integration
- CrewAI teams and roles
Excluded from snapshot: venv/, .env, data/, backups, .tgz archives
Co-authored-by: Cursor <cursoragent@cursor.com>
DAARION Multi-Node Architecture
Current State: Single Node (NODE1)
NODE1 (144.76.224.179) - Hetzner GEX44
├── RTX 4000 SFF Ada (20GB VRAM)
├── Services:
│ ├── Gateway :9300
│ ├── Router :9102
│ ├── Swapper :8890 (GPU)
│ ├── Memory Service :8000
│ ├── CrewAI :9010
│ ├── CrewAI Worker :9011
│ ├── Ingest :8100
│ ├── Parser :8101
│ ├── Prometheus :9090
│ └── Grafana :3030
├── Data:
│ ├── PostgreSQL :5432
│ ├── Qdrant :6333
│ ├── Neo4j :7687
│ └── Redis :6379
└── Messaging:
└── NATS JetStream :4222
Target: Multi-Node Topology
Edge Router Pattern
┌─────────────────────┐
│ Global Entry │
│ gateway.daarion.city│
│ (CloudFlare) │
└──────────┬──────────┘
│
┌────────────────┼────────────────┐
│ │ │
┌────────▼────────┐ ┌────▼────────┐ ┌────▼────────┐
│ NODE1 │ │ NODE2 │ │ NODE3 │
│ (Primary) │ │ (Replica) │ │ (Edge) │
│ Hetzner GEX44 │ │ Hetzner │ │ Hetzner │
└────────┬────────┘ └─────┬───────┘ └─────┬───────┘
│ │ │
┌────────▼────────────────▼───────────────▼────────┐
│ NATS Supercluster │
│ (Leafnodes / Mirrored Streams) │
└──────────────────────────────────────────────────┘
Node Roles
NODE1 (Primary)
- GPU workloads (Swapper, Vision, FLUX)
- Primary data stores
- CrewAI orchestration
NODE2 (Replica)
- Read replicas (Qdrant, Neo4j)
- Backup Gateway/Router
- Async workers
NODE3+ (Edge)
- Regional edge routing
- Local cache (Redis)
- NATS leafnode
Data Replication Strategy
```yaml
PostgreSQL:
  mode: primary-replica
  primary: NODE1
  replicas: [NODE2]
  sync: async (streaming)

Qdrant:
  mode: sharded
  shards: 2
  replication_factor: 2
  nodes: [NODE1, NODE2]

Neo4j:
  mode: causal-cluster
  core_servers: [NODE1, NODE2]
  read_replicas: [NODE3]

Redis:
  mode: sentinel
  master: NODE1
  replicas: [NODE2]
  sentinels: 3

NATS:
  mode: supercluster
  clusters:
    - name: core
      nodes: [NODE1, NODE2]
    - name: edge
      nodes: [NODE3]
      leafnode_to: core
```
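As a sanity check on the strategy above, the sketch below (hypothetical helper, not part of the codebase) models which nodes hold each stateful service's data and verifies that every store survives the loss of a single node:

```python
# Illustrative model of the replication topology; node names follow the doc.
REPLICATION = {
    "postgresql": {"primary": "NODE1", "replicas": ["NODE2"]},
    "qdrant": {"shards": {"shard1": "NODE1", "shard2": "NODE2"},
               "replication_factor": 2},
    "neo4j": {"core_servers": ["NODE1", "NODE2"], "read_replicas": ["NODE3"]},
    "redis": {"master": "NODE1", "replicas": ["NODE2"]},
}

def nodes_holding(service: str) -> set:
    """Return the set of nodes holding a full or partial copy of the data."""
    nodes = set()
    for value in REPLICATION[service].values():
        if isinstance(value, str):
            nodes.add(value)
        elif isinstance(value, list):
            nodes.update(value)
        elif isinstance(value, dict):
            nodes.update(value.values())  # shard -> node mapping
    return nodes

def survives_single_node_loss(service: str) -> bool:
    """A service with data on at least two nodes has no single point of failure."""
    return len(nodes_holding(service)) >= 2

for svc in REPLICATION:
    assert survives_single_node_loss(svc), f"{svc} has a single point of failure"
```

NATS is omitted from the model because JetStream mirroring (not data-at-rest replication) covers it.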
Service Distribution
| Service | NODE1 | NODE2 | NODE3 |
|---|---|---|---|
| Gateway | ✓ | ✓ | ✓ |
| Router | ✓ | ✓ (standby) | - |
| Swapper (GPU) | ✓ | - | - |
| Memory Service | ✓ | ✓ (read) | - |
| PostgreSQL | ✓ (primary) | ✓ (replica) | - |
| Qdrant | ✓ (shard1) | ✓ (shard2) | - |
| Neo4j | ✓ (core) | ✓ (core) | ✓ (read) |
| Redis | ✓ (master) | ✓ (replica) | ✓ (cache) |
| NATS | ✓ (cluster) | ✓ (cluster) | ✓ (leaf) |
NATS Subject Routing
```text
# Core subjects (replicated across all nodes)
message.*
attachment.*
agent.run.*

# Node-specific subjects
node.{node_id}.local.*

# Edge subjects (local only)
cache.invalidate.*
```
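Per NATS's documented wildcard rules, `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens. A minimal sketch of how a service could classify subjects against the patterns above (the function is illustrative, not from the codebase):

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style matching: '*' = one token, '>' = one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(s_tokens) > i  # '>' must cover at least one token
        if i >= len(s_tokens):
            return False  # subject ran out of tokens
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)

# Core subject: replicated everywhere.
assert subject_matches("message.*", "message.created")
# '*' spans only one token, so deeper subjects do not match.
assert not subject_matches("message.*", "message.created.v2")
```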
Implementation Phases
Phase 3.1: Prepare NODE1 for replication
- Enable PostgreSQL streaming replication
- Configure Qdrant for clustering
- Set up NATS cluster mode
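The first Phase 3.1 item maps onto a few standard PostgreSQL settings. A hedged sketch (the retention size is illustrative, not tuned, and the NODE2 address is a placeholder):

```ini
# postgresql.conf on NODE1 (primary)
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB
hot_standby = on

# pg_hba.conf on NODE1: allow the replication user from NODE2
# host  replication  replicator  <node2-ip>/32  scram-sha-256
```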
Phase 3.2: Deploy NODE2
- Provision Hetzner server
- Deploy base stack
- Configure replicas
- Test failover
Phase 3.3: Add Edge Nodes
- Deploy lightweight edge stack
- Configure NATS leafnodes
- Set up geo-routing
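The leafnode step can be sketched in nats-server config form; hostnames and the port are assumptions, not the deployed values:

```conf
# Core nodes (NODE1/NODE2): accept leafnode connections
leafnodes {
  port: 7422
}

# Edge node (NODE3): connect up into the core cluster
leafnodes {
  remotes = [
    { url: "nats-leaf://node1:7422" },
    { url: "nats-leaf://node2:7422" },
  ]
}
```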
Environment Variables for Multi-Node
```bash
# NODE1 specific
NODE_ID=node1
NODE_ROLE=primary
CLUSTER_PEERS=node2:4222,node3:4222

# Replication
PG_REPLICATION_USER=replicator
PG_REPLICATION_PASSWORD=<secure>
QDRANT_CLUSTER_ENABLED=true
NATS_CLUSTER_NAME=daarion-core
```
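A sketch of how a service might load these settings; the `NodeConfig` type and `load_node_config` helper are hypothetical, not existing code:

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass
class NodeConfig:
    node_id: str
    node_role: str
    cluster_peers: list  # "host:port" entries parsed from CLUSTER_PEERS

def load_node_config(env: Optional[dict] = None) -> NodeConfig:
    """Read multi-node settings from the environment (or a supplied mapping)."""
    env = dict(os.environ) if env is None else env
    peers = [p for p in env.get("CLUSTER_PEERS", "").split(",") if p]
    return NodeConfig(
        node_id=env.get("NODE_ID", "node1"),
        node_role=env.get("NODE_ROLE", "primary"),
        cluster_peers=peers,
    )

cfg = load_node_config({"NODE_ID": "node1", "NODE_ROLE": "primary",
                        "CLUSTER_PEERS": "node2:4222,node3:4222"})
```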
Health Check Endpoints
Each node exposes:
- `/health` - basic health
- `/ready` - ready for traffic
- `/cluster/status` - cluster membership
- `/cluster/peers` - peer connectivity
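One way `/cluster/status` could aggregate the per-peer `/health` results into a membership view; the payload shape and quorum rule below are illustrative assumptions, not the deployed contract:

```python
def cluster_status(node_id: str, peer_health: dict) -> dict:
    """Summarise peer health checks into a single cluster-status payload."""
    healthy = [p for p, ok in peer_health.items() if ok]
    unhealthy = [p for p, ok in peer_health.items() if not ok]
    # This node counts itself as healthy; quorum = strict majority visible.
    quorum = (1 + len(healthy)) > (1 + len(peer_health)) / 2
    return {
        "node": node_id,
        "peers_healthy": healthy,
        "peers_unhealthy": unhealthy,
        "quorum": quorum,
    }

# NODE1 sees NODE2 up and NODE3 down: 2 of 3 visible, quorum holds.
status = cluster_status("node1", {"node2": True, "node3": False})
```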
Failover Scenarios
- NODE1 down: NODE2 promotes to primary
- Network partition: Split-brain prevention via NATS
- GPU failure: Fallback to API models
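The "NODE1 down" and "network partition" scenarios hinge on the same rule: a node promotes itself only when it can still see a majority of the cluster. A minimal sketch (the function and quorum rule are illustrative, not the deployed logic):

```python
def should_promote(self_id: str, reachable: set, cluster: set) -> bool:
    """Promote only if this node plus its reachable peers form a strict majority."""
    visible = {self_id} | (reachable & cluster)
    return len(visible) > len(cluster) / 2

CLUSTER = {"node1", "node2", "node3"}

# NODE1 is down but NODE2 still reaches NODE3: 2 of 3 visible -> promote.
assert should_promote("node2", {"node3"}, CLUSTER)

# NODE2 is partitioned alone: 1 of 3 visible -> stay standby (no split-brain).
assert not should_promote("node2", set(), CLUSTER)
```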
Next Steps
- Prepare NODE1 for replication configs
- Document NODE2 provisioning
- Create deployment scripts