Complete snapshot of /opt/microdao-daarion/ from NODE1 (144.76.224.179).
This represents the actual running production code that has diverged
significantly from the previous main branch.
Key changes from old main:
- Gateway (http_api.py): expanded from ~40KB to 164KB with full agent support
- Router: new /v1/agents/{id}/infer endpoint with vision + DeepSeek routing
- Behavior Policy: SOWA v2.2 (3-level: FULL/ACK/SILENT)
- Agent Registry: config/agent_registry.yml as single source of truth
- 13 agents configured (was 3)
- Memory service integration
- CrewAI teams and roles
Excluded from snapshot: venv/, .env, data/, backups, .tgz archives
Co-authored-by: Cursor <cursoragent@cursor.com>
DAARION Multi-Node Architecture
Current State: Single Node (NODE1)
NODE1 (144.76.224.179) - Hetzner GEX44
├── RTX 4000 SFF Ada (20GB VRAM)
├── Services:
│ ├── Gateway :9300
│ ├── Router :9102
│ ├── Swapper :8890 (GPU)
│ ├── Memory Service :8000
│ ├── CrewAI :9010
│ ├── CrewAI Worker :9011
│ ├── Ingest :8100
│ ├── Parser :8101
│ ├── Prometheus :9090
│ └── Grafana :3030
├── Data:
│ ├── PostgreSQL :5432
│ ├── Qdrant :6333
│ ├── Neo4j :7687
│ └── Redis :6379
└── Messaging:
└── NATS JetStream :4222
Target: Multi-Node Topology
Edge Router Pattern
┌─────────────────────┐
│ Global Entry │
│ gateway.daarion.city│
│ (CloudFlare) │
└──────────┬──────────┘
│
┌────────────────┼────────────────┐
│ │ │
┌────────▼────────┐ ┌────▼────────┐ ┌────▼────────┐
│ NODE1 │ │ NODE2 │ │ NODE3 │
│ (Primary) │ │ (Replica) │ │ (Edge) │
│ Hetzner GEX44 │ │ Hetzner │ │ Hetzner │
└────────┬────────┘ └─────┬───────┘ └─────┬───────┘
│ │ │
┌────────▼────────────────▼───────────────▼────────┐
│ NATS Supercluster │
│ (Leafnodes / Mirrored Streams) │
└──────────────────────────────────────────────────┘
Node Roles
NODE1 (Primary)
- GPU workloads (Swapper, Vision, FLUX)
- Primary data stores
- CrewAI orchestration
NODE2 (Replica)
- Read replicas (Qdrant, Neo4j)
- Backup Gateway/Router
- Async workers
NODE3+ (Edge)
- Regional edge routing
- Local cache (Redis)
- NATS leafnode
Data Replication Strategy
```yaml
PostgreSQL:
  mode: primary-replica
  primary: NODE1
  replicas: [NODE2]
  sync: async (streaming)

Qdrant:
  mode: sharded
  shards: 2
  replication_factor: 2
  nodes: [NODE1, NODE2]

Neo4j:
  mode: causal-cluster
  core_servers: [NODE1, NODE2]
  read_replicas: [NODE3]

Redis:
  mode: sentinel
  master: NODE1
  replicas: [NODE2]
  sentinels: 3

NATS:
  mode: supercluster
  clusters:
    - name: core
      nodes: [NODE1, NODE2]
    - name: edge
      nodes: [NODE3]
      leafnode_to: core
```
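As a sanity check on the strategy above, the sketch below (hypothetical helper, not part of the codebase) models which nodes hold each stateful service's data and verifies that every store survives the loss of a single node:

```python
# Illustrative model of the replication topology; node names follow the doc.
REPLICATION = {
    "postgresql": {"primary": "NODE1", "replicas": ["NODE2"]},
    "qdrant": {"shards": {"shard1": "NODE1", "shard2": "NODE2"},
               "replication_factor": 2},
    "neo4j": {"core_servers": ["NODE1", "NODE2"], "read_replicas": ["NODE3"]},
    "redis": {"master": "NODE1", "replicas": ["NODE2"]},
}

def nodes_holding(service: str) -> set:
    """Return the set of nodes holding a full or partial copy of the data."""
    nodes = set()
    for value in REPLICATION[service].values():
        if isinstance(value, str):
            nodes.add(value)
        elif isinstance(value, list):
            nodes.update(value)
        elif isinstance(value, dict):
            nodes.update(value.values())  # shard -> node mapping
    return nodes

def survives_single_node_loss(service: str) -> bool:
    """A service with data on at least two nodes has no single point of failure."""
    return len(nodes_holding(service)) >= 2

for svc in REPLICATION:
    assert survives_single_node_loss(svc), f"{svc} has a single point of failure"
```

NATS is omitted from the model because JetStream mirroring (not data-at-rest replication) covers it.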
Service Distribution
| Service | NODE1 | NODE2 | NODE3 |
|---|---|---|---|
| Gateway | ✓ | ✓ | ✓ |
| Router | ✓ | ✓ (standby) | - |
| Swapper (GPU) | ✓ | - | - |
| Memory Service | ✓ | ✓ (read) | - |
| PostgreSQL | ✓ (primary) | ✓ (replica) | - |
| Qdrant | ✓ (shard1) | ✓ (shard2) | - |
| Neo4j | ✓ (core) | ✓ (core) | ✓ (read) |
| Redis | ✓ (master) | ✓ (replica) | ✓ (cache) |
| NATS | ✓ (cluster) | ✓ (cluster) | ✓ (leaf) |
NATS Subject Routing
```text
# Core subjects (replicated across all nodes)
message.*
attachment.*
agent.run.*

# Node-specific subjects
node.{node_id}.local.*

# Edge subjects (local only)
cache.invalidate.*
```
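Per NATS's documented wildcard rules, `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens. A minimal sketch of how a service could classify subjects against the patterns above (the function is illustrative, not from the codebase):

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style matching: '*' = one token, '>' = one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(s_tokens) > i  # '>' must cover at least one token
        if i >= len(s_tokens):
            return False  # subject ran out of tokens
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)

# Core subject: replicated everywhere.
assert subject_matches("message.*", "message.created")
# '*' spans only one token, so deeper subjects do not match.
assert not subject_matches("message.*", "message.created.v2")
```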
Implementation Phases
Phase 3.1: Prepare NODE1 for replication
- Enable PostgreSQL streaming replication
- Configure Qdrant for clustering
- Set up NATS cluster mode
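The first Phase 3.1 item maps onto a few standard PostgreSQL settings. A hedged sketch (the retention size is illustrative, not tuned, and the NODE2 address is a placeholder):

```ini
# postgresql.conf on NODE1 (primary)
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB
hot_standby = on

# pg_hba.conf on NODE1: allow the replication user from NODE2
# host  replication  replicator  <node2-ip>/32  scram-sha-256
```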
Phase 3.2: Deploy NODE2
- Provision Hetzner server
- Deploy base stack
- Configure replicas
- Test failover
Phase 3.3: Add Edge Nodes
- Deploy lightweight edge stack
- Configure NATS leafnodes
- Set up geo-routing
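The leafnode step can be sketched in nats-server config form; hostnames and the port are assumptions, not the deployed values:

```conf
# Core nodes (NODE1/NODE2): accept leafnode connections
leafnodes {
  port: 7422
}

# Edge node (NODE3): connect up into the core cluster
leafnodes {
  remotes = [
    { url: "nats-leaf://node1:7422" },
    { url: "nats-leaf://node2:7422" },
  ]
}
```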
Environment Variables for Multi-Node
```bash
# NODE1 specific
NODE_ID=node1
NODE_ROLE=primary
CLUSTER_PEERS=node2:4222,node3:4222

# Replication
PG_REPLICATION_USER=replicator
PG_REPLICATION_PASSWORD=<secure>
QDRANT_CLUSTER_ENABLED=true
NATS_CLUSTER_NAME=daarion-core
```
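A sketch of how a service might load these settings; the `NodeConfig` type and `load_node_config` helper are hypothetical, not existing code:

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass
class NodeConfig:
    node_id: str
    node_role: str
    cluster_peers: list  # "host:port" entries parsed from CLUSTER_PEERS

def load_node_config(env: Optional[dict] = None) -> NodeConfig:
    """Read multi-node settings from the environment (or a supplied mapping)."""
    env = dict(os.environ) if env is None else env
    peers = [p for p in env.get("CLUSTER_PEERS", "").split(",") if p]
    return NodeConfig(
        node_id=env.get("NODE_ID", "node1"),
        node_role=env.get("NODE_ROLE", "primary"),
        cluster_peers=peers,
    )

cfg = load_node_config({"NODE_ID": "node1", "NODE_ROLE": "primary",
                        "CLUSTER_PEERS": "node2:4222,node3:4222"})
```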
Health Check Endpoints
Each node exposes:
- `/health` - basic health
- `/ready` - ready for traffic
- `/cluster/status` - cluster membership
- `/cluster/peers` - peer connectivity
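One way `/cluster/status` could aggregate the per-peer `/health` results into a membership view; the payload shape and quorum rule below are illustrative assumptions, not the deployed contract:

```python
def cluster_status(node_id: str, peer_health: dict) -> dict:
    """Summarise peer health checks into a single cluster-status payload."""
    healthy = [p for p, ok in peer_health.items() if ok]
    unhealthy = [p for p, ok in peer_health.items() if not ok]
    # This node counts itself as healthy; quorum = strict majority visible.
    quorum = (1 + len(healthy)) > (1 + len(peer_health)) / 2
    return {
        "node": node_id,
        "peers_healthy": healthy,
        "peers_unhealthy": unhealthy,
        "quorum": quorum,
    }

# NODE1 sees NODE2 up and NODE3 down: 2 of 3 visible, quorum holds.
status = cluster_status("node1", {"node2": True, "node3": False})
```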
Failover Scenarios
- NODE1 down: NODE2 promotes to primary
- Network partition: Split-brain prevention via NATS
- GPU failure: Fallback to API models
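The "NODE1 down" and "network partition" scenarios hinge on the same rule: a node promotes itself only when it can still see a majority of the cluster. A minimal sketch (the function and quorum rule are illustrative, not the deployed logic):

```python
def should_promote(self_id: str, reachable: set, cluster: set) -> bool:
    """Promote only if this node plus its reachable peers form a strict majority."""
    visible = {self_id} | (reachable & cluster)
    return len(visible) > len(cluster) / 2

CLUSTER = {"node1", "node2", "node3"}

# NODE1 is down but NODE2 still reaches NODE3: 2 of 3 visible -> promote.
assert should_promote("node2", {"node3"}, CLUSTER)

# NODE2 is partitioned alone: 1 of 3 visible -> stay standby (no split-brain).
assert not should_promote("node2", set(), CLUSTER)
```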
Next Steps
- Prepare NODE1 for replication configs
- Document NODE2 provisioning
- Create deployment scripts