P3.2+P3.3+P3.4: NODA1 node-worker + NATS auth config + Prometheus counters

P3.2 — Multi-node deployment:
- Added node-worker service to docker-compose.node1.yml (NODE_ID=noda1)
- NCS on NODA1 now has NODE_WORKER_URL set for metrics collection
- Fixed NODE_ID consistency: the NODA1 router now uses 'noda1' (previously 'node-1-hetzner-gex44')
- NODA2 node-worker/NCS gets NCS_REPORT_URL for latency reporting
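
The node-worker service added in this commit executes jobs offloaded over NATS, and its concurrency is capped by NODE_WORKER_MAX_CONCURRENCY (2 in docker-compose.node1.yml). The sketch below illustrates that cap together with per-job latency measurement. It assumes an asyncio-based worker. `BoundedWorker`, `run`, `load_config`, and the report dictionary are hypothetical names, not the service's actual code, and `asyncio.sleep` stands in for the real model call.

```python
from __future__ import annotations

import asyncio
import os
import time
from dataclasses import dataclass, field


@dataclass
class BoundedWorker:
    """Hypothetical executor: caps concurrent jobs and records per-job latency."""

    node_id: str
    max_concurrency: int
    inflight: int = 0
    peak_inflight: int = 0
    latencies_ms: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # The semaphore enforces the NODE_WORKER_MAX_CONCURRENCY cap.
        self._sem = asyncio.Semaphore(self.max_concurrency)

    async def run(self, job_id: str, work_s: float) -> dict:
        async with self._sem:
            self.inflight += 1
            self.peak_inflight = max(self.peak_inflight, self.inflight)
            start = time.perf_counter()
            try:
                await asyncio.sleep(work_s)  # stand-in for the real LLM/vision call
            finally:
                self.inflight -= 1
            latency_ms = (time.perf_counter() - start) * 1000.0
            self.latencies_ms.append(latency_ms)
        # Hypothetical latency report, e.g. what might be sent to NCS_REPORT_URL.
        return {"node_id": self.node_id, "job_id": job_id, "latency_ms": latency_ms}


def load_config() -> tuple[str, int]:
    # Defaults mirror the values set in docker-compose.node1.yml.
    node_id = os.environ.get("NODE_ID", "noda1")
    max_conc = int(os.environ.get("NODE_WORKER_MAX_CONCURRENCY", "2"))
    return node_id, max_conc


async def main(n_jobs: int = 5) -> tuple[BoundedWorker, list]:
    node_id, max_conc = load_config()
    # Create the worker inside the running loop so the semaphore binds to it.
    worker = BoundedWorker(node_id=node_id, max_concurrency=max_conc)
    reports = await asyncio.gather(
        *(worker.run(f"job-{i}", 0.02) for i in range(n_jobs))
    )
    return worker, list(reports)


if __name__ == "__main__":
    worker, reports = asyncio.run(main())
    print(f"peak inflight={worker.peak_inflight}, jobs={len(reports)}")
```

With five jobs and a cap of 2, at most two jobs run at once. The rest wait on the semaphore instead of being rejected, which keeps a small GPU node from being oversubscribed.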

P3.3 — NATS accounts/auth (opt-in config):
- config/nats-server.conf with 3 accounts: SYS, FABRIC, APP
- Per-user subject permissions (router, ncs, node_worker)
- Leafnode listener on port 7422 with auth
- Not yet activated (requires credential provisioning)
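
The per-user permissions in config/nats-server.conf are lists of NATS subject patterns. In these patterns, `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens. The following is a minimal sketch of how such allow-lists are evaluated. It assumes hypothetical subjects (`node.*.job`, `node.noda1.caps`) for the router, ncs, and node_worker users. The actual subjects, account assignments, and any deny rules in the config are not shown in this commit.

```python
from __future__ import annotations

from dataclasses import dataclass


def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` matches a NATS-style wildcard `pattern`.

    `*` matches exactly one token; `>` must be the last token and
    matches one or more remaining tokens.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if tok != "*" and tok != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


@dataclass(frozen=True)
class UserPermissions:
    """Allow-lists of subject patterns for one user (deny lists omitted)."""

    publish: tuple[str, ...] = ()
    subscribe: tuple[str, ...] = ()

    def can_publish(self, subject: str) -> bool:
        return any(subject_matches(p, subject) for p in self.publish)

    def can_subscribe(self, subject: str) -> bool:
        return any(subject_matches(p, subject) for p in self.subscribe)


# Hypothetical subjects; the real ones live in config/nats-server.conf.
PERMISSIONS = {
    "router": UserPermissions(
        publish=("node.*.job",), subscribe=("node.*.caps", "_INBOX.>")
    ),
    "node_worker": UserPermissions(
        publish=("node.noda1.caps", "_INBOX.>"), subscribe=("node.noda1.job",)
    ),
    "ncs": UserPermissions(publish=("node.noda1.caps",), subscribe=()),
}

if __name__ == "__main__":
    print(PERMISSIONS["router"].can_publish("node.noda2.job"))
    print(PERMISSIONS["node_worker"].can_subscribe("node.noda2.job"))
```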

P3.4 — Prometheus counters:
- Router /fabric_metrics: caps_refresh, caps_stale, model_select,
  offload_total, breaker_state, score_ms histogram
- Node Worker /prom_metrics: jobs_total, inflight gauge, latency_ms histogram
- NCS /prom_metrics: runtime_health, runtime_p50/p95, node_wait_ms
- All bound to 127.0.0.1 (not externally exposed)
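
Histograms such as score_ms and latency_ms are exposed in the Prometheus text format as cumulative `_bucket` series with an `le` (less-or-equal) label, followed by `_sum` and `_count`. The following is a minimal, stdlib-only sketch of that encoding. It assumes a hypothetical `node_worker_latency_ms` metric name and bucket bounds. The real services may well use the `prometheus_client` library instead.

```python
from __future__ import annotations

import bisect
import math


class Histogram:
    """Minimal Prometheus-style histogram: cumulative buckets, _sum and _count."""

    def __init__(self, name: str, help_text: str, bounds: list[float]) -> None:
        self.name = name
        self.help_text = help_text
        self.bounds = sorted(float(b) for b in bounds)
        # One slot per finite upper bound plus the mandatory +Inf bucket.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0
        self.n = 0

    def observe(self, value: float) -> None:
        # bisect_left picks the first bound >= value, i.e. Prometheus `le` semantics.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value
        self.n += 1

    def render(self) -> str:
        """Encode the histogram in the Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} histogram",
        ]
        cumulative = 0
        for bound, count in zip(self.bounds + [math.inf], self.counts):
            cumulative += count  # buckets are cumulative, not per-interval
            le = "+Inf" if math.isinf(bound) else f"{bound}"
            lines.append(f'{self.name}_bucket{{le="{le}"}} {cumulative}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {self.n}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric name and bucket bounds, for illustration only.
    h = Histogram(
        "node_worker_latency_ms", "Job latency in milliseconds",
        [50, 100, 250, 500, 1000],
    )
    for v in (42.0, 80.0, 80.0, 300.0, 1500.0):
        h.observe(v)
    print(h.render(), end="")
```

In the compose file, the node-worker port is published as `127.0.0.1:8109:8109`. The published port is therefore reachable only on the host's loopback interface, while containers on the same Docker network can still reach the service directly.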

Made-with: Cursor
Author: Apple
Date: 2026-02-27 03:03:18 -08:00
Parent: a605b8c43e
Commit: ed7ad49d3a
13 changed files with 408 additions and 1 deletion

@@ -13,7 +13,7 @@ services:
       - NATS_URL=nats://nats:4222
       - ROUTER_CONFIG_PATH=/app/router_config.yaml
       - LOG_LEVEL=info
-      - NODE_ID=node-1-hetzner-gex44
+      - NODE_ID=noda1
       - MEMORY_SERVICE_URL=http://memory-service:8000
       # Timeout policy: Gateway (180s) > Router (60s) > LLM (30s)
       - ROUTER_TIMEOUT=180
@@ -503,6 +503,7 @@ services:
       - CACHE_TTL_SEC=15
       - ENABLE_NATS_CAPS=true
       - NATS_URL=nats://nats:4222
+      - NODE_WORKER_URL=http://node-worker:8109
     extra_hosts:
       - "host.docker.internal:host-gateway"
     depends_on:
@@ -513,6 +514,32 @@ services:
       - node-capabilities
     restart: unless-stopped
 
+  # Node Worker — NATS offload executor
+  node-worker:
+    build:
+      context: ./services/node-worker
+      dockerfile: Dockerfile
+    container_name: node-worker-node1
+    ports:
+      - "127.0.0.1:8109:8109"
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+    environment:
+      - NODE_ID=noda1
+      - NATS_URL=nats://nats:4222
+      - OLLAMA_BASE_URL=http://host.docker.internal:11434
+      - SWAPPER_URL=http://swapper-service:8890
+      - NODE_DEFAULT_LLM=qwen3.5:27b
+      - NODE_DEFAULT_VISION=qwen3-vl-8b
+      - NODE_WORKER_MAX_CONCURRENCY=2
+      - NCS_REPORT_URL=http://node-capabilities:8099
+    depends_on:
+      - nats
+      - swapper-service
+    networks:
+      - dagi-network
+    restart: unless-stopped
+
   # NATS (JetStream)
   nats:
     image: nats:2.10-alpine