P2.2+P2.3: NATS offload node-worker + router offload integration

Node Worker (services/node-worker/):
- NATS subscriber for node.{NODE_ID}.llm.request and node.{NODE_ID}.vision.request
- Canonical JobRequest/JobResponse envelope (Pydantic)
- Idempotency cache (TTL 10min) with inflight dedup
- Deadline enforcement (DEADLINE_EXCEEDED on expired jobs)
- Concurrency limiter (semaphore, returns busy)
- Ollama + Swapper vision providers
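
The idempotency behavior listed above (10-minute TTL plus in-flight dedup) can be sketched roughly as follows. This is a minimal illustration, not the code in services/node-worker/: the class name `IdempotencyCache`, the `run` method, and the choice not to cache failed jobs are all assumptions.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Tuple


class IdempotencyCache:
    """Hypothetical sketch: TTL result cache with in-flight deduplication."""

    def __init__(self, ttl_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        # job_id -> (expires_at, result) for completed jobs
        self._done: Dict[str, Tuple[float, Any]] = {}
        # job_id -> running task, shared by concurrent duplicates
        self._inflight: Dict[str, "asyncio.Future[Any]"] = {}

    async def run(self, job_id: str,
                  handler: Callable[[], Awaitable[Any]]) -> Any:
        cached = self._done.get(job_id)
        if cached is not None and cached[0] > self._clock():
            return cached[1]  # completed recently: replay the stored result
        task = self._inflight.get(job_id)
        if task is None:
            # First request for this job id: start the work exactly once.
            task = asyncio.ensure_future(self._execute(job_id, handler))
            self._inflight[job_id] = task
        # shield() keeps one cancelled caller from cancelling the shared task.
        return await asyncio.shield(task)

    async def _execute(self, job_id: str,
                       handler: Callable[[], Awaitable[Any]]) -> Any:
        try:
            result = await handler()
            # Assumption: only successful results are cached.
            self._done[job_id] = (self._clock() + self._ttl_s, result)
            return result
        finally:
            self._inflight.pop(job_id, None)
```

A duplicate request that arrives while the first is still running awaits the same task; one that arrives within the TTL after completion gets the cached result without re-running the handler.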

Router offload (services/router/offload_client.py):
- NATS req/reply with configurable retries
- Circuit breaker per node+type (3 fails/60s → open 120s)
- Concurrency semaphore for remote requests
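
As an illustration of the breaker rule above (3 failures within 60s open the circuit for 120s, tracked per node and job type), here is a minimal sketch. The class and method names are hypothetical, and resetting the failure count on success is an assumption rather than something this commit message states.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple

Key = Tuple[str, str]  # (node_id, job_type), e.g. ("noda2", "llm")


class CircuitBreaker:
    """Hypothetical sketch of a sliding-window breaker keyed by node and type."""

    def __init__(self, max_failures: int = 3, window_s: float = 60.0,
                 open_s: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_failures = max_failures
        self._window_s = window_s
        self._open_s = open_s
        self._clock = clock
        self._failures: Dict[Key, Deque[float]] = defaultdict(deque)
        self._open_until: Dict[Key, float] = {}

    def is_open(self, node_id: str, job_type: str) -> bool:
        key = (node_id, job_type)
        until = self._open_until.get(key)
        if until is None:
            return False
        if self._clock() >= until:
            del self._open_until[key]  # cool-down elapsed: close the circuit
            return False
        return True

    def record_failure(self, node_id: str, job_type: str) -> None:
        key = (node_id, job_type)
        now = self._clock()
        fails = self._failures[key]
        fails.append(now)
        # Drop failures that fell out of the sliding window.
        while fails and now - fails[0] > self._window_s:
            fails.popleft()
        if len(fails) >= self._max_failures:
            self._open_until[key] = now + self._open_s
            fails.clear()

    def record_success(self, node_id: str, job_type: str) -> None:
        # Assumption: a success clears the failure history for this key.
        self._failures.pop((node_id, job_type), None)
```

Keying on both node and job type means a node whose vision provider is failing can still receive LLM jobs.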

Model selection (services/router/model_select.py):
- exclude_nodes parameter for circuit-broken nodes
- force_local flag for fallback re-selection
- Integrated circuit breaker state awareness
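
The two selection parameters above can be pictured as filters applied before ranking. The sketch below is illustrative only: `Candidate`, its fields, and `select_candidate` are hypothetical names, and the real model_select.py may rank candidates differently.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    node_id: str
    model: str
    local: bool
    score: float  # hypothetical ranking signal; higher is better


def select_candidate(candidates: Iterable[Candidate],
                     exclude_nodes: FrozenSet[str] = frozenset(),
                     force_local: bool = False) -> Optional[Candidate]:
    """Pick the best candidate after applying both filters."""
    eligible = [
        c for c in candidates
        if c.node_id not in exclude_nodes      # skip circuit-broken nodes
        and (c.local or not force_local)       # force_local keeps only local ones
    ]
    return max(eligible, key=lambda c: c.score, default=None)
```

On fallback, the router would call the selector again with the failed node in `exclude_nodes` and `force_local=True`, so the second choice cannot land on another remote node.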

Router /infer pipeline:
- Remote offload path when NCS selects remote node
- Automatic fallback: exclude failed node → force_local re-select
- Deadline propagation from router to node-worker
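
Deadline propagation can be sketched as the router stamping an absolute deadline on the job and the node-worker checking it before doing any work. The helper names and the epoch-milliseconds encoding below are assumptions, not the envelope fields defined in this commit; an absolute wall-clock deadline also assumes the router and worker clocks are reasonably synchronized.

```python
import time
from typing import Optional


def make_deadline_ms(timeout_s: float, now_s: Optional[float] = None) -> int:
    """Router side: turn a relative timeout into an absolute epoch deadline."""
    now_s = time.time() if now_s is None else now_s
    return int((now_s + timeout_s) * 1000)


def remaining_s(deadline_ms: int, now_s: Optional[float] = None) -> float:
    """Seconds left before the deadline, never negative."""
    now_s = time.time() if now_s is None else now_s
    return max(0.0, deadline_ms / 1000.0 - now_s)


def check_deadline(deadline_ms: int, now_s: Optional[float] = None) -> Optional[str]:
    """Worker side: return an error code for expired jobs, else None."""
    if remaining_s(deadline_ms, now_s) <= 0:
        return "DEADLINE_EXCEEDED"
    return None
```

The remaining budget can also serve as the timeout for the NATS request and for the provider call, so a job that has spent most of its time queued does not hold a concurrency slot past its deadline.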

Tests: 17 unit tests (idempotency, deadline, circuit breaker)
Docs: ops/offload_routing.md (subjects, envelope, verification)
Made-with: Cursor
Apple
2026-02-27 02:44:05 -08:00
parent a92c424845
commit c4b94a327d
19 changed files with 1075 additions and 6 deletions

@@ -133,6 +133,30 @@ services:
       - dagi-network
     restart: unless-stopped
 
+  node-worker:
+    build:
+      context: ./services/node-worker
+      dockerfile: Dockerfile
+    container_name: node-worker-node2
+    ports:
+      - "127.0.0.1:8109:8109"
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+    environment:
+      - NODE_ID=noda2
+      - NATS_URL=nats://dagi-nats:4222
+      - OLLAMA_BASE_URL=http://host.docker.internal:11434
+      - SWAPPER_URL=http://swapper-service:8890
+      - NODE_DEFAULT_LLM=qwen3:14b
+      - NODE_DEFAULT_VISION=llava:13b
+      - NODE_WORKER_MAX_CONCURRENCY=2
+    depends_on:
+      - dagi-nats
+      - swapper-service
+    networks:
+      - dagi-network
+    restart: unless-stopped
+
   sofiia-console:
     build:
       context: ./services/sofiia-console