Complete snapshot of /opt/microdao-daarion/ from NODE1 (144.76.224.179).
This represents the actual running production code that has diverged
significantly from the previous main branch.
Key changes from old main:
- Gateway (http_api.py): expanded from ~40KB to 164KB with full agent support
- Router: new /v1/agents/{id}/infer endpoint with vision + DeepSeek routing
- Behavior Policy: SOWA v2.2 (3-level: FULL/ACK/SILENT)
- Agent Registry: config/agent_registry.yml as single source of truth
- 13 agents configured (was 3)
- Memory service integration
- CrewAI teams and roles
Excluded from snapshot: venv/, .env, data/, backups, .tgz archives
Co-authored-by: Cursor <cursoragent@cursor.com>
Runbook: Agent E2E Failure (E2E=0)
Triggers
- AgentE2EFailure: agent_e2e_success{target="gateway_health"} == 0
- AgentPingFailure: agent_e2e_success{target="agent_ping"} == 0
Quick diagnostics (5 commands)
# 1. Prober status
curl -sS http://localhost:9108/metrics | grep agent_e2e_success
# 2. Gateway logs (recent errors)
docker logs dagi-gateway-node1 --tail 20 2>&1 | grep -iE "error|fail|timeout"
# 3. Router health
curl -sS http://localhost:9102/health
# 4. NATS connectivity
docker run --rm --network dagi-network natsio/nats-box nats -s nats://dagi-nats-node1:4222 server ping
# 5. Memory-service health
curl -sS http://localhost:8000/health
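To see at a glance which probe is failing, the prober output from command 1 can be filtered down to the targets whose value is 0. The following is a minimal sketch, not part of the deployed tooling: the helper name `failing_targets` is hypothetical, and it assumes the prober emits the standard Prometheus text format with a `target` label on each `agent_e2e_success` sample.

```shell
# failing_targets: reads Prometheus text-format metrics on stdin and prints
# the target label of every agent_e2e_success sample whose value is 0.
# (Hypothetical helper; assumes each sample carries a target="..." label.)
failing_targets() {
  awk '
    /^agent_e2e_success[{]/ {
      line = $0
      sub(/^[^}]*[}] */, "", line)      # drop metric name and label set
      split(line, parts, " ")           # parts[1] = value, parts[2] = optional timestamp
      if (parts[1] ~ /^0(\.0*)?$/ && match($0, /target="[^"]*"/)) {
        # strip the leading target=" (8 chars) and the closing quote
        print substr($0, RSTART + 8, RLENGTH - 9)
      }
    }
  '
}
```

Usage: `curl -sS http://localhost:9108/metrics | failing_targets`. An empty result means no `agent_e2e_success` sample is currently at 0.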
Detailed diagnostics
If the Gateway is DOWN
docker ps | grep gateway
docker logs dagi-gateway-node1 --tail 50
docker restart dagi-gateway-node1
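After a restart it helps to confirm that the probe actually recovers rather than assuming it did. A rough sketch of a retry loop follows; `wait_until` and `gateway_e2e_ok` are hypothetical helper names, and the check assumes the prober prints the sample exactly as `agent_e2e_success{target="gateway_health"} 1` with no additional labels.

```shell
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY_SECONDS between tries.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
wait_until() {
  wu_attempts=$1
  wu_delay=$2
  shift 2
  wu_try=1
  while [ "$wu_try" -le "$wu_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$wu_try" -lt "$wu_attempts" ]; then
      sleep "$wu_delay"
    fi
    wu_try=$((wu_try + 1))
  done
  return 1
}

# Assumption: the metric line has exactly this label set; extra labels
# on the sample would make this check fail even when the probe is healthy.
gateway_e2e_ok() {
  curl -sS http://localhost:9108/metrics |
    grep -qF 'agent_e2e_success{target="gateway_health"} 1'
}
```

Usage: `docker restart dagi-gateway-node1 && wait_until 10 5 gateway_e2e_ok`. The attempt count and delay are placeholders; pick values that match how often the prober runs.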
If the Router is not responding
docker logs dagi-router-node1 --tail 50
# Check Ollama
curl -sS http://172.17.0.1:11434/api/tags | head
If Memory-service is DOWN
docker logs dagi-memory-service-node1 --tail 50
# Check Qdrant
curl -sS http://localhost:6333/collections | head
If NATS has problems
# JetStream status
docker run --rm --network dagi-network natsio/nats-box nats -s nats://dagi-nats-node1:4222 stream ls
docker run --rm --network dagi-network natsio/nats-box nats -s nats://dagi-nats-node1:4222 consumer info ARTIFACT_JOBS render_pdf_worker
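The NATS checks in this runbook all repeat the same nats-box invocation, so a small wrapper can cut down on typing during an incident. This is only a convenience sketch; the function name `nats_box` is hypothetical, while the network, image, and server URL are copied from the commands above.

```shell
# nats_box: run a nats CLI subcommand inside a throwaway nats-box container
# attached to dagi-network, pointed at the node1 NATS server.
# (Hypothetical wrapper; forwards all arguments unchanged to `nats`.)
nats_box() {
  docker run --rm --network dagi-network natsio/nats-box \
    nats -s nats://dagi-nats-node1:4222 "$@"
}
```

Usage: `nats_box server ping`, `nats_box stream ls`, or `nats_box consumer info ARTIFACT_JOBS render_pdf_worker`.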
Escalation
- Service restart did not help → check resources (docker stats)
- OOM kills → dmesg | grep -i oom
- Disk full → df -h
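For the disk check, scanning `df -h` by eye is easy to get wrong under pressure. One possible sketch filters the POSIX-format output for mounts at or above a threshold; `full_mounts` is a hypothetical helper, the 90% figure in the usage line is an assumed threshold rather than a documented limit, and mount points containing spaces are not handled.

```shell
# full_mounts THRESHOLD_PERCENT
# Reads `df -P` output on stdin and prints "<mount point> <capacity>" for
# every filesystem whose capacity is at or above THRESHOLD_PERCENT.
# (Hypothetical helper; assumes mount points contain no spaces.)
full_mounts() {
  awk -v limit="$1" '
    NR > 1 {                      # skip the header row
      use = $5                    # Capacity column, e.g. "95%"
      sub(/%/, "", use)
      if (use + 0 >= limit + 0) {
        print $6 " " $5
      }
    }
  '
}
```

Usage: `df -P | full_mounts 90`.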
Contacts
- Slack: #daarion-alerts
- On-call: check PagerDuty