Complete snapshot of /opt/microdao-daarion/ from NODE1 (144.76.224.179).
This represents the actual running production code that has diverged
significantly from the previous main branch.
Key changes from old main:
- Gateway (http_api.py): expanded from ~40KB to 164KB with full agent support
- Router: new /v1/agents/{id}/infer endpoint with vision + DeepSeek routing
- Behavior Policy: SOWA v2.2 (3-level: FULL/ACK/SILENT)
- Agent Registry: config/agent_registry.yml as single source of truth
- 13 agents configured (was 3)
- Memory service integration
- CrewAI teams and roles
Excluded from snapshot: venv/, .env, data/, backups, .tgz archives
Co-authored-by: Cursor <cursoragent@cursor.com>
64 lines
1.8 KiB
Markdown
# Runbook: Agent E2E Failure (E2E=0)

## Triggers

- `AgentE2EFailure`: `agent_e2e_success{target="gateway_health"} == 0`
- `AgentPingFailure`: `agent_e2e_success{target="agent_ping"} == 0`

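To see quickly which alert target is failing, the prober output can be filtered down to the targets whose sample value is 0. The sketch below is a minimal illustration, not part of the deployed tooling: the function name `failed_targets` is hypothetical, and it assumes the prober exposes the standard Prometheus text format with one `target` label per sample.

```bash
# failed_targets: read Prometheus text exposition format on stdin and print
# the target label of every agent_e2e_success sample whose value is 0.
# (Hypothetical helper; assumes values are rendered as 0 or 0.0.)
failed_targets() {
  awk '
    /^agent_e2e_success[{]/ {
      value = $0
      sub(/^.*[}] */, "", value)      # drop the metric name and label set
      split(value, parts, " ")        # parts[1] is the sample value
      if (parts[1] ~ /^0(\.0*)?$/ && match($0, /target="[^"]*"/)) {
        # Skip the 8 characters of target=" and the closing quote.
        print substr($0, RSTART + 8, RLENGTH - 9)
      }
    }
  '
}
```

Usage, assuming the prober listens on port 9108 as in the commands in this runbook: `curl -sS http://localhost:9108/metrics | failed_targets`.
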
## Quick diagnostics (5 commands)

```bash
# 1. Prober status
curl -sS http://localhost:9108/metrics | grep agent_e2e_success

# 2. Gateway logs (recent errors)
docker logs dagi-gateway-node1 --tail 20 2>&1 | grep -iE "error|fail|timeout"

# 3. Router health
curl -sS http://localhost:9102/health

# 4. NATS connectivity
docker run --rm --network dagi-network natsio/nats-box nats -s nats://dagi-nats-node1:4222 server ping

# 5. Memory-service health
curl -sS http://localhost:8000/health
```

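The three HTTP checks above (prober, Router, Memory-service) can be run in one pass with a single OK/FAIL line per endpoint. This is a sketch under assumptions: `probe` and `run_http_checks` are hypothetical helper names, and the 5-second timeout is an arbitrary choice rather than a value taken from the production setup.

```bash
# probe URL: succeed only if the endpoint answers without an HTTP 4xx/5xx
# status within 5 seconds (the timeout value is an assumption).
probe() {
  curl -fsS --max-time 5 -o /dev/null "$1"
}

# run_http_checks NAME=URL...: probe each endpoint, print one status line per
# check, and return non-zero if any check failed.
run_http_checks() {
  local failed=0 pair name url
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first "="
    url="${pair#*=}"     # text after the first "="
    if probe "$url" 2>/dev/null; then
      printf 'OK    %s\n' "$name"
    else
      printf 'FAIL  %s (%s)\n' "$name" "$url"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage with the endpoints from this runbook: `run_http_checks prober=http://localhost:9108/metrics router=http://localhost:9102/health memory-service=http://localhost:8000/health`.
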
## Detailed diagnostics

### If the Gateway is DOWN

```bash
docker ps | grep gateway
docker logs dagi-gateway-node1 --tail 50
docker restart dagi-gateway-node1
```

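After a restart it helps to confirm that the container actually stays up instead of assuming it did. The following is a minimal sketch: `wait_for` and `gateway_running` are hypothetical helpers, and the attempt count and delay in the usage line are illustrative values only.

```bash
# wait_for ATTEMPTS DELAY_SECONDS COMMAND...: re-run COMMAND until it succeeds
# or ATTEMPTS runs have failed. Returns 0 on success, 1 after giving up.
wait_for() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# gateway_running: true when Docker reports the gateway container as running.
gateway_running() {
  [ "$(docker inspect -f '{{.State.Running}}' dagi-gateway-node1 2>/dev/null)" = "true" ]
}
```

Usage: `docker restart dagi-gateway-node1 && wait_for 10 3 gateway_running`. A running container is not necessarily a healthy one, so re-check the prober metric afterwards.
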
### If the Router is not responding

```bash
docker logs dagi-router-node1 --tail 50
# Check Ollama
curl -sS http://172.17.0.1:11434/api/tags | head
```

### If Memory-service is DOWN

```bash
docker logs dagi-memory-service-node1 --tail 50
# Check Qdrant
curl -sS http://localhost:6333/collections | head
```

### If NATS has problems

```bash
# JetStream status
docker run --rm --network dagi-network natsio/nats-box nats -s nats://dagi-nats-node1:4222 stream ls
docker run --rm --network dagi-network natsio/nats-box nats -s nats://dagi-nats-node1:4222 consumer info ARTIFACT_JOBS render_pdf_worker
```

## Escalation

1. Restarting the service did not help → check resources (`docker stats`)
2. OOM kills → `dmesg | grep -i oom`
3. Disk full → `df -h`

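The disk check in step 3 can be narrowed to the mount points that are actually close to full. This is a sketch only: `full_mounts` is a hypothetical helper, the 90% threshold in the usage line is an arbitrary example, and the parsing assumes POSIX `df -P` output with no spaces in filesystem names or mount points.

```bash
# full_mounts THRESHOLD: read `df -P` output on stdin and print every mount
# point whose use percentage is at or above THRESHOLD, followed by that value.
full_mounts() {
  awk -v limit="$1" '
    NR > 1 {                       # skip the header line
      use = $5                     # Capacity column, for example 95%
      sub(/%$/, "", use)
      if (use + 0 >= limit + 0) print $6, use "%"
    }
  '
}
```

Usage: `df -P | full_mounts 90`.
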
## Contacts

- Slack: #daarion-alerts
- On-call: check PagerDuty