# Experience Bus Phase-2 (Lessons Extractor)
## Scope
- Source stream: `EXPERIENCE`
- Source subjects: `agent.experience.v1.>`
- Consumer mode: durable pull + explicit ack
- Output table: `agent_lessons` (append-only)
- Output subject: `agent.lesson.v1` (optional publish)
## Service
- Container: `dagi-experience-learner-node1`
- Endpoints:
  - `GET /health`
  - `GET /metrics`
## Environment
- `NATS_URL=nats://nats:4222`
- `EXPERIENCE_STREAM_NAME=EXPERIENCE`
- `EXPERIENCE_SUBJECT=agent.experience.v1.>`
- `EXPERIENCE_DURABLE=experience-learner-v1`
- `EXPERIENCE_ACK_WAIT_SECONDS=30`
- `EXPERIENCE_MAX_DELIVER=20`
- `EXPERIENCE_FETCH_BATCH=64`
- `EXPERIENCE_FETCH_TIMEOUT_SECONDS=2`
- `EXPERIENCE_WINDOW_SECONDS=1800`
- `EXPERIENCE_OK_SAMPLE_PCT=10`
- `EXPERIENCE_LATENCY_SPIKE_MS=5000`
- `EXPERIENCE_ERROR_THRESHOLD=3`
- `EXPERIENCE_SILENT_THRESHOLD=5`
- `EXPERIENCE_LATENCY_THRESHOLD=3`
- `EXPERIENCE_EVENT_DEDUP_TTL_SECONDS=3600`
- `LEARNER_DATABASE_URL=postgresql://<user>:<pass>@<host>:5432/daarion_memory`
- `LESSON_SUBJECT=agent.lesson.v1`
- `LESSON_PUBLISH_ENABLED=true`
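
A pre-flight check can catch a missing setting before the container starts. The following is a minimal sketch, not part of the service: `missing_env_vars` is a hypothetical helper, and which of the variables above are strictly required is an assumption to adjust per deployment.

```bash
# Print each named variable that is unset or empty; return 1 if any is missing.
missing_env_vars() {
  local name value rc=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name. The names come from the
    # trusted list passed by the caller, so eval is acceptable here.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
      rc=1
    fi
  done
  return "$rc"
}
```

A possible call before starting the learner: `missing_env_vars NATS_URL EXPERIENCE_STREAM_NAME EXPERIENCE_DURABLE LEARNER_DATABASE_URL || exit 1`.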
## Deploy
1. Apply migration `migrations/055_agent_lessons.sql`.
2. Deploy service `experience-learner`.
3. Verify service health and metrics.
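
Step 3 can be scripted as a small polling loop. This is a sketch under assumptions: `wait_for_http_ok` is a hypothetical helper, and it presumes `/health` is served on the same host port (`9109`) that the Verify section uses for `/metrics`.

```bash
# Poll a URL until it answers with a non-error HTTP status or attempts run out.
# Usage: wait_for_http_ok URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http_ok() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP >= 400; -m caps each try at 5 seconds.
    if curl -fsS -m 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_http_ok http://127.0.0.1:9109/health 30 2` waits up to roughly a minute of sleep time (plus request time) and exits non-zero if the service never becomes healthy.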
## Smoke
```bash
# Generate event traffic (Phase-1 router path)
for i in $(seq 1 50); do
  agent=$([ $((i%2)) -eq 0 ] && echo "aistalk" || echo "devtools")
  curl -sS -m 8 -o /dev/null \
    -X POST "http://127.0.0.1:9102/v1/agents/${agent}/infer" \
    -H "content-type: application/json" \
    -d "{\"prompt\":\"phase2-smoke-${agent}-${i}-$(date +%s%N)\"}" || true
done
```
## Verify
```bash
# Lesson rows written in the last 30 minutes
docker exec dagi-postgres psql -U daarion -d daarion_memory -tAc \
  "SELECT count(*) FROM agent_lessons WHERE ts > now()-interval '30 minutes';"
# Idempotency check: total rows should equal distinct lesson_key values.
# Re-run after the smoke test or a redelivery to confirm no duplicates appear.
docker exec dagi-postgres psql -U daarion -d daarion_memory -tAc \
  "SELECT count(*), count(distinct lesson_key) FROM agent_lessons;"
# Learner metrics
curl -sS http://127.0.0.1:9109/metrics | grep -E 'lessons_|js_messages_'
```
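
The idempotency query prints a single `total|distinct` row, because `psql -tA` emits unaligned, tuples-only output with `|` as the field separator. A small helper can turn that row into a pass/fail exit code. This is a sketch; `lessons_are_unique` is a hypothetical name, not something the service ships.

```bash
# Exit 0 when total == distinct, 1 when they differ, 2 on malformed input.
# Input: one "total|distinct" row as printed by psql -tA for a two-column result.
lessons_are_unique() {
  local row="$1" total distinct
  # Require the field separator so a single number is not accepted by accident.
  case "$row" in
    *'|'*) ;;
    *) return 2 ;;
  esac
  total="${row%%|*}"
  distinct="${row##*|}"
  # Both fields must be non-empty and contain digits only.
  case "$total" in ''|*[!0-9]*) return 2 ;; esac
  case "$distinct" in ''|*[!0-9]*) return 2 ;; esac
  [ "$total" -eq "$distinct" ]
}
```

It can be fed the output of the second `docker exec ... psql` command above, for example `lessons_are_unique "$(docker exec dagi-postgres psql -U daarion -d daarion_memory -tAc "SELECT count(*), count(distinct lesson_key) FROM agent_lessons;")"`.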
## Acceptance
- `agent_lessons` receives rows under live event flow.
- Reprocessing/redelivery does not duplicate lessons (`lesson_key` unique).
- `js_messages_acked_total` increases.
- `js_messages_redelivered_total` is observable when replay/redelivery occurs.
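
To check that a counter such as `js_messages_acked_total` actually increases, its value can be read before and after the smoke run. The sketch below sums every series of one metric from Prometheus text exposition on stdin. `metric_total` is a hypothetical helper; whether the learner attaches labels to these counters is not stated here, so the function sums all label combinations, and it assumes samples carry no trailing timestamp.

```bash
# Sum all samples of one metric from Prometheus text exposition read on stdin.
# Usage: curl -sS http://127.0.0.1:9109/metrics | metric_total METRIC_NAME
metric_total() {
  awk -v name="$1" '
    # Skip HELP/TYPE comments and lines without a value field.
    /^#/ || NF < 2 { next }
    {
      # The metric name ends at the first "{" (labels) or space (no labels).
      split($0, parts, /[{ ]/)
      # Assumes the sample value is the last field (no timestamp).
      if (parts[1] == name) sum += $NF
    }
    END { printf "%d\n", sum }
  '
}
```

A possible acceptance check: capture `before=$(curl -sS http://127.0.0.1:9109/metrics | metric_total js_messages_acked_total)`, run the smoke loop, capture `after` the same way, then assert `[ "$after" -gt "$before" ]`.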