Experience Bus Phase-3 (Router Runtime Retrieval)

Scope

  • Read path only: runs in the router before each /v1/agents/{id}/infer call.
  • Retrieves lessons from agent_lessons and injects a compact block titled:
    • Operational Lessons (apply if relevant)
  • Attach policy (K = number of lessons attached):
    • after an error or latency spike: always on, K=7
    • otherwise: sampled attach, default 10%, K=3
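
The attach policy above can be sketched as a small decision function. This is a minimal illustration, not the router's actual code: the name choose_attach_k, the recent_signal flag, and the injectable rng parameter are assumptions made for the example. Only the K values and the 10% default come from this document.

```python
import random

ALWAYS_ON_K = 7    # K after an error or latency spike (from the policy above)
SAMPLED_K = 3      # K for sampled attach
SAMPLE_PCT = 10.0  # default sampling percentage


def choose_attach_k(recent_signal: bool,
                    sample_pct: float = SAMPLE_PCT,
                    rng=random.random) -> int:
    """Return how many lessons to attach for this request (0 means skip).

    recent_signal is a hypothetical flag: True when the agent recently
    hit an error or a latency spike.
    """
    if recent_signal:
        return ALWAYS_ON_K            # always-on path
    if rng() * 100.0 < sample_pct:    # sampled path
        return SAMPLED_K
    return 0                          # not sampled: attach nothing
```

Passing rng explicitly makes the sampled branch deterministic in tests, for example choose_attach_k(False, rng=lambda: 0.05) takes the sampled path.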

Environment

  • LESSONS_ATTACH_ENABLED=true
  • LESSONS_DATABASE_URL=postgresql://<user>:<pass>@<host>:5432/daarion_memory
  • LESSONS_ATTACH_MIN=3
  • LESSONS_ATTACH_MAX=7
  • LESSONS_ATTACH_SAMPLE_PCT=10
  • LESSONS_ATTACH_TIMEOUT_MS=25
  • LESSONS_ATTACH_MAX_CHARS=1200
  • LESSONS_SIGNAL_CACHE_TTL_SECONDS=300
  • EXPERIENCE_LATENCY_SPIKE_MS=5000
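
One way to load and validate these variables is sketched below. The LessonsConfig class and load_lessons_config function are hypothetical names, and falling back to the values listed above is an assumption (this document only calls the 10% sampling rate a default). LESSONS_DATABASE_URL is left out on purpose because it carries credentials.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LessonsConfig:
    enabled: bool
    attach_min: int
    attach_max: int
    sample_pct: float
    timeout_ms: int
    max_chars: int
    signal_cache_ttl_s: int
    latency_spike_ms: int


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_lessons_config(env=None) -> LessonsConfig:
    """Build a LessonsConfig from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    cfg = LessonsConfig(
        enabled=_as_bool(env.get("LESSONS_ATTACH_ENABLED", "true")),
        attach_min=int(env.get("LESSONS_ATTACH_MIN", "3")),
        attach_max=int(env.get("LESSONS_ATTACH_MAX", "7")),
        sample_pct=float(env.get("LESSONS_ATTACH_SAMPLE_PCT", "10")),
        timeout_ms=int(env.get("LESSONS_ATTACH_TIMEOUT_MS", "25")),
        max_chars=int(env.get("LESSONS_ATTACH_MAX_CHARS", "1200")),
        signal_cache_ttl_s=int(env.get("LESSONS_SIGNAL_CACHE_TTL_SECONDS", "300")),
        latency_spike_ms=int(env.get("EXPERIENCE_LATENCY_SPIKE_MS", "5000")),
    )
    # Reject obviously inconsistent settings early, at startup.
    if not 0 < cfg.attach_min <= cfg.attach_max:
        raise ValueError("LESSONS_ATTACH_MIN must be positive and <= LESSONS_ATTACH_MAX")
    if not 0 <= cfg.sample_pct <= 100:
        raise ValueError("LESSONS_ATTACH_SAMPLE_PCT must be between 0 and 100")
    return cfg
```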

Metrics

  • lessons_retrieved_total{status="ok|timeout|err"}
  • lessons_attached_total{count="0|1-3|4-7"}
  • lessons_attach_latency_ms
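
The count label on lessons_attached_total is a bucket rather than a raw number. A minimal sketch of the mapping follows; the function name attached_count_bucket is hypothetical, and the bucket strings are taken from the label values listed above.

```python
def attached_count_bucket(n: int) -> str:
    """Map the number of attached lessons to the `count` label value."""
    if n <= 0:
        return "0"
    if n <= 3:
        return "1-3"
    return "4-7"  # assumes n never exceeds LESSONS_ATTACH_MAX (7)
```

Bucketing keeps the label cardinality fixed at three values regardless of how K is tuned.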

Safety

  • Lessons block never includes raw user text.
  • Guard filters skip any lesson containing prompt-injection-like markers:
    • "ignore previous", "system:", "developer:", or fenced code blocks.
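
The guard and the compact block can be illustrated together. This is a sketch under stated assumptions, not the router's implementation: the regular expressions, the "- " line format, and the names is_safe_lesson and build_lessons_block are all invented for the example. The block title, the marker list, and the 1200-character budget come from this document.

```python
import re

HEADER = "Operational Lessons (apply if relevant)"
_FENCE = "`" * 3  # three backticks: marker for a fenced code block
_INJECTION_PATTERNS = (
    re.compile(r"ignore\s+previous", re.IGNORECASE),
    re.compile(r"\b(?:system|developer)\s*:", re.IGNORECASE),
    re.compile(re.escape(_FENCE)),
)


def is_safe_lesson(text: str) -> bool:
    """True if the text contains none of the injection-like markers."""
    return not any(p.search(text) for p in _INJECTION_PATTERNS)


def build_lessons_block(lessons: list[str], max_chars: int = 1200) -> str:
    """Join safe lessons under the header, staying within max_chars.

    Returns "" when no lesson survives filtering or fits the budget.
    """
    lines = [HEADER]
    used = len(HEADER)
    for lesson in lessons:
        if not is_safe_lesson(lesson):
            continue  # skip the lesson entirely rather than sanitize it
        line = f"- {lesson.strip()}"
        if used + 1 + len(line) > max_chars:  # +1 for the newline
            break
        lines.append(line)
        used += 1 + len(line)
    return "\n".join(lines) if len(lines) > 1 else ""
```

Skipping a suspicious lesson outright, instead of trying to strip the marker, matches the "skip" wording above and avoids attaching partially cleaned text.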

Smoke

# 1) Seed synthetic lessons for one agent (example: agromatrix)
docker exec dagi-postgres psql -U daarion -d daarion_memory -c "
INSERT INTO agent_lessons (lesson_id, lesson_key, ts, scope, agent_id, task_type, trigger, action, avoid, signals, evidence, raw)
SELECT
  gen_random_uuid(),
  md5(random()::text || clock_timestamp()::text),
  now() - (g * interval '1 minute'),
  'agent',
  'agromatrix',
  'infer',
  'when retrying after model timeout',
  'switch provider or reduce token budget first',
  'avoid repeating the same failed provider with same payload',
  '{"error_class":"TimeoutError","provider":"deepseek","model":"deepseek-chat","profile":"reasoning"}'::jsonb,
  '{"count":3}'::jsonb,
  '{}'::jsonb
FROM generate_series(1,10) g;"

# 2) Send infer calls (prints one HTTP status code per request)
for i in $(seq 1 20); do
  curl -sS -m 12 -o /dev/null -w "%{http_code}\n" \
    -X POST "http://127.0.0.1:9102/v1/agents/agromatrix/infer" \
    -H "content-type: application/json" \
    -d "{\"prompt\":\"phase3-smoke-${i}\",\"metadata\":{\"agent_id\":\"agromatrix\"}}" || true
done

# 3) Check metrics
curl -sS http://127.0.0.1:9102/metrics | grep -E 'lessons_retrieved_total|lessons_attached_total|lessons_attach_latency_ms'

# 4) Simulate a DB outage (optional): lessons retrieval should fail open and infer should still return 200
# (temporarily point LESSONS_DATABASE_URL to a bad DSN and restart the router, then repeat steps 2 and 3)

Acceptance

  • Router logs include lessons_attached=<k> during sampled or always-on retrieval.
  • Infer path remains healthy when lessons DB is unavailable.
  • p95 infer latency impact stays bounded at the default 10% sampling rate (each retrieval is capped by LESSONS_ATTACH_TIMEOUT_MS).
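
The fail-open behaviour checked in smoke step 4 and in the acceptance list can be sketched with a timeout wrapper. This is an illustration only: fetch_lessons_fail_open and the fetch callable are hypothetical, and the use of asyncio is an assumption about the router. The 25 ms budget and the ok/timeout/err statuses mirror LESSONS_ATTACH_TIMEOUT_MS and the lessons_retrieved_total labels.

```python
import asyncio


async def fetch_lessons_fail_open(fetch, timeout_ms: int = 25):
    """Run `fetch` under a time budget; never raise on retrieval failure.

    `fetch` is a hypothetical zero-argument coroutine function that
    queries agent_lessons. Returns (lessons, status), where status is
    one of "ok", "timeout", "err".
    """
    try:
        lessons = await asyncio.wait_for(fetch(), timeout=timeout_ms / 1000)
        return lessons, "ok"
    except asyncio.TimeoutError:
        return [], "timeout"   # too slow: continue infer without lessons
    except Exception:
        return [], "err"       # DB down or bad DSN: continue infer without lessons
```

On either failure the caller receives an empty list, so the infer request proceeds as if no lessons were attached.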