Experience Bus Phase-3 (Router Runtime Retrieval)
Scope
- Read path only in `router` before `/v1/agents/{id}/infer`.
- Retrieves lessons from `agent_lessons` and injects a compact block: `Operational Lessons (apply if relevant)`
- Attach policy:
  - after last error / latency spike: always-on, `K=7`
  - otherwise sampled attach, default `10%`, `K=3`
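The attach policy above can be sketched as a small decision function. This is a minimal illustration, not the router's actual code: the function name `lessons_to_attach`, its parameters, the `>=` comparison against the spike threshold, and the mapping of `K=7` / `K=3` to `LESSONS_ATTACH_MAX` / `LESSONS_ATTACH_MIN` are all assumptions.

```python
import random

# Values taken from the policy and Environment sections; the mapping of
# K values to specific variables is an assumption.
ALWAYS_ON_K = 7          # assumed to correspond to LESSONS_ATTACH_MAX
SAMPLED_K = 3            # assumed to correspond to LESSONS_ATTACH_MIN
SAMPLE_PCT = 10          # LESSONS_ATTACH_SAMPLE_PCT
LATENCY_SPIKE_MS = 5000  # EXPERIENCE_LATENCY_SPIKE_MS


def lessons_to_attach(last_error, last_latency_ms, rng=None):
    """Return how many lessons to retrieve for this request (0 means skip).

    Hypothetical helper: `last_error` is True if the agent's previous call
    failed, `last_latency_ms` is its latency (or None if unknown).
    """
    rng = rng or random
    spike = last_latency_ms is not None and last_latency_ms >= LATENCY_SPIKE_MS
    if last_error or spike:
        return ALWAYS_ON_K  # always-on after an error or latency spike
    if rng.random() * 100 < SAMPLE_PCT:
        return SAMPLED_K    # sampled attach
    return 0                # most healthy requests skip retrieval entirely
```

Passing `rng` explicitly keeps the sampling branch deterministic in tests.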
Environment
- `LESSONS_ATTACH_ENABLED=true`
- `LESSONS_DATABASE_URL=postgresql://<user>:<pass>@<host>:5432/daarion_memory`
- `LESSONS_ATTACH_MIN=3`
- `LESSONS_ATTACH_MAX=7`
- `LESSONS_ATTACH_SAMPLE_PCT=10`
- `LESSONS_ATTACH_TIMEOUT_MS=25`
- `LESSONS_ATTACH_MAX_CHARS=1200`
- `LESSONS_SIGNAL_CACHE_TTL_SECONDS=300`
- `EXPERIENCE_LATENCY_SPIKE_MS=5000`
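As an illustration of how these variables might be consumed, the sketch below parses them into a typed config object. The class `LessonsConfig` and function `load_lessons_config` are hypothetical names. The numeric fallbacks simply mirror the values listed above; whether the router treats them as defaults is not stated here, and treating an unset `LESSONS_ATTACH_ENABLED` as disabled is likewise an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LessonsConfig:  # hypothetical container, not the router's own type
    enabled: bool
    database_url: str
    attach_min: int
    attach_max: int
    sample_pct: int
    timeout_ms: int
    max_chars: int
    signal_cache_ttl_seconds: int
    latency_spike_ms: int


def load_lessons_config(env=None):
    """Read the lessons settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    def _int(name, fallback):
        return int(env.get(name, fallback))

    enabled_raw = env.get("LESSONS_ATTACH_ENABLED", "false")
    return LessonsConfig(
        enabled=enabled_raw.strip().lower() in ("1", "true", "yes"),
        database_url=env.get("LESSONS_DATABASE_URL", ""),
        attach_min=_int("LESSONS_ATTACH_MIN", 3),
        attach_max=_int("LESSONS_ATTACH_MAX", 7),
        sample_pct=_int("LESSONS_ATTACH_SAMPLE_PCT", 10),
        timeout_ms=_int("LESSONS_ATTACH_TIMEOUT_MS", 25),
        max_chars=_int("LESSONS_ATTACH_MAX_CHARS", 1200),
        signal_cache_ttl_seconds=_int("LESSONS_SIGNAL_CACHE_TTL_SECONDS", 300),
        latency_spike_ms=_int("EXPERIENCE_LATENCY_SPIKE_MS", 5000),
    )
```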
Metrics
- `lessons_retrieved_total{status="ok|timeout|err"}`
- `lessons_attached_total{count="0|1-3|4-7"}`
- `lessons_attach_latency_ms`
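The `count` label on `lessons_attached_total` is bucketed rather than exact. A minimal sketch of that bucketing follows; the helper name `attached_count_bucket` is hypothetical, and folding any count above 7 into the `4-7` bucket is an assumption (the attach maximum should prevent it in practice). A plain `Counter` stands in for the real metrics client.

```python
from collections import Counter


def attached_count_bucket(count):
    """Map the number of attached lessons to the metric's `count` label."""
    if count <= 0:
        return "0"
    if count <= 3:
        return "1-3"
    return "4-7"  # assumption: counts above 7 are folded into the top bucket


# Stand-in for a labeled counter; a real router would use its metrics client.
lessons_attached_total = Counter()
for attached in (0, 3, 7, 2):
    lessons_attached_total[attached_count_bucket(attached)] += 1
```

Bucketing keeps label cardinality fixed at three values regardless of how `K` is tuned.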
Safety
- Lessons block never includes raw user text.
- Guard filters skip lessons containing prompt-injection-like markers: `ignore previous`, `system:`, `developer:`, fenced code blocks.
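The guard and the compact block can be sketched together. This is an illustration under stated assumptions: `is_safe_lesson` and `render_lessons_block` are hypothetical names, lessons are treated as plain strings, matching is case-insensitive substring search, and the block is cut at a whole-line boundary to stay within `LESSONS_ATTACH_MAX_CHARS`. The real filter may differ in each of these respects.

```python
FENCE = "`" * 3  # marker for a fenced code block
MARKERS = ("ignore previous", "system:", "developer:", FENCE)
HEADER = "Operational Lessons (apply if relevant)"


def is_safe_lesson(text):
    """Return False if the lesson contains any injection-like marker."""
    lowered = text.lower()
    return not any(marker in lowered for marker in MARKERS)


def render_lessons_block(lessons, max_chars=1200):
    """Build the injected block, skipping unsafe lessons and capping length."""
    lines = [HEADER]
    used = len(HEADER)
    for lesson in lessons:
        if not is_safe_lesson(lesson):
            continue  # skip rather than sanitize
        line = "- " + lesson.strip()
        if used + 1 + len(line) > max_chars:  # +1 for the joining newline
            break
        lines.append(line)
        used += 1 + len(line)
    # With no surviving lessons, attach nothing instead of a bare header.
    return "\n".join(lines) if len(lines) > 1 else ""
```

Skipping a suspicious lesson outright is simpler to reason about than trying to strip the marker and keep the rest.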
Smoke
# 1) Seed synthetic lessons for one agent (example: agromatrix)
docker exec dagi-postgres psql -U daarion -d daarion_memory -c "
  INSERT INTO agent_lessons (lesson_id, lesson_key, ts, scope, agent_id, task_type, trigger, action, avoid, signals, evidence, raw)
  SELECT
    gen_random_uuid(),
    md5(random()::text || clock_timestamp()::text),
    now() - (g * interval '1 minute'),
    'agent',
    'agromatrix',
    'infer',
    'when retrying after model timeout',
    'switch provider or reduce token budget first',
    'avoid repeating the same failed provider with same payload',
    '{\"error_class\":\"TimeoutError\",\"provider\":\"deepseek\",\"model\":\"deepseek-chat\",\"profile\":\"reasoning\"}'::jsonb,
    '{\"count\":3}'::jsonb,
    '{}'::jsonb
  FROM generate_series(1,10) g;"
# 2) Send infer calls
for i in $(seq 1 20); do
  curl -sS -m 12 -o /dev/null \
    -X POST "http://127.0.0.1:9102/v1/agents/agromatrix/infer" \
    -H "content-type: application/json" \
    -d "{\"prompt\":\"phase3-smoke-${i}\",\"metadata\":{\"agent_id\":\"agromatrix\"}}" || true
done
# 3) Check metrics
curl -sS http://127.0.0.1:9102/metrics | grep -E 'lessons_retrieved_total|lessons_attached_total|lessons_attach_latency_ms'
# 4) Simulate DB issue (optional): lessons retrieval should fail-open and infer remains 200
# (temporarily point LESSONS_DATABASE_URL to bad DSN + restart router)
Acceptance
- Router logs include `lessons_attached=<k>` during sampled or always-on retrieval.
- Infer path remains healthy when lessons DB is unavailable.
- p95 infer latency impact stays controlled at sampling `10%`.
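The fail-open behavior required by the second criterion can be sketched as a timeout wrapper around the lessons query. The name `fetch_lessons_fail_open` and the `fetch` callable are hypothetical; the sketch assumes an asyncio-based router and uses the `LESSONS_ATTACH_TIMEOUT_MS` value of 25 ms. The returned status strings match the `status` label of `lessons_retrieved_total`.

```python
import asyncio


async def fetch_lessons_fail_open(fetch, timeout_ms=25):
    """Run `fetch()` under a deadline; never raise into the infer path.

    `fetch` is a hypothetical zero-argument coroutine function that queries
    agent_lessons. Returns (lessons, status), status in {"ok", "timeout", "err"}.
    """
    try:
        lessons = await asyncio.wait_for(fetch(), timeout=timeout_ms / 1000)
        return lessons, "ok"
    except asyncio.TimeoutError:
        return [], "timeout"  # too slow: continue without lessons
    except Exception:
        return [], "err"      # DB unavailable, bad DSN, etc.: fail open
```

Because every failure collapses to an empty list, the infer handler can proceed unchanged, and the short deadline bounds how much latency retrieval can add to a request.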