feat(runtime): sync experience bus and learner stack into main

Apple
2026-03-05 11:30:17 -08:00
parent edd0427c61
commit ef6ebe3583
22 changed files with 2837 additions and 22 deletions


@@ -0,0 +1,70 @@
# Experience Bus Phase-3 (Router Runtime Retrieval)
## Scope
- Read path only in `router` before `/v1/agents/{id}/infer`.
- Retrieves lessons from `agent_lessons` and injects a compact block:
  - `Operational Lessons (apply if relevant)`
- Attach policy:
  - after a recent error or latency spike: always attach, `K=7`
  - otherwise: sampled attach, default `10%`, `K=3`
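
The attach policy above can be sketched as a small decision function. This is a minimal Python illustration, not the router's actual code: the function name `lessons_to_attach`, the `recent_signal` flag, and the injectable `rng` are all hypothetical. Only the values `K=7`, `K=3`, and `10%` come from this document.

```python
import random

ALWAYS_ON_K = 7   # K after a recent error / latency spike (from the policy above)
SAMPLED_K = 3     # K for sampled attach (from the policy above)
SAMPLE_PCT = 10   # default sampling percentage (from the policy above)


def lessons_to_attach(recent_signal: bool,
                      sample_pct: float = SAMPLE_PCT,
                      rng=random.random) -> int:
    """Return how many lessons (K) to attach to this request; 0 means skip.

    `recent_signal` is a hypothetical flag meaning "this agent recently hit
    an error or a latency spike". How the router tracks that is not shown here.
    """
    if recent_signal:
        return ALWAYS_ON_K
    # rng() is uniform in [0, 1), so this branch fires for ~sample_pct% of calls.
    if rng() * 100.0 < sample_pct:
        return SAMPLED_K
    return 0
```

Passing `rng` explicitly keeps the sampling branch deterministic in tests, for example `lessons_to_attach(False, rng=lambda: 0.05)`.
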
## Environment
- `LESSONS_ATTACH_ENABLED=true`
- `LESSONS_DATABASE_URL=postgresql://<user>:<pass>@<host>:5432/daarion_memory`
- `LESSONS_ATTACH_MIN=3`
- `LESSONS_ATTACH_MAX=7`
- `LESSONS_ATTACH_SAMPLE_PCT=10`
- `LESSONS_ATTACH_TIMEOUT_MS=25`
- `LESSONS_ATTACH_MAX_CHARS=1200`
- `LESSONS_SIGNAL_CACHE_TTL_SECONDS=300`
- `EXPERIENCE_LATENCY_SPIKE_MS=5000`
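
One way to read these variables into a typed config is sketched below. The `LessonsConfig` class and `load_lessons_config` helper are hypothetical names. Treating the values listed above as fallbacks, and treating an unset `LESSONS_ATTACH_ENABLED` as disabled, are assumptions; the document only states the `10%` sampling default explicitly.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _env_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Parse an integer variable, falling back to the default on bad input."""
    try:
        return int(env.get(name, "").strip() or default)
    except ValueError:
        return default


@dataclass(frozen=True)
class LessonsConfig:
    enabled: bool
    database_url: str
    attach_min: int
    attach_max: int
    sample_pct: int
    timeout_ms: int
    max_chars: int
    signal_cache_ttl_seconds: int
    latency_spike_ms: int


def load_lessons_config(env: Mapping[str, str] = os.environ) -> LessonsConfig:
    """Build the config from an environment-like mapping (assumed fallbacks)."""
    return LessonsConfig(
        # Assumption: anything other than "true" leaves the feature off.
        enabled=env.get("LESSONS_ATTACH_ENABLED", "false").strip().lower() == "true",
        database_url=env.get("LESSONS_DATABASE_URL", ""),
        attach_min=_env_int(env, "LESSONS_ATTACH_MIN", 3),
        attach_max=_env_int(env, "LESSONS_ATTACH_MAX", 7),
        sample_pct=_env_int(env, "LESSONS_ATTACH_SAMPLE_PCT", 10),
        timeout_ms=_env_int(env, "LESSONS_ATTACH_TIMEOUT_MS", 25),
        max_chars=_env_int(env, "LESSONS_ATTACH_MAX_CHARS", 1200),
        signal_cache_ttl_seconds=_env_int(env, "LESSONS_SIGNAL_CACHE_TTL_SECONDS", 300),
        latency_spike_ms=_env_int(env, "EXPERIENCE_LATENCY_SPIKE_MS", 5000),
    )
```
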
## Metrics
- `lessons_retrieved_total{status="ok|timeout|err"}`
- `lessons_attached_total{count="0|1-3|4-7"}`
- `lessons_attach_latency_ms`
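
The `count` label on `lessons_attached_total` is bucketed rather than raw. A minimal sketch of that bucketing follows, using a plain `collections.Counter` as a stand-in for a real metrics client. The helper names are hypothetical, and folding counts above 7 into the `4-7` bucket is an assumption.

```python
from collections import Counter


def attached_count_bucket(count: int) -> str:
    """Map the number of attached lessons to the `count` label value."""
    if count <= 0:
        return "0"
    if count <= 3:
        return "1-3"
    # Assumption: anything above 3 lands in the top bucket.
    return "4-7"


# Stand-in for the real counter; keys are the `count` label values.
lessons_attached_total = Counter()


def record_attached(count: int) -> None:
    """Increment the bucketed counter for one infer request."""
    lessons_attached_total[attached_count_bucket(count)] += 1
```
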
## Safety
- Lessons block never includes raw user text.
- Guard filters skip lessons containing prompt-injection-like markers:
  - `ignore previous`, `system:`, `developer:`, fenced code blocks.
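
The guard and the compact block could look roughly like the sketch below. It is illustrative only: the function names, the case-insensitive substring match, and the one-line-per-lesson layout are assumptions. The marker list, the header text, and the `1200` character cap come from this document. The block is built only from the `trigger`, `action`, and `avoid` columns, so the `raw` column never reaches the prompt.

```python
HEADER = "Operational Lessons (apply if relevant)"

# Markers from the Safety section; the fence marker is built from single
# backticks so this example stays readable inside a fenced block.
INJECTION_MARKERS = ("ignore previous", "system:", "developer:", "`" * 3)

LESSON_FIELDS = ("trigger", "action", "avoid")


def is_safe_lesson(lesson: dict) -> bool:
    """Return False if any rendered field contains an injection-like marker."""
    text = " ".join(str(lesson.get(f, "")) for f in LESSON_FIELDS).lower()
    return not any(marker in text for marker in INJECTION_MARKERS)


def render_lessons_block(lessons: list, max_chars: int = 1200) -> str:
    """Render safe lessons under the header, staying within max_chars.

    Returns an empty string when no lesson survives the guard or fits.
    """
    lines = [HEADER]
    for lesson in lessons:
        if not is_safe_lesson(lesson):
            continue  # skip rather than sanitize
        trigger, action, avoid = (str(lesson.get(f, "")) for f in LESSON_FIELDS)
        line = f"- {trigger}: {action}; {avoid}"
        if len("\n".join(lines + [line])) > max_chars:
            break  # budget exhausted; drop the remaining lessons
        lines.append(line)
    return "\n".join(lines) if len(lines) > 1 else ""
```
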
## Smoke
```bash
# 1) Seed synthetic lessons for one agent (example: agromatrix)
docker exec dagi-postgres psql -U daarion -d daarion_memory -c "
INSERT INTO agent_lessons (lesson_id, lesson_key, ts, scope, agent_id, task_type, trigger, action, avoid, signals, evidence, raw)
SELECT
gen_random_uuid(),
md5(random()::text || clock_timestamp()::text),
now() - (g * interval '1 minute'),
'agent',
'agromatrix',
'infer',
'when retrying after model timeout',
'switch provider or reduce token budget first',
'avoid repeating the same failed provider with same payload',
'{"error_class":"TimeoutError","provider":"deepseek","model":"deepseek-chat","profile":"reasoning"}'::jsonb,
'{"count":3}'::jsonb,
'{}'::jsonb
FROM generate_series(1,10) g;"
# 2) Send infer calls
for i in $(seq 1 20); do
  curl -sS -m 12 -o /dev/null \
    -X POST "http://127.0.0.1:9102/v1/agents/agromatrix/infer" \
    -H "content-type: application/json" \
    -d "{\"prompt\":\"phase3-smoke-${i}\",\"metadata\":{\"agent_id\":\"agromatrix\"}}" || true
done
# 3) Check metrics
curl -sS http://127.0.0.1:9102/metrics | grep -E 'lessons_retrieved_total|lessons_attached_total|lessons_attach_latency_ms'
# 4) Simulate DB issue (optional): lessons retrieval should fail-open and infer remains 200
# (temporarily point LESSONS_DATABASE_URL to bad DSN + restart router)
```
## Acceptance
- Router logs include `lessons_attached=<k>` during sampled or always-on retrieval.
- Infer path remains healthy when the lessons DB is unavailable: retrieval fails open and infer still returns 200.
- Impact on p95 infer latency stays bounded at the default `10%` sampling rate.
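
The fail-open behaviour expected above can be sketched as a wrapper that bounds retrieval time and never raises into the infer path. This is an assumption-laden illustration, not the router's implementation: `fetch_lessons_fail_open`, the thread pool, and the `fetch` callable are hypothetical. The `25` ms budget and the `ok|timeout|err` statuses come from this document.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Small shared pool so a slow lookup cannot block the request thread.
_POOL = ThreadPoolExecutor(max_workers=4)


def fetch_lessons_fail_open(fetch, timeout_ms: int = 25):
    """Run `fetch()` with a time budget and return (lessons, status).

    `fetch` is a hypothetical zero-argument callable that queries
    `agent_lessons`. Status is one of "ok", "timeout", "err", matching the
    labels on `lessons_retrieved_total`. On any failure the result is an
    empty list, so the caller proceeds with infer unchanged.
    """
    future = _POOL.submit(fetch)
    try:
        return future.result(timeout=timeout_ms / 1000.0), "ok"
    except FutureTimeout:
        future.cancel()  # best effort; an already running call keeps going
        return [], "timeout"
    except Exception:
        return [], "err"
```

A caller would attach the block only when the status is `"ok"` and the list is non-empty, and would otherwise forward the original prompt as-is.
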