feat(matrix-bridge-dagi): harden mixed rooms with safe defaults and ops visibility (M2.2)

Guard rails (mixed_routing.py):
  - MAX_AGENTS_PER_MIXED_ROOM (default 5): fail-fast at parse time
  - MAX_SLASH_LEN (default 32): reject garbage/injection slash tokens
  - Unified rejection reasons: unknown_agent, slash_too_long, no_mapping
  - REASON_REJECTED_* constants (separate from success REASON_*)

Ingress (ingress.py):
  - per-room-agent concurrency semaphore (MIXED_CONCURRENCY_CAP, default 1)
  - active_lock_count property for /health + prometheus
  - UNKNOWN_AGENT_BEHAVIOR: "ignore" (silent) | "reply_error" (inform user)
  - on_routed(agent_id, reason) callback for routing metrics
  - on_route_rejected(room_id, reason) callback for rejection metrics
  - matrix.route.rejected audit event on every rejection

Config + main:
  - max_agents_per_mixed_room, max_slash_len, unknown_agent_behavior, mixed_concurrency_cap
  - matrix_bridge_routed_total{agent_id, reason} counter
  - matrix_bridge_route_rejected_total{room_id, reason} counter
  - matrix_bridge_active_room_agent_locks gauge
  - /health: mixed_guard_rails section + total_agents_in_mixed_rooms
  - docker-compose: all 4 new guard rail env vars

Runbook: section 9 — mixed room debug guide (6 acceptance tests, routing metrics, session isolation, lock hang, config guard)

Tests: 108 pass (94 → 108, +14 new tests for guard rails + callbacks + concurrency)
Made-with: Cursor
This commit is contained in:
Apple
2026-03-05 01:41:20 -08:00
parent a85a11984b
commit d40b1e87c6
8 changed files with 576 additions and 21 deletions

View File

@@ -60,6 +60,16 @@ services:
# "!roomX:server=helion;!roomY:server=druid"
- BRIDGE_MIXED_DEFAULTS=${BRIDGE_MIXED_DEFAULTS:-}
# ── M2.2: Mixed room guard rails ────────────────────────────────────
# Fail-fast if any room defines more agents than this
- MAX_AGENTS_PER_MIXED_ROOM=${MAX_AGENTS_PER_MIXED_ROOM:-5}
# Reject slash commands longer than this (anti-garbage / injection guard)
- MAX_SLASH_LEN=${MAX_SLASH_LEN:-32}
# What to do when unknown /slash is used: "ignore" (silent) | "reply_error" (inform user)
- UNKNOWN_AGENT_BEHAVIOR=${UNKNOWN_AGENT_BEHAVIOR:-ignore}
# Max concurrent Router invocations per (room, agent) pair; 0 = unlimited
- MIXED_CONCURRENCY_CAP=${MIXED_CONCURRENCY_CAP:-1}
healthcheck:
test:
- "CMD"