Commit Graph

7 Commits

Author SHA1 Message Date
Apple
fe6e3d30ae feat(matrix-bridge-dagi): add operator allowlist for control commands (M3.0)
New: app/control.py
  - ControlConfig: operator_allowlist + control_rooms (frozensets)
  - parse_control_config(): validates @user:server + !room:server formats, fail-fast
  - parse_command(): parses !verb subcommand [args] [key=value] up to 512 chars
  - check_authorization(): AND(is_control_room, is_operator) → (bool, reason)
  - Reply helpers: not_implemented, unknown_command, unauthorized, help
  - KNOWN_VERBS: runbook, status, help (M3.1+ stubs)
  - MAX_CMD_LEN=512, MAX_CMD_TOKENS=20
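
The parsing and authorization rules listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of app/control.py: the regexes, the comma-separated input format, the order of the two checks, the "ok" reason string, and the return shape of parse_command() are all assumptions. Only the function names, the limits, and the not_operator / not_control_room reasons come from the commit message.

```python
import re
from dataclasses import dataclass

MAX_CMD_LEN = 512
MAX_CMD_TOKENS = 20

# Assumed shapes for Matrix IDs: "@user:server" and "!room:server".
_USER_RE = re.compile(r"^@[^:\s]+:\S+$")
_ROOM_RE = re.compile(r"^![^:\s]+:\S+$")


@dataclass(frozen=True)
class ControlConfig:
    operator_allowlist: frozenset = frozenset()
    control_rooms: frozenset = frozenset()


def parse_control_config(operators: str, rooms: str) -> ControlConfig:
    """Fail fast on the first malformed entry (comma separation is assumed)."""
    ops = [o.strip() for o in operators.split(",") if o.strip()]
    rms = [r.strip() for r in rooms.split(",") if r.strip()]
    for op in ops:
        if not _USER_RE.match(op):
            raise ValueError(f"invalid operator id: {op!r}")
    for room in rms:
        if not _ROOM_RE.match(room):
            raise ValueError(f"invalid room id: {room!r}")
    return ControlConfig(frozenset(ops), frozenset(rms))


def parse_command(body: str):
    """Return (verb, subcommand, args, kwargs), or None if not a valid command."""
    body = body.strip()
    if not body.startswith("!") or len(body) > MAX_CMD_LEN:
        return None
    tokens = body[1:].split()
    if not tokens or len(tokens) > MAX_CMD_TOKENS:
        return None
    verb, rest = tokens[0].lower(), tokens[1:]
    subcommand = ""
    if rest and "=" not in rest[0]:
        subcommand, rest = rest[0].lower(), rest[1:]
    args, kwargs = [], {}
    for tok in rest:
        if "=" in tok:
            key, value = tok.split("=", 1)
            kwargs[key] = value
        else:
            args.append(tok)
    return verb, subcommand, args, kwargs


def check_authorization(cfg: ControlConfig, room_id: str, sender: str):
    """Both conditions must hold: control room AND allowlisted operator."""
    if room_id not in cfg.control_rooms:
        return False, "not_control_room"
    if sender not in cfg.operator_allowlist:
        return False, "not_operator"
    return True, "ok"
```

With a config built from "@ops:example.org" and "!ctl:example.org" (placeholder IDs), only that operator speaking in that room passes the check.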

ingress.py:
  - _try_control(): dispatch for control rooms (authorized → audit + reply, unauthorized → audit + optional error reply)
  - join control rooms on startup
  - _enqueue_from_sync: control rooms processed first, never forwarded to agents
  - on_control_command(sender, verb, subcommand) metric callback
  - CONTROL_UNAUTHORIZED_BEHAVIOR: "ignore" | "reply_error"

Audit events:
  matrix.control.command       — authorised command (verb, subcommand, args, kwargs)
  matrix.control.unauthorized  — rejected by allowlist (reason: not_operator | not_control_room)
  matrix.control.unknown_cmd   — authorised but unrecognised verb

Config + main:
  - bridge_operator_allowlist, bridge_control_rooms, control_unauthorized_behavior
  - matrix_bridge_control_commands_total{sender,verb,subcommand} counter
  - /health: control_channel section (enabled, rooms_count, operators_count, behavior)
  - /bridge/mappings: control_rooms + control_operators_count
  - docker-compose: BRIDGE_OPERATOR_ALLOWLIST, BRIDGE_CONTROL_ROOMS, CONTROL_UNAUTHORIZED_BEHAVIOR

Tests: 40 new → 148 total pass
Made-with: Cursor
2026-03-05 01:50:04 -08:00
Apple
d40b1e87c6 feat(matrix-bridge-dagi): harden mixed rooms with safe defaults and ops visibility (M2.2)
Guard rails (mixed_routing.py):
  - MAX_AGENTS_PER_MIXED_ROOM (default 5): fail-fast at parse time
  - MAX_SLASH_LEN (default 32): reject garbage/injection slash tokens
  - Unified rejection reasons: unknown_agent, slash_too_long, no_mapping
  - REASON_REJECTED_* constants (separate from success REASON_*)

Ingress (ingress.py):
  - per-room-agent concurrency semaphore (MIXED_CONCURRENCY_CAP, default 1)
  - active_lock_count property for /health + prometheus
  - UNKNOWN_AGENT_BEHAVIOR: "ignore" (silent) | "reply_error" (inform user)
  - on_routed(agent_id, reason) callback for routing metrics
  - on_route_rejected(room_id, reason) callback for rejection metrics
  - matrix.route.rejected audit event on every rejection
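
The per-room-agent concurrency cap can be pictured with a small asyncio sketch. The class name RoomAgentLocks and its run() method are hypothetical; only the idea of one semaphore per (room, agent) pair, the cap default of 1, and the active_lock_count property come from the commit message.

```python
import asyncio


class RoomAgentLocks:
    """One semaphore per (room_id, agent_id); hypothetical helper, not the bridge's class."""

    def __init__(self, cap: int = 1):
        self._cap = cap
        self._sems = {}
        self._active = 0

    @property
    def active_lock_count(self) -> int:
        # Number of invocations currently holding a semaphore slot.
        return self._active

    async def run(self, room_id: str, agent_id: str, coro_fn):
        key = (room_id, agent_id)
        if key not in self._sems:
            self._sems[key] = asyncio.Semaphore(self._cap)
        async with self._sems[key]:
            self._active += 1
            try:
                return await coro_fn()
            finally:
                self._active -= 1


async def demo():
    locks = RoomAgentLocks(cap=1)
    running = 0
    peak = 0

    async def job():
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)
        running -= 1

    # Three invocations for the same (room, agent) pair are serialized.
    await asyncio.gather(
        *(locks.run("!room:example.org", "alpha", job) for _ in range(3))
    )
    return peak, locks.active_lock_count
```

Keying the semaphore on the pair rather than on the room alone means a slow agent should not block other agents that share the same mixed room.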

Config + main:
  - max_agents_per_mixed_room, max_slash_len, unknown_agent_behavior, mixed_concurrency_cap
  - matrix_bridge_routed_total{agent_id, reason} counter
  - matrix_bridge_route_rejected_total{room_id, reason} counter
  - matrix_bridge_active_room_agent_locks gauge
  - /health: mixed_guard_rails section + total_agents_in_mixed_rooms
  - docker-compose: all 4 new guard rail env vars

Runbook: section 9 — mixed room debug guide (6 acceptance tests, routing metrics, session isolation, lock hang, config guard)

Tests: 108 pass (94 → 108, +14 new tests for guard rails + callbacks + concurrency)
Made-with: Cursor
2026-03-05 01:41:20 -08:00
Apple
a85a11984b feat(matrix-bridge-dagi): add mixed-room routing by slash/mention (M2.1)
- mixed_routing.py: parse BRIDGE_MIXED_ROOM_MAP, route by /slash > @mention > name: > default
- ingress.py: _try_enqueue_mixed for mixed rooms, session isolation {room}:{agent}, reply tagging
- config.py: bridge_mixed_room_map + bridge_mixed_defaults fields
- main.py: parse mixed config, pass to MatrixIngressLoop, expose in /health + /bridge/mappings
- docker-compose: BRIDGE_MIXED_ROOM_MAP / BRIDGE_MIXED_DEFAULTS env vars, BRIDGE_ALLOWED_AGENTS multi-value
- tests: 25 routing unit tests + 10 ingress integration tests (94 total pass)
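
The routing precedence (/slash > @mention > name: > default) can be illustrated with a short sketch. The regexes, the reason strings, and the choice to fall through when a token names no known agent are assumptions; mixed_routing.py may behave differently. The {room}:{agent} session key follows the format given in the commit message.

```python
import re

# Assumed token shapes; the real parser is not shown in the commit message.
_SLASH = re.compile(r"^/(\w+)")
_MENTION = re.compile(r"@(\w+)")
_NAME_PREFIX = re.compile(r"^(\w+):")


def route(body: str, agents: set, default_agent: str):
    """Return (agent_id, reason), trying each rule in precedence order."""
    text = body.strip()
    rules = (("slash", _SLASH), ("mention", _MENTION), ("name", _NAME_PREFIX))
    for reason, pattern in rules:
        match = pattern.search(text)
        if match and match.group(1).lower() in agents:
            return match.group(1).lower(), reason
    return default_agent, "default"


def session_id(room_id: str, agent_id: str) -> str:
    """Session isolation key in the {room}:{agent} form."""
    return f"{room_id}:{agent_id}"
```

For example, with hypothetical agents "alpha" and "beta", the message "/alpha summarize @beta" would route to alpha because the slash rule is tried before the mention rule.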

Made-with: Cursor
2026-03-05 01:29:18 -08:00
Apple
a24dae8e18 feat(matrix-bridge-dagi): add backpressure queue with N workers (H2)
Reader + N workers architecture:
  Reader: sync_poll → rate_check → dedupe → queue.put_nowait()
  Workers (WORKER_CONCURRENCY, default 2): queue.get() → invoke → send → audit

Drop policy (queue full):
  - put_nowait() raises QueueFull → dropped immediately (reader never blocks)
  - audit matrix.queue_full + on_queue_dropped callback
  - metric: matrix_bridge_queue_dropped_total{room_id,agent_id}
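
A minimal sketch of the drop policy, assuming an asyncio.Queue and a hypothetical try_enqueue() helper. The on_dropped callback stands in for the audit event and on_queue_dropped callback named above.

```python
import asyncio


def try_enqueue(queue: asyncio.Queue, event, on_dropped) -> bool:
    """Enqueue without blocking; on a full queue, report the drop and move on."""
    try:
        queue.put_nowait(event)
        return True
    except asyncio.QueueFull:
        on_dropped(event)
        return False


async def demo():
    queue = asyncio.Queue(maxsize=2)
    dropped = []
    results = [try_enqueue(queue, i, dropped.append) for i in range(3)]
    return results, dropped, queue.qsize()
```

Because put_nowait() never awaits, the reader keeps polling even when workers are saturated. The trade-off is that the newest event is lost rather than the oldest.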

Graceful shutdown:
  1. stop_event → reader exits loop
  2. queue.join() with QUEUE_DRAIN_TIMEOUT_S (default 5s) → workers finish in-flight
  3. worker tasks cancelled
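
The three shutdown steps above could look roughly like this. The worker() and shutdown() functions are illustrative, not the bridge's code; the sketch assumes workers call task_done() for every item so that queue.join() can complete.

```python
import asyncio


async def worker(queue: asyncio.Queue, handle):
    while True:
        item = await queue.get()
        try:
            await handle(item)
        finally:
            # Mark the item finished even if handle() raised.
            queue.task_done()


async def shutdown(queue, workers, stop_event, drain_timeout_s: float = 5.0) -> bool:
    # 1. Signal the reader to leave its loop.
    stop_event.set()
    # 2. Give workers a bounded amount of time to drain the queue.
    try:
        await asyncio.wait_for(queue.join(), timeout=drain_timeout_s)
        drained = True
    except asyncio.TimeoutError:
        drained = False
    # 3. Cancel workers, which are now idle or past the deadline.
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return drained


async def demo():
    queue = asyncio.Queue(maxsize=10)
    done = []

    async def handle(item):
        await asyncio.sleep(0.01)
        done.append(item)

    stop_event = asyncio.Event()
    workers = [asyncio.create_task(worker(queue, handle)) for _ in range(2)]
    for i in range(4):
        queue.put_nowait(i)
    drained = await shutdown(queue, workers, stop_event, drain_timeout_s=2.0)
    return drained, sorted(done), all(w.cancelled() for w in workers)
```

The return value of shutdown() tells the caller whether the drain finished before the timeout, which may be useful for logging.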

New config env vars:
  QUEUE_MAX_EVENTS (default 100)
  WORKER_CONCURRENCY (default 2)
  QUEUE_DRAIN_TIMEOUT_S (default 5)

New metrics (H3 additions):
  matrix_bridge_queue_size (gauge)
  matrix_bridge_queue_dropped_total (counter)
  matrix_bridge_queue_wait_seconds histogram (buckets: 0.01…30s)

/health: queue.size, queue.max, queue.workers
MatrixIngressLoop: queue_size + worker_count properties

6 queue tests: enqueue/process, full-drop-audit, concurrency barrier,
graceful drain, wait metric, rate-limit-before-enqueue
Total: 71 passed

Made-with: Cursor
2026-03-05 01:07:04 -08:00
Apple
a4e95482bc feat(matrix-bridge-dagi): add rate limiting (H1) and metrics (H3)
H1 — InMemoryRateLimiter (sliding window, no Redis):
  - Per-room: RATE_LIMIT_ROOM_RPM (default 20/min)
  - Per-sender: RATE_LIMIT_SENDER_RPM (default 10/min)
  - Room checked before sender — sender quota not charged on room block
  - Blocked messages: audit matrix.rate_limited + on_rate_limited callback
  - reset() for ops/test, stats() exposed in /health
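
A sliding-window limiter with the "room before sender" ordering might be sketched as below. The class name SlidingWindowLimiter, the check() signature, and the injectable clock are assumptions made for illustration; the actual InMemoryRateLimiter is not shown in the commit message. In this sketch a blocked message charges neither quota.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Illustrative per-room and per-sender limiter; not the bridge's implementation."""

    def __init__(self, room_rpm=20, sender_rpm=10, window_s=60.0, clock=time.monotonic):
        self._room_rpm = room_rpm
        self._sender_rpm = sender_rpm
        self._window = window_s
        self._clock = clock
        self._rooms = defaultdict(deque)
        self._senders = defaultdict(deque)

    def _prune(self, stamps: deque, now: float) -> None:
        # Drop timestamps that have slid out of the window.
        while stamps and now - stamps[0] >= self._window:
            stamps.popleft()

    def check(self, room_id: str, sender: str):
        """Return (allowed, limit_type); limit_type is None when allowed."""
        now = self._clock()
        room = self._rooms[room_id]
        self._prune(room, now)
        if len(room) >= self._room_rpm:
            # Room is checked first, so the sender quota is untouched here.
            return False, "room"
        sent = self._senders[sender]
        self._prune(sent, now)
        if len(sent) >= self._sender_rpm:
            return False, "sender"
        room.append(now)
        sent.append(now)
        return True, None

    def reset(self) -> None:
        self._rooms.clear()
        self._senders.clear()
```

Passing a fake clock makes the window easy to test without sleeping.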

H3 — Extended Prometheus metrics:
  - matrix_bridge_rate_limited_total{room_id,agent_id,limit_type}
  - matrix_bridge_send_duration_seconds histogram (invoke was already there)
  - matrix_bridge_invoke_duration_seconds buckets tuned for LLM latency
  - matrix_bridge_rate_limiter_active_rooms/senders gauges
  - on_invoke_latency + on_send_latency callbacks wired in ingress loop

16 new tests: rate limiter unit (13) + ingress integration (3)
Total: 65 passed

Made-with: Cursor
2026-03-05 00:54:14 -08:00
Apple
cad3663508 feat(matrix-bridge-dagi): add egress, audit integration, fix router endpoint (PR-M1.4)
Closes the full Matrix ↔ DAGI loop:

Egress:
- invoke Router POST /v1/agents/{agent_id}/infer (field: prompt, response: response)
- send_text() reply to Matrix room with idempotent txn_id = make_txn_id(room_id, event_id)
- empty reply → skip send (no spam)
- reply truncated to 4000 chars if needed
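
The egress rules (idempotent txn_id, skip empty replies, truncate to 4000 chars) can be sketched as follows. The real make_txn_id() is not shown in the commit message, so sketch_txn_id() below is a stand-in that assumes a hash of room_id and event_id; prepare_reply() and the "dagi-" prefix are likewise hypothetical.

```python
import hashlib

MAX_REPLY_CHARS = 4000


def sketch_txn_id(room_id: str, event_id: str) -> str:
    """Stand-in for make_txn_id(): same inputs always give the same ID."""
    digest = hashlib.sha256(f"{room_id}|{event_id}".encode("utf-8")).hexdigest()
    return f"dagi-{digest[:32]}"


def prepare_reply(room_id: str, event_id: str, reply):
    """Return (txn_id, text) ready to send, or None when there is nothing to send."""
    text = (reply or "").strip()
    if not text:
        return None  # empty reply: skip the send entirely
    return sketch_txn_id(room_id, event_id), text[:MAX_REPLY_CHARS]
```

Deriving the transaction ID from the triggering event means a retried send for the same event reuses the same ID, which lets the homeserver deduplicate it.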

Audit (via sofiia-console POST /api/audit/internal):
- matrix.message.received (on ingress)
- matrix.agent.replied (on successful reply)
- matrix.error (on router/send failure, with error_code)
- fire-and-forget: audit failures never crash the loop

Router URL fix:
- DAGI_GATEWAY_URL now points to dagi-router-node1:8000 (not gateway:9300)
- Session ID: stable per room — matrix:{room_localpart} (memory context)

9 tests: invoke endpoint, fallback fields, audit write, full cycle,
dedupe, empty reply skip, metric callbacks

Made-with: Cursor
2026-03-03 08:06:49 -08:00
Apple
dbfab78f02 feat(matrix-bridge-dagi): add room mapping, ingress loop, synapse setup (PR-M1.2 + PR-M1.3)
PR-M1.2 — room-to-agent mapping:
- adds room_mapping.py: parse BRIDGE_ROOM_MAP (format: agent:!room_id:server)
- RoomMappingConfig with O(1) room→agent lookup, agent allowlist check
- /bridge/mappings endpoint (read-only ops summary, no secrets)
- health endpoint now includes mappings_count
- 21 tests for parsing, validation, allowlist, summary
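
A rough sketch of parsing the agent:!room_id:server format into a room→agent lookup. The comma separator between entries, the parse_room_map() name, and the specific error cases are assumptions; room_mapping.py may validate differently.

```python
def parse_room_map(raw: str, allowed_agents: set) -> dict:
    """Build a dict keyed by room ID so room→agent lookup is O(1)."""
    mapping = {}
    for entry in (part.strip() for part in raw.split(",")):
        if not entry:
            continue
        # Split on the first colon only: the room ID itself contains a colon.
        agent, sep, room = entry.partition(":")
        if not sep or not agent or not room.startswith("!") or ":" not in room:
            raise ValueError(f"malformed mapping entry: {entry!r}")
        if agent not in allowed_agents:
            raise ValueError(f"agent not in allowlist: {agent!r}")
        if room in mapping:
            raise ValueError(f"room mapped twice: {room!r}")
        mapping[room] = agent
    return mapping
```

With a hypothetical agent "alpha", the entry "alpha:!abc:example.org" maps the room "!abc:example.org" to "alpha".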

PR-M1.3 — Matrix ingress loop:
- adds ingress.py: MatrixIngressLoop asyncio task
- sync_poll → extract → dedupe → _invoke_gateway (POST /v1/invoke)
- gateway payload: agent_id, node_id, message, metadata (transport, room_id, event_id, sender)
- exponential backoff on errors (2s..60s)
- joins all mapped rooms at startup
- metric callbacks: on_message_received, on_gateway_error
- graceful shutdown via asyncio.Event
- 5 ingress tests (invoke, dedupe, callbacks, empty-map idle)
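
The 2s..60s backoff can be pictured as a capped doubling sequence. The doubling factor and the absence of jitter are assumptions; the commit message only states the bounds.

```python
from itertools import islice


def backoff_delays(base_s: float = 2.0, cap_s: float = 60.0):
    """Yield retry delays that double from base_s until they reach cap_s."""
    delay = base_s
    while True:
        yield delay
        delay = min(delay * 2, cap_s)


# First seven delays under these assumptions.
first_delays = list(islice(backoff_delays(), 7))
```

A sync loop would typically restart the generator after a successful poll so that the next failure begins again at the base delay.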

Synapse setup (docker-compose.synapse-node1.yml):
- fixed volume: bind mount ./synapse-data instead of named volume
- added port mapping 127.0.0.1:8008:8008

Synapse is running on NODA1 (localhost:8008); bot @dagi_bridge:daarion.space and
room !QwHczWXgefDHBEVkTH:daarion.space have been created, and all 4 values are in .env on NODA1.

Made-with: Cursor
2026-03-03 07:51:13 -08:00