feat(matrix-bridge-dagi): add backpressure queue with N workers (H2)
Reader + N workers architecture:
Reader: sync_poll → rate_check → dedupe → queue.put_nowait()
Workers (WORKER_CONCURRENCY, default 2): queue.get() → invoke → send → audit
Drop policy (queue full):
- put_nowait() raises QueueFull → dropped immediately (reader never blocks)
- audit matrix.queue_full + on_queue_dropped callback
- metric: matrix_bridge_queue_dropped_total{room_id,agent_id}
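
The reader/worker split and the drop policy above can be sketched with a bounded `asyncio.Queue`. This is a minimal illustration, not the bridge's actual code: `run_demo`, `enqueue`, and the `processed`/`dropped` lists are hypothetical stand-ins for the invoke/send path, the audit event, and the dropped counter.

```python
import asyncio


async def run_demo(max_events: int = 2, workers: int = 2):
    """Enqueue 4 events into a bounded queue and return (processed, dropped)."""
    queue = asyncio.Queue(maxsize=max_events)
    processed = []
    dropped = []

    def enqueue(event: str) -> None:
        # Reader side: never block. If the queue is full, the incoming
        # event is dropped on the spot.
        try:
            queue.put_nowait(event)
        except asyncio.QueueFull:
            dropped.append(event)  # stand-in for audit + dropped counter

    async def worker() -> None:
        while True:
            event = await queue.get()
            try:
                await asyncio.sleep(0)  # stand-in for invoke -> send -> audit
                processed.append(event)
            finally:
                queue.task_done()

    # The reader enqueues before any worker has had a chance to run,
    # so only the first `max_events` items fit.
    for i in range(4):
        enqueue(f"evt-{i}")

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await queue.join()
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return processed, dropped
```

With `max_events=2`, the third and fourth events are rejected by `put_nowait()` and end up in `dropped`, while the first two are processed by the workers.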
Graceful shutdown:
1. stop_event → reader exits loop
2. queue.join() with QUEUE_DRAIN_TIMEOUT_S (default 5s) → workers finish in-flight
3. worker tasks cancelled
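
The three shutdown steps above might look roughly like the following. It is a sketch under assumptions: `drain_and_stop` and `demo` are hypothetical names, and the real reader loop is replaced by a plain `stop_event.set()`.

```python
import asyncio


async def drain_and_stop(stop_event, queue, worker_tasks, drain_timeout_s=5.0):
    """Return True if the queue drained before the timeout expired."""
    # Step 1: signal the reader; a loop polling stop_event stops enqueuing.
    stop_event.set()
    # Step 2: give workers a bounded amount of time to finish queued items.
    drained = True
    try:
        await asyncio.wait_for(queue.join(), timeout=drain_timeout_s)
    except asyncio.TimeoutError:
        drained = False
    # Step 3: cancel the workers whether or not the drain completed.
    for task in worker_tasks:
        task.cancel()
    await asyncio.gather(*worker_tasks, return_exceptions=True)
    return drained


async def demo():
    queue = asyncio.Queue(maxsize=10)
    stop_event = asyncio.Event()
    done = []

    async def worker():
        while True:
            item = await queue.get()
            try:
                await asyncio.sleep(0.01)  # stand-in for in-flight work
                done.append(item)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(2)]
    for i in range(4):
        queue.put_nowait(i)
    drained = await drain_and_stop(stop_event, queue, workers, drain_timeout_s=1.0)
    return drained, sorted(done)
```

If the timeout fires first, `wait_for` cancels the pending `queue.join()` and the workers are cancelled with items possibly still queued, which is the trade-off a drain timeout implies.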
New config env vars:
QUEUE_MAX_EVENTS (default 100)
WORKER_CONCURRENCY (default 2)
QUEUE_DRAIN_TIMEOUT_S (default 5)
New metrics (H3 additions):
matrix_bridge_queue_size (gauge)
matrix_bridge_queue_dropped_total (counter)
matrix_bridge_queue_wait_seconds (histogram, buckets: 0.01…30s)
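
One plausible way to obtain the value for the wait histogram is to stamp each item at enqueue time and take the difference at dequeue. How the bridge measures it is not shown here, so the tuple layout and the `measure_wait` helper below are assumptions for illustration only.

```python
import asyncio
import time


async def measure_wait(delay_s: float = 0.05) -> float:
    """Return how long one item sat in the queue, in seconds."""
    queue = asyncio.Queue()
    # Reader side: attach a monotonic timestamp when the event is enqueued.
    queue.put_nowait((time.monotonic(), "evt"))
    # Simulate workers being busy for a while.
    await asyncio.sleep(delay_s)
    # Worker side: the wait is the time between enqueue and dequeue.
    enqueued_at, _event = await queue.get()
    wait_s = time.monotonic() - enqueued_at
    queue.task_done()
    return wait_s  # this value would be observed on the histogram
```

A monotonic clock is used so that wall-clock adjustments cannot produce negative or inflated waits.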
/health: queue.size, queue.max, queue.workers
MatrixIngressLoop: queue_size + worker_count properties
6 queue tests: enqueue/process, full-drop-audit, concurrency barrier,
graceful drain, wait metric, rate-limit-before-enqueue
Total: 71 passed
Made-with: Cursor
@@ -29,6 +29,11 @@ class BridgeConfig:
     rate_limit_room_rpm: int  # max messages per room per minute
     rate_limit_sender_rpm: int  # max messages per sender per minute
 
+    # H2: Backpressure queue
+    queue_max_events: int  # max pending items (new events are dropped when full)
+    worker_concurrency: int  # parallel invoke workers
+    queue_drain_timeout_s: float  # graceful shutdown drain timeout
+
     # Service identity
     node_id: str
     build_sha: str
@@ -62,6 +67,9 @@ def load_config() -> BridgeConfig:
         bridge_allowed_agents=allowed,
         rate_limit_room_rpm=int(_optional("RATE_LIMIT_ROOM_RPM", "20")),
         rate_limit_sender_rpm=int(_optional("RATE_LIMIT_SENDER_RPM", "10")),
+        queue_max_events=max(1, int(_optional("QUEUE_MAX_EVENTS", "100"))),
+        worker_concurrency=max(1, int(_optional("WORKER_CONCURRENCY", "2"))),
+        queue_drain_timeout_s=max(1.0, float(_optional("QUEUE_DRAIN_TIMEOUT_S", "5"))),
         node_id=_optional("NODE_ID", "NODA1"),
         build_sha=_optional("BUILD_SHA", "dev"),
         build_time=_optional("BUILD_TIME", "local"),
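
The `max(...)` wrappers in this hunk clamp each setting to a lower bound, so a misconfigured value such as `WORKER_CONCURRENCY=0` cannot disable the workers. A small sketch of that behaviour follows; the `_optional` body is a hypothetical stand-in (the real helper is not shown in this diff), and `load_queue_settings` exists only for illustration.

```python
import os


def _optional(name: str, default: str) -> str:
    # Hypothetical stand-in: assumed to read an env var with a fallback.
    return os.environ.get(name, default)


def load_queue_settings() -> dict:
    """Parse the three queue settings with the same clamping as the diff."""
    return {
        "queue_max_events": max(1, int(_optional("QUEUE_MAX_EVENTS", "100"))),
        "worker_concurrency": max(1, int(_optional("WORKER_CONCURRENCY", "2"))),
        "queue_drain_timeout_s": max(
            1.0, float(_optional("QUEUE_DRAIN_TIMEOUT_S", "5"))
        ),
    }
```

Note that the drain timeout has a floor of 1.0 second, so values like `0.2` are raised to `1.0` rather than honoured.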