docs(platform): add policy configs, runbooks, ops scripts and platform documentation
Config policies (16 files): alert_routing, architecture_pressure, backlog, cost_weights, data_governance, incident_escalation, incident_intelligence, network_allowlist, nodes_registry, observability_sources, rbac_tools_matrix, release_gate, risk_attribution, risk_policy, slo_policy, tool_limits, tools_rollout

Ops (22 files): Caddyfile, calendar compose, grafana voice dashboard, deployments/incidents logs, runbooks for alerts/audit/backlog/incidents/sofiia/voice, cron jobs, scripts (alert_triage, audit_cleanup, migrate_*, governance, schedule), task_registry, voice alerts/ha/latency/policy

Docs (30+ files): HUMANIZED_STEPAN v2.7-v3 changelogs and runbooks, NODA1/NODA2 status and setup, audit index and traces, backlog, incident, supervisor, tools, voice, opencode, release, risk, aistalk, spacebot

Made-with: Cursor
212
docs/backlog/backlog.md
Normal file
@@ -0,0 +1,212 @@
# Engineering Backlog Bridge — DAARION.city

## Overview

The **Engineering Backlog Bridge** converts Risk/Pressure digest signals into a
**managed, structured backlog** of engineering work items. It closes the loop:

```
observe (Risk/Pressure) → decide (digest) → plan (backlog) → enforce (gates)
```

No LLM. Fully deterministic. Policy-driven. Idempotent (weekly dedupe).

---

## Data Model

### BacklogItem

| Field           | Type       | Description |
|-----------------|------------|-------------|
| `id`            | string     | `bl_<hex12>` |
| `created_at`    | ISO ts     | When created |
| `updated_at`    | ISO ts     | Last modification |
| `env`           | string     | `prod` / `staging` / `dev` |
| `service`       | string     | DAARION service name |
| `category`      | enum       | `arch_review`, `refactor`, `slo_hardening`, `cleanup_followups`, `security` |
| `title`         | string     | Short human-readable label |
| `description`   | string     | Bullet-list of signals + context |
| `priority`      | enum       | `P0` .. `P3` |
| `status`        | enum       | See Workflow below |
| `owner`         | string     | `oncall` / `cto` / team name |
| `due_date`      | YYYY-MM-DD | Computed from category `due_days` |
| `source`        | string     | `risk` / `pressure` / `digest` / `manual` |
| `dedupe_key`    | string     | `platform_backlog:{YYYY-WW}:{env}:{service}:{category}` |
| `evidence_refs` | dict       | `alerts[]`, `incidents[]`, `release_checks[]`, `artifacts[]`, `followups[]` |
| `tags`          | list       | `["auto", "week:2026-W08", "rule:arch_review_required"]` |
| `meta`          | dict       | Free-form metadata |

### BacklogEvent (timeline)

| Field     | Type   | Description |
|-----------|--------|-------------|
| `id`      | string | `ev_<hex12>` |
| `item_id` | string | FK to BacklogItem |
| `ts`      | ISO ts | Event timestamp |
| `type`    | enum   | `created`, `status_change`, `comment`, `auto_update` |
| `message` | string | Human-readable description |
| `actor`   | string | Who triggered the event |
| `meta`    | dict   | Old/new status, rule name, etc. |
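
The two record shapes above can be sketched as Python dataclasses. This is only an illustration: the class layout, the default values, and the `new_id` and `now_iso` helpers are assumptions, and the actual definitions in `backlog_store.py` may look different.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import secrets


def new_id(prefix: str) -> str:
    """Hypothetical helper: '<prefix>_' followed by 12 hex characters."""
    return f"{prefix}_{secrets.token_hex(6)}"


def now_iso() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class BacklogItem:
    env: str
    service: str
    category: str
    title: str
    description: str = ""
    priority: str = "P2"    # assumed default, not taken from the policy
    status: str = "open"
    owner: str = "oncall"   # assumed default
    due_date: str = ""      # YYYY-MM-DD
    source: str = "manual"
    dedupe_key: str = ""
    evidence_refs: dict = field(default_factory=dict)
    tags: list = field(default_factory=list)
    meta: dict = field(default_factory=dict)
    id: str = field(default_factory=lambda: new_id("bl"))
    created_at: str = field(default_factory=now_iso)
    updated_at: str = field(default_factory=now_iso)


@dataclass
class BacklogEvent:
    item_id: str            # FK to BacklogItem.id
    type: str               # created / status_change / comment / auto_update
    message: str
    actor: str
    meta: dict = field(default_factory=dict)
    id: str = field(default_factory=lambda: new_id("ev"))
    ts: str = field(default_factory=now_iso)
```

Every item and event gets its own prefixed id, and an event points back to its item through `item_id`.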

---

## Workflow

```
open ──► in_progress ──► done
 │       │
 │       ▼
 └──► blocked ──► in_progress
 │
 └──► canceled (terminal)
```

| From          | Allowed targets                |
|---------------|--------------------------------|
| `open`        | in_progress, blocked, canceled |
| `in_progress` | blocked, done, canceled        |
| `blocked`     | open, in_progress, canceled    |
| `done`        | (none, terminal)               |
| `canceled`    | (none, terminal)               |

Transitions are enforced by `validate_transition()` in `backlog_store.py`.
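
The transition table maps directly onto a lookup of allowed targets per status. The sketch below shows one way such a check could look. The signature and the boolean return value are assumptions, so the real `validate_transition()` may differ (for example, it might raise instead).

```python
# Allowed targets per status, copied from the transition table.
ALLOWED_TRANSITIONS = {
    "open": {"in_progress", "blocked", "canceled"},
    "in_progress": {"blocked", "done", "canceled"},
    "blocked": {"open", "in_progress", "canceled"},
    "done": set(),       # terminal
    "canceled": set(),   # terminal
}


def validate_transition(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` is allowed."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

Unknown statuses fall through to an empty set, so they are rejected and never silently accepted.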

---

## Auto-generation Rules

Rules are evaluated **per-service** from `config/backlog_policy.yml`.
All conditions in `when` must hold (AND logic). The first matching rule per
category wins, so a service gets at most one item per category per week.

| Rule name                | Trigger condition                                                     | Category            | Priority / due |
|--------------------------|-----------------------------------------------------------------------|---------------------|----------------|
| `arch_review_required`   | `pressure_requires_arch_review: true`                                 | `arch_review`       | P1 / 14d       |
| `high_pressure_refactor` | `pressure_band` ∈ {high, critical} AND `risk_band` ∈ {high, critical} | `refactor`          | P1 / 21d       |
| `slo_violations`         | `risk_has_slo_violations: true`                                       | `slo_hardening`     | P2 / 30d       |
| `followup_backlog`       | `followups_overdue > 0`                                               | `cleanup_followups` | P2 / 14d       |
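
To make the evaluation order concrete, here is a minimal sketch of the matching loop. The in-code `RULES` list, the `match_rules` name, and the flat `signals` dict are assumptions for illustration only. The real rules are loaded from the policy YAML, whose schema is not shown here.

```python
from datetime import date, timedelta

HIGH_BANDS = {"high", "critical"}

# Hypothetical in-code mirror of the rule table; each `when` entry is a predicate.
RULES = [
    {"name": "arch_review_required", "category": "arch_review",
     "priority": "P1", "due_days": 14,
     "when": [lambda s: s.get("pressure_requires_arch_review") is True]},
    {"name": "high_pressure_refactor", "category": "refactor",
     "priority": "P1", "due_days": 21,
     "when": [lambda s: s.get("pressure_band") in HIGH_BANDS,
              lambda s: s.get("risk_band") in HIGH_BANDS]},
    {"name": "slo_violations", "category": "slo_hardening",
     "priority": "P2", "due_days": 30,
     "when": [lambda s: s.get("risk_has_slo_violations") is True]},
    {"name": "followup_backlog", "category": "cleanup_followups",
     "priority": "P2", "due_days": 14,
     "when": [lambda s: s.get("followups_overdue", 0) > 0]},
]


def match_rules(signals: dict, today: date) -> list:
    """Return one proposed item per category for a single service."""
    matched = {}
    for rule in RULES:
        if rule["category"] in matched:
            continue  # first matching rule per category wins
        if all(cond(signals) for cond in rule["when"]):  # AND logic
            matched[rule["category"]] = {
                "rule": rule["name"],
                "category": rule["category"],
                "priority": rule["priority"],
                "due_date": (today + timedelta(days=rule["due_days"])).isoformat(),
            }
    return list(matched.values())
```

A service with `pressure_band: high` but `risk_band: low` does not trigger `high_pressure_refactor`, because both conditions have to hold.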

---

## Dedupe Logic

Each item has a `dedupe_key`:

```
platform_backlog:{YYYY-WW}:{env}:{service}:{category}
```

`upsert()` uses this key:

- **First run of week** → creates the item.
- **Subsequent runs** → updates title/description/evidence_refs (preserves status/owner).

This means weekly re-generation is safe and idempotent.
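
The create-or-refresh behaviour can be sketched with a plain dict standing in for the store. Several things here are assumptions: the `YYYY-Www` week format (inferred from the `week:2026-W08` tag example), the use of ISO week numbering, and the function signatures. The real `upsert()` may differ.

```python
from datetime import date

# Fields refreshed on repeat runs; status and owner are left untouched.
REFRESHED_FIELDS = ("title", "description", "evidence_refs")


def dedupe_key(day: date, env: str, service: str, category: str) -> str:
    """Build the weekly key, assuming ISO week numbering (e.g. 2026-W08)."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"platform_backlog:{iso_year}-W{iso_week:02d}:{env}:{service}:{category}"


def upsert(store: dict, item: dict) -> str:
    """Create the item on first sight of its key, otherwise refresh it."""
    existing = store.get(item["dedupe_key"])
    if existing is None:
        store[item["dedupe_key"]] = dict(item)
        return "created"
    for name in REFRESHED_FIELDS:
        if name in item:
            existing[name] = item[name]
    return "updated"
```

Because repeat runs only touch the refreshed fields, an item that someone already moved to `in_progress` keeps that status when the generator runs again in the same week.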

---

## API

### HTTP Endpoints

| Method | Path                             | RBAC                  | Description |
|--------|----------------------------------|-----------------------|-------------|
| GET    | `/v1/backlog/dashboard?env=prod` | `tools.backlog.read`  | Status/priority/overdue summary |
| GET    | `/v1/backlog/items`              | `tools.backlog.read`  | Filtered item list |
| GET    | `/v1/backlog/items/{id}`         | `tools.backlog.read`  | Single item + event timeline |
| POST   | `/v1/backlog/generate/weekly`    | `tools.backlog.admin` | Trigger weekly auto-generation |

Query params for `/v1/backlog/items`:
`env`, `service`, `status`, `owner`, `category`, `due_before`, `limit`, `offset`
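
As a small client-side illustration, the helper below builds a filtered `/v1/backlog/items` URL from the listed query params. The `items_url` name and the base URL in the usage note are hypothetical. Authentication and the response format are not covered.

```python
from urllib.parse import urlencode

# Query params accepted by /v1/backlog/items, in the documented order.
ALLOWED_PARAMS = ("env", "service", "status", "owner",
                  "category", "due_before", "limit", "offset")


def items_url(base_url: str, **filters) -> str:
    """Build the item-list URL, rejecting params the endpoint does not list."""
    unknown = set(filters) - set(ALLOWED_PARAMS)
    if unknown:
        raise ValueError(f"unsupported query params: {sorted(unknown)}")
    query = urlencode({k: filters[k] for k in ALLOWED_PARAMS if k in filters})
    path = f"{base_url.rstrip('/')}/v1/backlog/items"
    return f"{path}?{query}" if query else path
```

For example, `items_url("http://localhost:8000", env="prod", status="open", limit=20)` yields a URL ending in `/v1/backlog/items?env=prod&status=open&limit=20`.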

### Tool: `backlog_tool`

```json
{
  "action": "list|get|dashboard|create|upsert|set_status|add_comment|close|auto_generate_weekly|cleanup",
  "env": "prod",
  "id": "bl_abc...",
  "service": "gateway",
  "status": "open",
  "item": { ... },
  "message": "...",
  "actor": "cto"
}
```

### RBAC

| Entitlement           | Roles                  | Actions |
|-----------------------|------------------------|---------|
| `tools.backlog.read`  | cto, oncall, interface | list, get, dashboard |
| `tools.backlog.write` | cto, oncall            | create, upsert, set_status, add_comment, close |
| `tools.backlog.admin` | cto only               | auto_generate_weekly, cleanup |
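
The entitlement table can be read as two lookups: action to entitlement, then role to entitlements. The sketch below encodes the table that way. The `is_allowed` helper and the dict layout are assumptions, and the real enforcement presumably lives in the RBAC policy config instead of in code like this.

```python
# Entitlement required for each tool action, per the RBAC table.
ACTION_ENTITLEMENT = {
    **{a: "tools.backlog.read" for a in ("list", "get", "dashboard")},
    **{a: "tools.backlog.write" for a in
       ("create", "upsert", "set_status", "add_comment", "close")},
    **{a: "tools.backlog.admin" for a in ("auto_generate_weekly", "cleanup")},
}

# Entitlements held by each role, per the RBAC table.
ROLE_ENTITLEMENTS = {
    "cto": {"tools.backlog.read", "tools.backlog.write", "tools.backlog.admin"},
    "oncall": {"tools.backlog.read", "tools.backlog.write"},
    "interface": {"tools.backlog.read"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if `role` holds the entitlement that `action` requires."""
    required = ACTION_ENTITLEMENT.get(action)
    if required is None:
        return False  # unknown actions are denied
    return required in ROLE_ENTITLEMENTS.get(role, set())
```

Unknown roles and unknown actions are both denied by default.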

---

## Storage Backends

| Backend    | Env var                    | Notes |
|------------|----------------------------|-------|
| `auto`     | `BACKLOG_BACKEND=auto`     | Postgres → JSONL fallback (default) |
| `postgres` | `BACKLOG_BACKEND=postgres` | Primary (requires migration) |
| `jsonl`    | `BACKLOG_BACKEND=jsonl`    | Filesystem append-only (MVP) |
| `memory`   | `BACKLOG_BACKEND=memory`   | Tests only |
| `null`     | `BACKLOG_BACKEND=null`     | No-op |

Files (JSONL): `ops/backlog/items.jsonl`, `ops/backlog/events.jsonl`

Postgres: run `ops/scripts/migrate_backlog_postgres.py` first.
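
Backend selection from `BACKLOG_BACKEND` could be resolved roughly as follows. The `resolve_backend` name and the `postgres_available` flag are assumptions. How the real code detects a usable Postgres connection is not documented here.

```python
import os

KNOWN_BACKENDS = {"auto", "postgres", "jsonl", "memory", "null"}


def resolve_backend(environ=None, postgres_available: bool = False) -> str:
    """Pick the concrete backend name; `auto` falls back from Postgres to JSONL."""
    env = os.environ if environ is None else environ
    choice = env.get("BACKLOG_BACKEND", "auto").strip().lower()
    if choice not in KNOWN_BACKENDS:
        raise ValueError(f"unknown BACKLOG_BACKEND: {choice!r}")
    if choice == "auto":
        return "postgres" if postgres_available else "jsonl"
    return choice
```

Passing an explicit `environ` dict keeps the function easy to test without touching the process environment.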

---

## Scheduled Jobs

| Job                       | Schedule        | Description |
|---------------------------|-----------------|-------------|
| `weekly_backlog_generate` | Mon 06:20 UTC   | Generate items from latest platform digest |
| `daily_backlog_cleanup`   | Daily 03:40 UTC | Remove done/canceled items older than retention_days |
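
The cleanup job's retention filter might look like the sketch below. Two things are assumed: that item age is measured from `updated_at`, and that items are plain dicts. The `cleanup` signature is illustrative only.

```python
from datetime import datetime, timedelta, timezone

TERMINAL_STATUSES = {"done", "canceled"}


def cleanup(items: list, retention_days: int, now: datetime) -> list:
    """Return the items to keep: drop terminal items older than the cutoff."""
    cutoff = now - timedelta(days=retention_days)
    kept = []
    for item in items:
        updated = datetime.fromisoformat(item["updated_at"])
        if item["status"] in TERMINAL_STATUSES and updated < cutoff:
            continue  # expired done/canceled item
        kept.append(item)
    return kept
```

Open, in-progress, and blocked items are never removed, regardless of age.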

---

## Examples

### Manual create via tool

```json
{
  "action": "create",
  "env": "prod",
  "item": {
    "service": "gateway",
    "category": "security",
    "title": "[SEC] Patch CVE-2026-xxxx in gateway",
    "priority": "P0",
    "due_date": "2026-03-01",
    "owner": "cto",
    "source": "manual",
    "dedupe_key": "manual:2026-W08:prod:gateway:security"
  }
}
```

### Close an item

```json
{
  "action": "close",
  "id": "bl_abc123456789",
  "status": "done",
  "message": "Architecture review completed, no rework needed."
}
```

### Run weekly auto-generation

```
# HTTP
POST /v1/backlog/generate/weekly?env=prod

# Tool
{ "action": "auto_generate_weekly", "env": "prod" }
```