docs(platform): add policy configs, runbooks, ops scripts and platform documentation
Config policies (16 files): alert_routing, architecture_pressure, backlog, cost_weights, data_governance, incident_escalation, incident_intelligence, network_allowlist, nodes_registry, observability_sources, rbac_tools_matrix, release_gate, risk_attribution, risk_policy, slo_policy, tool_limits, tools_rollout

Ops (22 files): Caddyfile, calendar compose, grafana voice dashboard, deployments/incidents logs, runbooks for alerts/audit/backlog/incidents/sofiia/voice, cron jobs, scripts (alert_triage, audit_cleanup, migrate_*, governance, schedule), task_registry, voice alerts/ha/latency/policy

Docs (30+ files): HUMANIZED_STEPAN v2.7-v3 changelogs and runbooks, NODA1/NODA2 status and setup, audit index and traces, backlog, incident, supervisor, tools, voice, opencode, release, risk, aistalk, spacebot

Made-with: Cursor
docs/incident/followups.md
@@ -0,0 +1,102 @@
# Follow-up Tracker & Release Gate

## Overview

Follow-ups are structured action items attached to incidents via `incident_append_event` with `type=followup`. The `followup_watch` gate in `release_check` uses them to block or warn about releases for services with unresolved issues.

## Follow-up Event Schema

When appending a follow-up event to an incident:
```json
{
  "action": "incident_append_event",
  "incident_id": "inc_20250123_0900_abc1",
  "type": "followup",
  "message": "Upgrade postgres driver",
  "meta": {
    "title": "Upgrade postgres driver to fix connection leak",
    "owner": "sofiia",
    "priority": "P1",
    "due_date": "2025-02-01T00:00:00Z",
    "status": "open",
    "links": ["https://github.com/org/repo/issues/42"]
  }
}
```
### Meta Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `title` | string | yes | Short description |
| `owner` | string | yes | Agent ID or handle |
| `priority` | enum | yes | P0, P1, P2, P3 |
| `due_date` | ISO8601 | yes | Deadline |
| `status` | enum | yes | open, done, cancelled |
| `links` | array | no | Related PRs/issues/ADRs |
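
The table above can be read as a small validation contract. The following is a minimal sketch of such a check, not the tool's actual implementation: the function name `validate_followup_meta` and its error strings are hypothetical, and only the field names, required flags, and enum values come from the table.

```python
from datetime import datetime

# Field names and allowed values are taken from the Meta Fields table.
REQUIRED_FIELDS = ("title", "owner", "priority", "due_date", "status")
PRIORITIES = {"P0", "P1", "P2", "P3"}
STATUSES = {"open", "done", "cancelled"}


def validate_followup_meta(meta: dict) -> list[str]:
    """Hypothetical helper: return a list of problems (empty means valid)."""
    errors = [f"missing required field: {name}"
              for name in REQUIRED_FIELDS if name not in meta]
    if "priority" in meta and meta["priority"] not in PRIORITIES:
        errors.append(f"invalid priority: {meta['priority']!r}")
    if "status" in meta and meta["status"] not in STATUSES:
        errors.append(f"invalid status: {meta['status']!r}")
    if "due_date" in meta:
        try:
            # fromisoformat() accepts a trailing "Z" only from Python 3.11,
            # so normalise it to an explicit UTC offset first.
            datetime.fromisoformat(str(meta["due_date"]).replace("Z", "+00:00"))
        except ValueError:
            errors.append(f"due_date is not ISO 8601: {meta['due_date']!r}")
    if "links" in meta and not isinstance(meta["links"], list):
        errors.append("links must be an array")
    return errors
```

Running a check like this before calling `incident_append_event` would catch a misspelled priority or a malformed deadline early; whether the real tool rejects such events is not stated here.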
## oncall_tool: incident_followups_summary

Summarises open incidents and overdue follow-ups for a service.

### Request

```json
{
  "action": "incident_followups_summary",
  "service": "gateway",
  "env": "prod",
  "window_days": 30
}
```
### Response

```json
{
  "open_incidents": [
    {"id": "inc_...", "severity": "P1", "status": "open", "started_at": "...", "title": "..."}
  ],
  "overdue_followups": [
    {"incident_id": "inc_...", "title": "...", "due_date": "...", "priority": "P1", "owner": "sofiia"}
  ],
  "stats": {
    "open_incidents": 1,
    "overdue": 1,
    "total_open_followups": 3
  }
}
```
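
To illustrate how the follow-up part of this response could be derived from stored events, here is a rough sketch. It assumes that "overdue" means `status` is `open` and `due_date` is earlier than the current time, and that timestamps carry a timezone as in the schema example. The function `summarise_followups` is hypothetical and leaves out the `open_incidents` part of the response.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    # Normalise a trailing "Z" so this also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarise_followups(events: list[dict], now: datetime) -> dict:
    """Hypothetical reconstruction of the follow-up fields of the response."""
    open_followups = [
        e for e in events
        if e["type"] == "followup" and e["meta"]["status"] == "open"
    ]
    # Assumption: a follow-up is overdue once its deadline has passed.
    overdue = [
        {
            "incident_id": e["incident_id"],
            "title": e["meta"]["title"],
            "due_date": e["meta"]["due_date"],
            "priority": e["meta"]["priority"],
            "owner": e["meta"]["owner"],
        }
        for e in open_followups
        if parse_ts(e["meta"]["due_date"]) < now
    ]
    return {
        "overdue_followups": overdue,
        "stats": {
            "overdue": len(overdue),
            "total_open_followups": len(open_followups),
        },
    }
```

Passing `now` explicitly, for example `datetime.now(timezone.utc)`, keeps the function deterministic and easy to test.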
## Release Gate: followup_watch

### Behaviour per GatePolicy mode

| Mode | Behaviour |
|------|-----------|
| `off` | Gate skipped entirely |
| `warn` | Always passes (`pass=true`); adds recommendations for open P0/P1 and overdue follow-ups |
| `strict` | Blocks release (`pass=false`) if open incidents match `fail_on` severities or overdue follow-ups exist |
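
The three modes can be sketched as one decision function over a summary shaped like the `incident_followups_summary` response. This is an illustrative sketch only: the function `evaluate_followup_watch` and the result keys `skipped` and `recommendations` are hypothetical, and using `fail_on` to select which incidents produce recommendations in `warn` mode is an assumption.

```python
def evaluate_followup_watch(summary: dict, mode: str = "warn",
                            fail_on=("P0", "P1")) -> dict:
    """Hypothetical gate evaluation following the mode table."""
    if mode == "off":
        # Gate skipped entirely: nothing is inspected.
        return {"skipped": True, "pass": True, "recommendations": []}
    if mode not in ("warn", "strict"):
        raise ValueError(f"unknown mode: {mode!r}")

    blocking = [i for i in summary.get("open_incidents", [])
                if i["severity"] in fail_on]
    overdue = summary.get("overdue_followups", [])

    recommendations = []
    if blocking:
        recommendations.append(
            f"{len(blocking)} open incident(s) with severity in {sorted(fail_on)}")
    if overdue:
        recommendations.append(f"{len(overdue)} overdue follow-up(s)")

    # warn never blocks; strict blocks on either condition.
    passed = True if mode == "warn" else not (blocking or overdue)
    return {"skipped": False, "pass": passed, "recommendations": recommendations}
```

Under these assumptions, the same summary yields identical recommendations in `warn` and `strict`; only the `pass` value differs.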
### Configuration

In `config/release_gate_policy.yml`:

```yaml
followup_watch:
  mode: "warn"            # off | warn | strict
  fail_on: ["P0", "P1"]   # Severities that block in strict mode
```
### release_check inputs

| Input | Type | Default | Description |
|-------|------|---------|-------------|
| `run_followup_watch` | bool | true | Enable/disable gate |
| `followup_watch_window_days` | int | 30 | Incident scan window |
| `followup_watch_env` | string | "any" | Filter by environment |
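
As a rough illustration of how these defaults might be applied, the sketch below merges caller-supplied inputs over the documented defaults and rejects values of the wrong type. The helper `resolve_followup_inputs` is hypothetical; only the input names, types, and defaults come from the table, and other `release_check` inputs are simply ignored.

```python
# Defaults copied from the release_check inputs table.
FOLLOWUP_WATCH_DEFAULTS = {
    "run_followup_watch": True,
    "followup_watch_window_days": 30,
    "followup_watch_env": "any",
}


def resolve_followup_inputs(inputs: dict) -> dict:
    """Hypothetical helper: overlay caller inputs on the documented defaults."""
    resolved = dict(FOLLOWUP_WATCH_DEFAULTS)
    for name, default in FOLLOWUP_WATCH_DEFAULTS.items():
        if name not in inputs:
            continue
        value = inputs[name]
        # bool is a subclass of int in Python, so compare exact types
        # to stop True being accepted as a window size.
        if type(value) is not type(default):
            raise TypeError(
                f"{name} must be {type(default).__name__}, "
                f"got {type(value).__name__}")
        resolved[name] = value
    return resolved
```

For example, `resolve_followup_inputs({"followup_watch_env": "prod"})` keeps the 30-day window and the enabled gate while narrowing the environment filter.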
## RBAC

`incident_followups_summary` requires the `tools.oncall.read` entitlement.