microdao-daarion/docs/supervisor/postmortem_draft_graph.md
Apple 67225a39fa docs(platform): add policy configs, runbooks, ops scripts and platform documentation
Config policies (16 files): alert_routing, architecture_pressure, backlog,
cost_weights, data_governance, incident_escalation, incident_intelligence,
network_allowlist, nodes_registry, observability_sources, rbac_tools_matrix,
release_gate, risk_attribution, risk_policy, slo_policy, tool_limits, tools_rollout

Ops (22 files): Caddyfile, calendar compose, grafana voice dashboard,
deployments/incidents logs, runbooks for alerts/audit/backlog/incidents/sofiia/voice,
cron jobs, scripts (alert_triage, audit_cleanup, migrate_*, governance, schedule),
task_registry, voice alerts/ha/latency/policy

Docs (30+ files): HUMANIZED_STEPAN v2.7-v3 changelogs and runbooks,
NODA1/NODA2 status and setup, audit index and traces, backlog, incident,
supervisor, tools, voice, opencode, release, risk, aistalk, spacebot

Made-with: Cursor
2026-03-03 07:14:53 -08:00


# Postmortem Draft Graph
## Overview
The `postmortem_draft_graph` is a LangGraph workflow on the Sofiia Supervisor (NODA2) that generates structured postmortem drafts from incident data.
## Flow
```
validate → load_incident → ensure_triage → draft_postmortem
→ attach_artifacts → append_followups → build_result → END
```
1. **validate** — checks that `incident_id` is provided.
2. **load_incident** — calls `oncall_tool.incident_get` via the gateway.
3. **ensure_triage** — if no `triage_report` artifact exists, generates one by calling observability/health/KB tools.
4. **draft_postmortem** — builds deterministic Markdown and JSON versions of the postmortem from a structured template.
5. **attach_artifacts** — uploads `postmortem_draft.md`, `postmortem_draft.json` (and optionally `triage_report.json`) via `oncall_tool.incident_attach_artifact`.
6. **append_followups** — creates `followup` timeline events from the postmortem.
7. **build_result** — returns the final output.
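The seven steps can be sketched as plain Python functions that run in a fixed order over a shared state dict. This is a minimal, framework-free sketch, not the actual LangGraph implementation. Several names are hypothetical: `FakeGateway`, the state keys, the dict-shaped `artifacts` field on the incident, the triage tool names, and `TIMELINE_TOOL`. Only the two `oncall_tool.*` names come from this document. The sketch also assumes that `triage_report.json` is attached only when the triage report was generated during this run.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]

# Hypothetical placeholders; only the oncall_tool.* names appear in the doc.
TRIAGE_TOOLS = ["observability", "health", "kb"]
TIMELINE_TOOL = "timeline_append"


class FakeGateway:
    """Hypothetical gateway stand-in: returns canned data and logs tool names."""

    def __init__(self, incident: Dict[str, Any]) -> None:
        self.incident = incident
        self.calls: List[str] = []

    def call(self, tool: str, **args: Any) -> Dict[str, Any]:
        self.calls.append(tool)
        if tool == "oncall_tool.incident_get":
            return self.incident
        return {"ok": True}


def validate(s: State, gw: FakeGateway) -> None:
    if not s["input"].get("incident_id"):
        raise ValueError("incident_id is required")


def load_incident(s: State, gw: FakeGateway) -> None:
    s["incident"] = gw.call("oncall_tool.incident_get",
                            incident_id=s["input"]["incident_id"])


def ensure_triage(s: State, gw: FakeGateway) -> None:
    triage = s["incident"].get("artifacts", {}).get("triage_report")
    s["triage_was_generated"] = triage is None
    if triage is None:
        for tool in TRIAGE_TOOLS:  # gather observability/health/KB data
            gw.call(tool, service=s["incident"]["service"])
        triage = {"followups": ["Add memory alert", "Raise pod memory limit"]}  # canned
    s["triage"] = triage


def draft_postmortem(s: State, gw: FakeGateway) -> None:
    title = s["incident"]["title"]
    s["draft_json"] = {"title": title, "followups": list(s["triage"]["followups"])}
    s["markdown"] = f"# Postmortem: {title}\n"


def attach_artifacts(s: State, gw: FakeGateway) -> None:
    names = ["postmortem_draft.md", "postmortem_draft.json"]
    if s["triage_was_generated"]:  # assumption: attach triage only if generated here
        names.append("triage_report.json")
    for name in names:
        gw.call("oncall_tool.incident_attach_artifact", name=name)
    s["artifacts"] = names


def append_followups(s: State, gw: FakeGateway) -> None:
    for text in s["draft_json"]["followups"]:
        gw.call(TIMELINE_TOOL, kind="followup", text=text)


def build_result(s: State, gw: FakeGateway) -> None:
    s["result"] = {
        "incident_id": s["input"]["incident_id"],
        "artifacts_count": len(s["artifacts"]),
        "artifacts": s["artifacts"],
        "followups_count": len(s["draft_json"]["followups"]),
        "triage_was_generated": s["triage_was_generated"],
        "markdown_preview": s["markdown"],
    }


PIPELINE: List[Callable[[State, FakeGateway], None]] = [
    validate, load_incident, ensure_triage, draft_postmortem,
    attach_artifacts, append_followups, build_result,
]


def run(graph_input: Dict[str, Any], gw: FakeGateway) -> Dict[str, Any]:
    state: State = {"input": graph_input}
    for node in PIPELINE:  # same linear order as the Flow diagram
        node(state, gw)
    return state["result"]
```

Consider an incident with no `triage_report` artifact. In that case `run` calls the triage tools, attaches three files, and reports `triage_was_generated` as true. When a triage report already exists, only the two draft files are attached. The sketch mutates one dict for brevity, whereas LangGraph nodes typically return partial state updates instead.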
## API
### Start run
```bash
curl -X POST http://supervisor:8000/v1/graphs/postmortem_draft/runs \
  -H "Content-Type: application/json" \
  -d '{
    "workspace_id": "default",
    "user_id": "admin",
    "agent_id": "sofiia",
    "input": {
      "incident_id": "inc_20260223_1000_abc123",
      "service": "router",
      "env": "prod",
      "include_traces": false
    }
  }'
```
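For callers in Python, the same request can be built with only the standard library. This is a sketch, and it rests on a few assumptions:

- The helper names are hypothetical.
- The `workspace_id`, `user_id`, and `agent_id` values are copied from the curl example.
- The shape of the start-run response is not documented here, so `start_run` simply returns the decoded JSON.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Base URL taken from the curl example; adjust for your deployment.
SUPERVISOR_URL = "http://supervisor:8000"


def build_start_run_request(
    incident_id: str,
    service: Optional[str] = None,
    env: str = "prod",
    include_traces: bool = False,
) -> urllib.request.Request:
    """Build the POST request that starts a postmortem_draft run."""
    graph_input: Dict[str, Any] = {
        "incident_id": incident_id,
        "env": env,
        "include_traces": include_traces,
    }
    if service is not None:  # omit to fall back to the incident's service
        graph_input["service"] = service
    body = {
        "workspace_id": "default",
        "user_id": "admin",
        "agent_id": "sofiia",
        "input": graph_input,
    }
    return urllib.request.Request(
        f"{SUPERVISOR_URL}/v1/graphs/postmortem_draft/runs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_run(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building and sending are split so that the payload can be inspected or unit-tested without a running supervisor.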
### Input
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| incident_id | string | Yes | Existing incident ID |
| service | string | No | Overrides the service (defaults to the incident's service) |
| env | string | No | Environment (default: `prod`) |
| time_range | object | No | `{"from": "ISO", "to": "ISO"}` with ISO 8601 timestamps (defaults to the incident's timestamps) |
| include_traces | bool | No | Include trace lookup in triage (default: false) |
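The defaults in the table can be expressed as a small resolver. This is a sketch under assumptions: `resolve_input` is hypothetical, and the incident keys `started_at` and `ended_at` are invented names for "the incident's timestamps". Two of the defaults come from the incident itself, so they can only be filled in after `load_incident` has run.

```python
from typing import Any, Dict


def resolve_input(raw: Dict[str, Any], incident: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented defaults to a postmortem_draft input payload.

    `incident` is assumed to carry `service`, `started_at`, and `ended_at`;
    the two timestamp key names are hypothetical.
    """
    incident_id = raw.get("incident_id")
    if not isinstance(incident_id, str) or not incident_id:
        raise ValueError("incident_id is required")
    # Fall back to the incident's own time window when none is given.
    time_range = raw.get("time_range") or {
        "from": incident["started_at"],
        "to": incident["ended_at"],
    }
    return {
        "incident_id": incident_id,
        "service": raw.get("service") or incident["service"],
        "env": raw.get("env", "prod"),
        "time_range": time_range,
        "include_traces": bool(raw.get("include_traces", False)),
    }
```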
### Output
```json
{
  "incident_id": "inc_...",
  "artifacts_count": 3,
  "artifacts": [...],
  "followups_count": 4,
  "triage_was_generated": true,
  "markdown_preview": "# Postmortem: Router OOM\n..."
}
```
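A caller can load this result into a typed object. This is a minimal sketch: `PostmortemDraftResult` is hypothetical. The consistency check assumes that `artifacts_count` always equals the length of `artifacts`, which the example suggests but this document does not state.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Dict, List


@dataclass
class PostmortemDraftResult:
    incident_id: str
    artifacts_count: int
    artifacts: List[Any]
    followups_count: int
    triage_was_generated: bool
    markdown_preview: str

    @classmethod
    def from_json(cls, text: str) -> "PostmortemDraftResult":
        data: Dict[str, Any] = json.loads(text)
        # Keep only the documented keys; a missing key raises KeyError.
        result = cls(**{f.name: data[f.name] for f in fields(cls)})
        # Assumed invariant: the count matches the list it summarizes.
        if result.artifacts_count != len(result.artifacts):
            raise ValueError("artifacts_count does not match len(artifacts)")
        return result
```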
## Postmortem Template
The generated markdown includes:
- **Summary** — from triage report
- **Impact** — SLO/health assessment
- **Detection** — when/how the incident was reported
- **Timeline** — from incident events
- **Root Cause Analysis** — from triage suspected causes
- **Mitigations Applied** — from triage/runbooks
- **Follow-ups** — action items extracted from triage
- **Prevention** — standard recommendations
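A deterministic renderer for this template needs only a fixed section order and a stable item order. The sketch below assumes that each section's content has already been extracted as a list of strings. `render_postmortem`, the heading levels, and the `_No data available._` placeholder are hypothetical choices, not the graph's actual output format.

```python
from typing import Dict, List

# Section order as listed in the "Postmortem Template" section.
SECTIONS: List[str] = [
    "Summary", "Impact", "Detection", "Timeline",
    "Root Cause Analysis", "Mitigations Applied", "Follow-ups", "Prevention",
]


def render_postmortem(title: str, content: Dict[str, List[str]]) -> str:
    """Render Markdown with a fixed section order; bullets keep input order."""
    lines = [f"# Postmortem: {title}", ""]
    for section in SECTIONS:
        lines.append(f"## {section}")
        # Emit every section, even empty ones, so the structure never varies.
        items = content.get(section) or ["_No data available._"]
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines)
```

Emitting every section, even when it is empty, keeps the draft's structure identical across incidents. This makes drafts easy to diff and review.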
## Error Handling
- Incident not found → `graph_status: "failed"`
- Gateway errors during triage generation → non-fatal (the draft is built from whatever partial data was collected)
- Follow-up append errors → non-fatal (graph still succeeds)
- All tool calls go through the gateway, which enforces RBAC and audit logging
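The split between fatal and non-fatal errors could look like the sketch below. `GatewayError`, `run_with_policy`, and the `"succeeded"` status value are hypothetical. Tool calls are modeled as zero-argument callables so that the policy can be shown without a real gateway.

```python
from typing import Any, Callable, Dict, List, Optional


class GatewayError(Exception):
    """Hypothetical error raised when a gateway tool call fails."""


def run_triage_tools(calls: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Run each triage tool; a failure is recorded instead of aborting."""
    partial: Dict[str, Any] = {}
    for name, call in calls.items():
        try:
            partial[name] = call()
        except GatewayError as exc:
            partial[name] = {"error": str(exc)}  # non-fatal: keep partial data
    return partial


def run_with_policy(
    load_incident: Callable[[], Optional[Dict[str, Any]]],
    triage_calls: Dict[str, Callable[[], Any]],
    followup_appends: List[Callable[[], Any]],
) -> Dict[str, Any]:
    incident = load_incident()
    if incident is None:
        # Fatal: without the incident there is nothing to draft.
        return {"graph_status": "failed", "error": "incident not found"}

    triage = run_triage_tools(triage_calls)

    failed_followups = 0
    for append in followup_appends:
        try:
            append()
        except GatewayError:
            failed_followups += 1  # non-fatal: count it and move on

    # "succeeded" is a hypothetical value; the doc only names "failed".
    return {
        "graph_status": "succeeded",
        "triage": triage,
        "failed_followups": failed_followups,
    }
```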
## Correlation
Every tool call includes the `graph_run_id` in its metadata, so each call can be traced back to the graph run that issued it.
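One way to guarantee this correlation is to wrap the gateway client, so the ID is injected on every call instead of being left to each node. In this sketch `CorrelatedGateway`, the `(tool, args, metadata)` send signature, and the `run_` ID prefix are hypothetical.

```python
import uuid
from typing import Any, Callable, Dict, Optional

# Hypothetical transport signature: (tool, args, metadata) -> response.
SendFn = Callable[[str, Dict[str, Any], Dict[str, Any]], Any]


class CorrelatedGateway:
    """Wraps a send function so every call carries the run's graph_run_id."""

    def __init__(self, send: SendFn, graph_run_id: Optional[str] = None) -> None:
        self.send = send
        self.graph_run_id = graph_run_id or f"run_{uuid.uuid4().hex}"

    def call(self, tool: str, args: Dict[str, Any],
             metadata: Optional[Dict[str, Any]] = None) -> Any:
        merged = dict(metadata or {})  # copy; never mutate the caller's dict
        merged["graph_run_id"] = self.graph_run_id  # set last so it cannot be overridden
        return self.send(tool, args, merged)
```

The ID is set after the caller's metadata is merged, so a node cannot accidentally overwrite or drop it.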