# Postmortem Draft Graph

## Overview

The `postmortem_draft_graph` is a LangGraph workflow on the Sofiia Supervisor (NODA2) that generates structured postmortem drafts from incident data.

## Flow

```
validate → load_incident → ensure_triage → draft_postmortem → attach_artifacts → append_followups → build_result → END
```

1. **validate**: checks `incident_id` is provided.
2. **load_incident**: calls `oncall_tool.incident_get` via gateway.
3. **ensure_triage**: if no `triage_report` artifact exists, generates one by calling observability/health/KB tools.
4. **draft_postmortem**: builds a deterministic markdown + JSON postmortem using a structured template.
5. **attach_artifacts**: uploads `postmortem_draft.md`, `postmortem_draft.json` (and optionally `triage_report.json`) via `oncall_tool.incident_attach_artifact`.
6. **append_followups**: creates `followup` timeline events from the postmortem.
7. **build_result**: returns the final output.

## API

### Start run

```bash
curl -X POST http://supervisor:8000/v1/graphs/postmortem_draft/runs \
  -H "Content-Type: application/json" \
  -d '{
    "workspace_id": "default",
    "user_id": "admin",
    "agent_id": "sofiia",
    "input": {
      "incident_id": "inc_20260223_1000_abc123",
      "service": "router",
      "env": "prod",
      "include_traces": false
    }
  }'
```

### Input

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| incident_id | string | Yes | Existing incident ID |
| service | string | No | Override service (defaults to incident's service) |
| env | string | No | Environment (default: prod) |
| time_range | object | No | `{"from": "ISO", "to": "ISO"}` (defaults to incident timestamps) |
| include_traces | bool | No | Include trace lookup in triage (default: false) |

### Output

```json
{
  "incident_id": "inc_...",
  "artifacts_count": 3,
  "artifacts": [...],
  "followups_count": 4,
  "triage_was_generated": true,
  "markdown_preview": "# Postmortem: Router OOM\n..."
}
```

## Postmortem Template

The generated markdown includes:

- **Summary**: from triage report
- **Impact**: SLO/health assessment
- **Detection**: when/how the incident was reported
- **Timeline**: from incident events
- **Root Cause Analysis**: from triage suspected causes
- **Mitigations Applied**: from triage/runbooks
- **Follow-ups**: action items extracted from triage
- **Prevention**: standard recommendations

## Error Handling

- Incident not found → `graph_status: "failed"`
- Gateway errors during triage generation → non-fatal (uses partial data)
- Follow-up append errors → non-fatal (graph still succeeds)
- All tool calls go through gateway (RBAC/audit enforced)

## Correlation

Every tool call includes `graph_run_id` in metadata for full traceability.
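
The error-handling and correlation rules can be illustrated with a small, self-contained sketch. It is not the graph's actual implementation: it uses plain Python rather than LangGraph, and `FakeGateway`, `run_sketch`, the two placeholder tool names, the `"succeeded"` status value, the `warnings` and `error` fields, and the run ID format are all assumptions made for illustration. Only `oncall_tool.incident_get`, `graph_run_id`, and `graph_status: "failed"` come from this document. The drafting and artifact steps are omitted.

```python
import uuid
from typing import Any


class FakeGateway:
    """Hypothetical in-memory gateway: records calls instead of sending them."""

    def __init__(self, failing_tools: frozenset[str] = frozenset()) -> None:
        self.calls: list[dict[str, Any]] = []
        self.failing_tools = failing_tools

    def call(self, tool: str, args: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:
        # Record the call first so that failed calls remain traceable too.
        self.calls.append({"tool": tool, "args": args, "metadata": metadata})
        if tool in self.failing_tools:
            raise RuntimeError(f"gateway error calling {tool}")
        return {"ok": True}


# Placeholder names: this document does not name the triage or follow-up tools.
TRIAGE_TOOL = "triage_tool_placeholder"
FOLLOWUP_TOOL = "followup_tool_placeholder"


def run_sketch(incident_id: str, gateway: FakeGateway) -> dict[str, Any]:
    graph_run_id = f"run_{uuid.uuid4().hex[:12]}"  # ID format is an assumption
    metadata = {"graph_run_id": graph_run_id}  # attached to every tool call
    state: dict[str, Any] = {
        "graph_run_id": graph_run_id,
        "graph_status": "succeeded",  # assumed name for the non-failed status
        "warnings": [],
    }

    # load_incident: a failure here (modelled as RuntimeError) is fatal.
    try:
        gateway.call("oncall_tool.incident_get", {"incident_id": incident_id}, metadata)
    except RuntimeError as exc:
        state["graph_status"] = "failed"
        state["error"] = str(exc)
        return state

    # ensure_triage and append_followups: failures are recorded, not raised.
    for tool in (TRIAGE_TOOL, FOLLOWUP_TOOL):
        try:
            gateway.call(tool, {"incident_id": incident_id}, metadata)
        except RuntimeError as exc:
            state["warnings"].append(str(exc))

    return state
```

Calling `run_sketch("inc_...", FakeGateway(frozenset({TRIAGE_TOOL})))` leaves `graph_status` at `"succeeded"` with one warning recorded, while a gateway that fails `oncall_tool.incident_get` yields `"failed"`. In both cases every entry in `gateway.calls` carries the same `graph_run_id`.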