# Postmortem Draft Graph

## Overview

The `postmortem_draft_graph` is a LangGraph workflow on the Sofiia Supervisor (NODA2) that generates structured postmortem drafts from incident data.

## Flow

```text
validate → load_incident → ensure_triage → draft_postmortem
  → attach_artifacts → append_followups → build_result → END
```

1. **validate** — checks `incident_id` is provided.
2. **load_incident** — calls `oncall_tool.incident_get` via gateway.
3. **ensure_triage** — if no `triage_report` artifact exists, generates one by calling observability/health/KB tools.
4. **draft_postmortem** — builds a deterministic markdown + JSON postmortem using a structured template.
5. **attach_artifacts** — uploads `postmortem_draft.md`, `postmortem_draft.json` (and optionally `triage_report.json`) via `oncall_tool.incident_attach_artifact`.
6. **append_followups** — creates `followup` timeline events from the postmortem.
7. **build_result** — returns the final output.
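The node sequence above can be approximated in plain Python. This is a minimal sketch, not the actual LangGraph definition: the state is assumed to be a dict, `load_incident` is a stub standing in for `oncall_tool.incident_get`, and the `kind` field on artifacts is a hypothetical name. It also assumes that a failed node stops the run, which the real graph may handle differently.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def validate(state: State) -> State:
    # Fail early when the required input is missing.
    if not state.get("incident_id"):
        return {**state, "graph_status": "failed", "error": "incident_id is required"}
    return state


def load_incident(state: State) -> State:
    # Stub: the real node calls oncall_tool.incident_get via the gateway.
    return {**state, "incident": {"id": state["incident_id"], "artifacts": []}}


def ensure_triage(state: State) -> State:
    # Only generate a triage report when none is attached yet.
    # "kind" is a hypothetical field name used for illustration.
    artifacts = state["incident"]["artifacts"]
    has_triage = any(a.get("kind") == "triage_report" for a in artifacts)
    return {**state, "triage_was_generated": not has_triage}


def run_pipeline(state: State, nodes: List[Callable[[State], State]]) -> State:
    # Run nodes in order and stop as soon as one marks the run as failed.
    for node in nodes:
        state = node(state)
        if state.get("graph_status") == "failed":
            break
    return state


result = run_pipeline({"incident_id": "inc_1"}, [validate, load_incident, ensure_triage])
```

The remaining nodes (`draft_postmortem`, `attach_artifacts`, `append_followups`, `build_result`) would slot into the same list in the order shown in the flow diagram.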

## API

### Start run

```bash
curl -X POST http://supervisor:8000/v1/graphs/postmortem_draft/runs \
  -H "Content-Type: application/json" \
  -d '{
    "workspace_id": "default",
    "user_id": "admin",
    "agent_id": "sofiia",
    "input": {
      "incident_id": "inc_20260223_1000_abc123",
      "service": "router",
      "env": "prod",
      "include_traces": false
    }
  }'
```
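The same request can be built from Python using only the standard library. This is a sketch under the assumption that the endpoint and payload match the `curl` call above; `build_start_run_request` is a hypothetical helper name, and the response format is not covered here.

```python
import json
import urllib.request


def build_start_run_request(base_url: str, incident_id: str) -> urllib.request.Request:
    # Mirror the JSON body of the curl example; optional inputs are left to their defaults.
    payload = {
        "workspace_id": "default",
        "user_id": "admin",
        "agent_id": "sofiia",
        "input": {"incident_id": incident_id, "env": "prod", "include_traces": False},
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/graphs/postmortem_draft/runs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_start_run_request("http://supervisor:8000", "inc_20260223_1000_abc123")
# To send the request, pass it to urllib.request.urlopen(req).
```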

### Input

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| incident_id | string | Yes | Existing incident ID |
| service | string | No | Override service (defaults to incident's service) |
| env | string | No | Environment (default: prod) |
| time_range | object | No | `{"from": "ISO", "to": "ISO"}` (defaults to incident timestamps) |
| include_traces | bool | No | Include trace lookup in triage (default: false) |
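The defaults in the table can be captured in a small input model. The following is an illustrative sketch, not the graph's own validation code: `PostmortemInput` and `parse_input` are hypothetical names, and `service` and `time_range` are left as `None` here because their defaults come from the incident itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PostmortemInput:
    incident_id: str
    service: Optional[str] = None               # resolved from the incident when None
    env: str = "prod"
    time_range: Optional[Dict[str, str]] = None  # resolved from incident timestamps when None
    include_traces: bool = False


def parse_input(raw: dict) -> PostmortemInput:
    # incident_id is the only required field.
    incident_id = raw.get("incident_id")
    if not isinstance(incident_id, str) or not incident_id:
        raise ValueError("incident_id is required")
    # Copy only the documented optional fields; unknown keys are ignored.
    optional = ("service", "env", "time_range", "include_traces")
    known = {key: raw[key] for key in optional if key in raw}
    return PostmortemInput(incident_id=incident_id, **known)


parsed = parse_input({"incident_id": "inc_20260223_1000_abc123", "service": "router"})
```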

### Output

```json
{
  "incident_id": "inc_...",
  "artifacts_count": 3,
  "artifacts": [...],
  "followups_count": 4,
  "triage_was_generated": true,
  "markdown_preview": "# Postmortem: Router OOM\n..."
}
```

## Postmortem Template

The generated markdown includes:

- **Summary** — from triage report
- **Impact** — SLO/health assessment
- **Detection** — when/how the incident was reported
- **Timeline** — from incident events
- **Root Cause Analysis** — from triage suspected causes
- **Mitigations Applied** — from triage/runbooks
- **Follow-ups** — action items extracted from triage
- **Prevention** — standard recommendations
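A deterministic renderer for these sections could look like the sketch below. It assumes the section content has already been collected into a dict keyed by section title; `render_postmortem` and the placeholder text for empty sections are hypothetical, and the real template may use different heading levels.

```python
SECTIONS = [
    "Summary",
    "Impact",
    "Detection",
    "Timeline",
    "Root Cause Analysis",
    "Mitigations Applied",
    "Follow-ups",
    "Prevention",
]


def render_postmortem(title: str, content: dict) -> str:
    # Fixed section order and no timestamps or randomness, so the same
    # input always produces the same markdown.
    lines = [f"# Postmortem: {title}", ""]
    for section in SECTIONS:
        lines.append(f"## {section}")
        lines.append(content.get(section, "_No data available._"))
        lines.append("")
    return "\n".join(lines)


markdown = render_postmortem("Router OOM", {"Summary": "Router ran out of memory."})
```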

## Error Handling

- Incident not found → `graph_status: "failed"`
- Gateway errors during triage generation → non-fatal (uses partial data)
- Follow-up append errors → non-fatal (graph still succeeds)
- All tool calls go through the gateway, which enforces RBAC and audit.
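The split between fatal and non-fatal errors can be illustrated with a small wrapper. This is a sketch under assumptions: `run_step` and the `warnings` key are hypothetical, and only `graph_status: "failed"` comes from the behavior described above.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def run_step(state: State, step: Callable[[State], State], fatal: bool) -> State:
    try:
        return step(state)
    except Exception as exc:
        if fatal:
            # For example, incident not found: the whole run fails.
            return {**state, "graph_status": "failed", "error": str(exc)}
        # For example, a gateway error during triage or follow-up append:
        # record the problem and continue with partial data.
        warnings = state.get("warnings", []) + [str(exc)]
        return {**state, "warnings": warnings}
```

With this shape, `load_incident` would run with `fatal=True`, while triage generation and follow-up appends would run with `fatal=False`.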

## Correlation

Every tool call includes `graph_run_id` in metadata for full traceability.
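One way to attach the run ID consistently is to route every tool call through a single helper. The sketch below assumes a tool call is a dict with a `metadata` key; that payload shape and the `with_correlation` name are hypothetical, and only the `graph_run_id` field is taken from the text above.

```python
def with_correlation(tool_call: dict, graph_run_id: str) -> dict:
    # Return a copy so the caller's payload is not mutated.
    metadata = {**tool_call.get("metadata", {}), "graph_run_id": graph_run_id}
    return {**tool_call, "metadata": metadata}


original = {"tool": "oncall_tool", "action": "incident_get", "metadata": {"user_id": "admin"}}
call = with_correlation(original, "run_123")
```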