Postmortem Draft Graph
Overview
The postmortem_draft_graph is a LangGraph workflow on the Sofiia Supervisor (NODA2) that generates structured postmortem drafts from incident data.
Flow
validate → load_incident → ensure_triage → draft_postmortem
→ attach_artifacts → append_followups → build_result → END
- validate: checks incident_id is provided.
- load_incident: calls oncall_tool.incident_get via gateway.
- ensure_triage: if no triage_report artifact exists, generates one by calling observability/health/KB tools.
- draft_postmortem: builds a deterministic markdown + JSON postmortem using a structured template.
- attach_artifacts: uploads postmortem_draft.md, postmortem_draft.json (and optionally triage_report.json) via oncall_tool.incident_attach_artifact.
- append_followups: creates followup timeline events from the postmortem.
- build_result: returns the final output.
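The node order above can be sketched in plain Python as a linear pipeline over a shared state dict. This is only an illustration under stated assumptions: the real workflow is a LangGraph graph, the tool calls are replaced by stubs, and every state key other than incident_id (as well as the rule that triage_report.json is attached only when triage was generated) is hypothetical.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]

def validate(state: State) -> State:
    # Fail early when incident_id is missing or empty.
    if not state.get("incident_id"):
        raise ValueError("incident_id is required")
    return state

def load_incident(state: State) -> State:
    # Stand-in for oncall_tool.incident_get via the gateway (response shape assumed).
    state["incident"] = {"id": state["incident_id"], "artifacts": []}
    return state

def ensure_triage(state: State) -> State:
    # Generate a triage report only when none is attached yet.
    kinds = [a.get("kind") for a in state["incident"]["artifacts"]]
    state["triage_was_generated"] = "triage_report" not in kinds
    return state

def draft_postmortem(state: State) -> State:
    # Placeholder draft; the real node fills a structured template.
    state["markdown"] = f"# Postmortem: {state['incident_id']}\n"
    return state

def attach_artifacts(state: State) -> State:
    files = ["postmortem_draft.md", "postmortem_draft.json"]
    if state["triage_was_generated"]:  # assumed condition for the optional file
        files.append("triage_report.json")
    state["artifacts"] = files
    return state

def append_followups(state: State) -> State:
    state["followups"] = []  # this stub extracts no follow-ups
    return state

def build_result(state: State) -> State:
    return {
        "incident_id": state["incident_id"],
        "artifacts_count": len(state["artifacts"]),
        "followups_count": len(state["followups"]),
        "triage_was_generated": state["triage_was_generated"],
    }

PIPELINE: List[Node] = [
    validate, load_incident, ensure_triage, draft_postmortem,
    attach_artifacts, append_followups, build_result,
]

def run_pipeline(state: State) -> State:
    # Run each node in order, feeding the returned state to the next one.
    for node in PIPELINE:
        state = node(state)
    return state
```

Calling run_pipeline({"incident_id": "inc_1"}) walks all seven nodes in the documented order; an empty input stops at validate with a ValueError.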
API
Start run
curl -X POST http://supervisor:8000/v1/graphs/postmortem_draft/runs \
  -H "Content-Type: application/json" \
  -d '{
    "workspace_id": "default",
    "user_id": "admin",
    "agent_id": "sofiia",
    "input": {
      "incident_id": "inc_20260223_1000_abc123",
      "service": "router",
      "env": "prod",
      "include_traces": false
    }
  }'
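For callers that prefer Python over curl, the same request can be assembled with the standard library. This is a minimal sketch: build_run_request is a hypothetical helper name, the payload mirrors the curl example above, and the request is only constructed here, not sent.

```python
import json
import urllib.request

def build_run_request(base_url: str, incident_id: str) -> urllib.request.Request:
    # Same body as the curl example; values other than incident_id are copied from it.
    payload = {
        "workspace_id": "default",
        "user_id": "admin",
        "agent_id": "sofiia",
        "input": {
            "incident_id": incident_id,
            "service": "router",
            "env": "prod",
            "include_traces": False,
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/graphs/postmortem_draft/runs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_run_request("http://supervisor:8000", "inc_20260223_1000_abc123")
# Sending is left to the caller, e.g. urllib.request.urlopen(req, timeout=30).
```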
Input
| Field | Type | Required | Description |
|---|---|---|---|
| incident_id | string | Yes | Existing incident ID |
| service | string | No | Override service (defaults to incident's service) |
| env | string | No | Environment (default: prod) |
| time_range | object | No | {"from": "ISO", "to": "ISO"} (defaults to incident timestamps) |
| include_traces | bool | No | Include trace lookup in triage (default: false) |
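The defaults in the table can be read as a small normalization step. The sketch below is an assumption about how such a step might look, not the graph's actual code: normalize_input is a hypothetical name, and the incident fields service, started_at and ended_at are assumed for illustration.

```python
from typing import Any, Dict

def normalize_input(raw: Dict[str, Any], incident: Dict[str, Any]) -> Dict[str, Any]:
    # incident_id is the only required field.
    incident_id = raw.get("incident_id")
    if not isinstance(incident_id, str) or not incident_id:
        raise ValueError("incident_id is required and must be a non-empty string")
    return {
        "incident_id": incident_id,
        # Fall back to the incident's own service when no override is given.
        "service": raw.get("service") or incident.get("service"),
        "env": raw.get("env") or "prod",
        # Fall back to the incident timestamps (field names assumed).
        "time_range": raw.get("time_range") or {
            "from": incident.get("started_at"),
            "to": incident.get("ended_at"),
        },
        "include_traces": bool(raw.get("include_traces", False)),
    }
```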
Output
{
  "incident_id": "inc_...",
  "artifacts_count": 3,
  "artifacts": [...],
  "followups_count": 4,
  "triage_was_generated": true,
  "markdown_preview": "# Postmortem: Router OOM\n..."
}
Postmortem Template
The generated markdown includes:
- Summary — from triage report
- Impact — SLO/health assessment
- Detection — when/how the incident was reported
- Timeline — from incident events
- Root Cause Analysis — from triage suspected causes
- Mitigations Applied — from triage/runbooks
- Follow-ups — action items extracted from triage
- Prevention — standard recommendations
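A deterministic renderer for the section list above could look roughly like this. Only the section names and the "# Postmortem: ..." title line come from this document; the "##" heading level, the placeholder text and the render_postmortem name are assumptions made for the sketch.

```python
from typing import Dict, List

# Section order as listed above.
SECTIONS: List[str] = [
    "Summary",
    "Impact",
    "Detection",
    "Timeline",
    "Root Cause Analysis",
    "Mitigations Applied",
    "Follow-ups",
    "Prevention",
]

def render_postmortem(title: str, content: Dict[str, str]) -> str:
    # Fixed section order and no timestamps or randomness, so the same
    # input always produces the same markdown.
    lines = [f"# Postmortem: {title}", ""]
    for section in SECTIONS:
        lines.append(f"## {section}")
        lines.append(content.get(section, "_No data available._"))
        lines.append("")
    return "\n".join(lines)
```

Sections without data still appear with a placeholder, so the draft keeps a stable shape for reviewers to fill in.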
Error Handling
- Incident not found → graph_status: "failed"
- Gateway errors during triage generation → non-fatal (uses partial data)
- Follow-up append errors → non-fatal (graph still succeeds)
- All tool calls go through the gateway (RBAC/audit enforced)
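The fatal versus non-fatal split above can be illustrated with a small sketch. The exception types, the "succeeded" status value, the warnings list and the function name are all assumptions; the document itself only specifies graph_status: "failed" for a missing incident.

```python
from typing import Any, Callable, Dict

def run_with_error_policy(
    load_incident: Callable[[], Dict[str, Any]],
    triage_steps: Dict[str, Callable[[], Any]],
    append_followups: Callable[[Dict[str, Any]], None],
) -> Dict[str, Any]:
    # Fatal: without an incident there is nothing to draft.
    try:
        incident = load_incident()
    except LookupError as exc:
        return {"graph_status": "failed", "error": str(exc)}

    result: Dict[str, Any] = {"graph_status": "succeeded", "warnings": [], "triage": {}}

    # Non-fatal: keep whatever triage data could be collected.
    for name, step in triage_steps.items():
        try:
            result["triage"][name] = step()
        except ConnectionError as exc:
            result["warnings"].append(f"triage step {name} failed: {exc}")

    # Non-fatal: a follow-up append error does not fail the run.
    try:
        append_followups(incident)
    except Exception as exc:
        result["warnings"].append(f"follow-up append failed: {exc}")

    return result
```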
Correlation
Every tool call includes graph_run_id in metadata for full traceability.
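One way to picture the correlation rule is a helper that stamps every outgoing tool call with the same run ID. The payload shape, the with_correlation name and the gr_ ID format below are hypothetical; only the presence of graph_run_id in metadata comes from this document.

```python
import uuid
from typing import Any, Dict

def with_correlation(
    tool: str, action: str, params: Dict[str, Any], graph_run_id: str
) -> Dict[str, Any]:
    # Wrap a tool call so graph_run_id travels in its metadata.
    return {
        "tool": tool,
        "action": action,
        "params": params,
        "metadata": {"graph_run_id": graph_run_id},
    }

# One ID per graph run (format assumed), reused for every call in that run.
graph_run_id = f"gr_{uuid.uuid4().hex}"
get_call = with_correlation(
    "oncall_tool", "incident_get", {"incident_id": "inc_1"}, graph_run_id
)
attach_call = with_correlation(
    "oncall_tool", "incident_attach_artifact", {"incident_id": "inc_1"}, graph_run_id
)
```

Because both calls carry the same ID, audit entries for a single run can be grouped by filtering on metadata.graph_run_id.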