microdao-daarion/docs/supervisor/postmortem_draft_graph.md
Apple 67225a39fa docs(platform): add policy configs, runbooks, ops scripts and platform documentation
2026-03-03 07:14:53 -08:00


Postmortem Draft Graph

Overview

The postmortem_draft_graph is a LangGraph workflow on the Sofiia Supervisor (NODA2) that generates structured postmortem drafts from incident data.

Flow

validate → load_incident → ensure_triage → draft_postmortem
  → attach_artifacts → append_followups → build_result → END
  1. validate: checks that incident_id is provided.
  2. load_incident: fetches the incident by calling oncall_tool.incident_get through the gateway.
  3. ensure_triage: if the incident has no triage_report artifact, generates one by calling the observability, health, and KB tools.
  4. draft_postmortem: builds a deterministic Markdown and JSON postmortem from a structured template.
  5. attach_artifacts: uploads postmortem_draft.md, postmortem_draft.json, and optionally triage_report.json via oncall_tool.incident_attach_artifact.
  6. append_followups: creates follow-up timeline events from the postmortem.
  7. build_result: returns the final output.
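The node order above can be sketched as a plain-Python pipeline that passes a shared state dict from node to node. This is not the LangGraph implementation. The node bodies are hypothetical stubs, `fake_gateway` stands in for the real gateway, and the state keys (`incident`, `triage`, `artifacts`, `followups`, `markdown`) are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def fake_gateway(tool: str, action: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for a gateway tool call; returns canned data."""
    if action == "incident_get":
        return {"id": kwargs["incident_id"], "service": "router", "artifacts": []}
    return {"ok": True}


def validate(state: State) -> State:
    # Step 1: fail fast when incident_id is missing.
    if not state.get("incident_id"):
        raise ValueError("incident_id is required")
    return state


def load_incident(state: State) -> State:
    # Step 2: fetch the incident through the (fake) gateway.
    state["incident"] = fake_gateway("oncall_tool", "incident_get",
                                     incident_id=state["incident_id"])
    return state


def ensure_triage(state: State) -> State:
    # Step 3: generate a triage report only if none is attached yet.
    kinds = {a.get("kind") for a in state["incident"]["artifacts"]}
    state["triage_was_generated"] = "triage_report" not in kinds
    if state["triage_was_generated"]:
        state["triage"] = {"summary": "stub triage"}  # hypothetical content
    return state


def draft_postmortem(state: State) -> State:
    # Step 4: stand-in for the deterministic template rendering.
    state["markdown"] = f"# Postmortem: {state['incident']['service']}\n"
    return state


def attach_artifacts(state: State) -> State:
    # Step 5: in this sketch the triage report is uploaded only if generated here.
    names = ["postmortem_draft.md", "postmortem_draft.json"]
    if state["triage_was_generated"]:
        names.append("triage_report.json")
    state["artifacts"] = names
    return state


def append_followups(state: State) -> State:
    # Step 6: hypothetical stub that finds no action items.
    state["followups"] = []
    return state


def build_result(state: State) -> State:
    # Step 7: shape the final output.
    return {
        "incident_id": state["incident_id"],
        "artifacts_count": len(state["artifacts"]),
        "artifacts": state["artifacts"],
        "followups_count": len(state["followups"]),
        "triage_was_generated": state["triage_was_generated"],
        "markdown_preview": state["markdown"],
    }


PIPELINE: List[Callable[[State], State]] = [
    validate, load_incident, ensure_triage, draft_postmortem,
    attach_artifacts, append_followups, build_result,
]


def run_pipeline(state: State) -> State:
    # Run the nodes strictly in the documented order.
    for node in PIPELINE:
        state = node(state)
    return state
```

A linear list is enough here because the documented flow has no branches. In the real graph, `ensure_triage` makes the only conditional decision, and it does so inside the node.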

API

Start run

curl -X POST http://supervisor:8000/v1/graphs/postmortem_draft/runs \
  -H "Content-Type: application/json" \
  -d '{
    "workspace_id": "default",
    "user_id": "admin",
    "agent_id": "sofiia",
    "input": {
      "incident_id": "inc_20260223_1000_abc123",
      "service": "router",
      "env": "prod",
      "include_traces": false
    }
  }'
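You can build the same request in Python with only the standard library. The URL and payload mirror the curl example. `build_start_run_request` is a hypothetical helper, and the sketch only constructs the request without sending it.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://supervisor:8000"  # same host as the curl example


def build_start_run_request(incident_id: str, service: Optional[str] = None,
                            env: str = "prod",
                            include_traces: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST that starts a postmortem_draft run."""
    payload: Dict[str, Any] = {
        "workspace_id": "default",
        "user_id": "admin",
        "agent_id": "sofiia",
        "input": {"incident_id": incident_id, "env": env,
                  "include_traces": include_traces},
    }
    if service is not None:
        # Omit "service" entirely so the graph falls back to the incident's service.
        payload["input"]["service"] = service
    return urllib.request.Request(
        f"{BASE_URL}/v1/graphs/postmortem_draft/runs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen`. This page does not document the shape of the start-run response, so parse it defensively.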

Input

Field           Type    Required  Description
incident_id     string  Yes       Existing incident ID
service         string  No        Overrides the service (default: the incident's service)
env             string  No        Environment (default: prod)
time_range      object  No        {"from": "<ISO 8601>", "to": "<ISO 8601>"} (default: the incident's timestamps)
include_traces  bool    No        Include a trace lookup during triage (default: false)
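A small resolver makes the defaults in the table concrete. This is a sketch under stated assumptions. The incident record is assumed to expose `service`, `started_at`, and `ended_at` keys, but the real field names are not documented here. `resolve_input` is a hypothetical helper.

```python
from typing import Any, Dict


def resolve_input(raw: Dict[str, Any], incident: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented defaults to a postmortem_draft input payload."""
    incident_id = raw.get("incident_id")
    if not isinstance(incident_id, str) or not incident_id:
        raise ValueError("incident_id (string) is required")
    # Assumed incident keys: fall back to the incident's own timestamps.
    time_range = raw.get("time_range") or {
        "from": incident.get("started_at"),
        "to": incident.get("ended_at"),
    }
    return {
        "incident_id": incident_id,
        "service": raw.get("service") or incident.get("service"),
        "env": raw.get("env", "prod"),
        "time_range": time_range,
        "include_traces": bool(raw.get("include_traces", False)),
    }
```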

Output

{
  "incident_id": "inc_...",
  "artifacts_count": 3,
  "artifacts": [...],
  "followups_count": 4,
  "triage_was_generated": true,
  "markdown_preview": "# Postmortem: Router OOM\n..."
}
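A client can load this result into a typed structure. The dataclass below mirrors only the fields shown above. The shape of each `artifacts` entry is not documented here, so the sketch keeps entries as untyped dicts. `PostmortemDraftResult` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PostmortemDraftResult:
    incident_id: str
    artifacts_count: int
    followups_count: int
    triage_was_generated: bool
    markdown_preview: str
    artifacts: List[Dict[str, Any]] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "PostmortemDraftResult":
        # Required keys raise KeyError if missing; unknown keys are ignored.
        return cls(
            incident_id=data["incident_id"],
            artifacts_count=int(data["artifacts_count"]),
            followups_count=int(data["followups_count"]),
            triage_was_generated=bool(data["triage_was_generated"]),
            markdown_preview=data["markdown_preview"],
            artifacts=list(data.get("artifacts", [])),
        )
```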

Postmortem Template

The generated Markdown includes these sections:

  • Summary: from the triage report
  • Impact: SLO/health assessment
  • Detection: when and how the incident was reported
  • Timeline: from incident events
  • Root Cause Analysis: from the triage report's suspected causes
  • Mitigations Applied: from triage and runbooks
  • Follow-ups: action items extracted from triage
  • Prevention: standard recommendations
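Because the draft is deterministic, the template can be treated as a fixed, ordered list of sections, each filled from a known source. The sketch below shows that idea. The incident and triage field names (`title`, `reported_at`, `events`, `summary`, `impact`, `suspected_causes`, `mitigations`, `followups`) and the placeholder and prevention text are assumptions, not the supervisor's real schema.

```python
from typing import Any, Dict, List

# Hypothetical static text for the "Prevention" section.
STANDARD_PREVENTION = ["Add or tune alerts for this failure mode.",
                       "Review the runbook for this service."]


def bullets(items: List[str]) -> str:
    # Render a list as Markdown bullets, with a fixed placeholder when empty.
    return "\n".join(f"- {item}" for item in items) or "- _No data_"


def render_postmortem(incident: Dict[str, Any], triage: Dict[str, Any]) -> str:
    """Render sections in a fixed order so the same input gives the same output."""
    timeline = [f"{e['ts']} {e['message']}" for e in incident.get("events", [])]
    sections = [
        ("Summary", triage.get("summary", "_No data_")),
        ("Impact", triage.get("impact", "_No data_")),
        ("Detection", f"Reported at {incident.get('reported_at', 'unknown')}"),
        ("Timeline", bullets(timeline)),
        ("Root Cause Analysis", bullets(triage.get("suspected_causes", []))),
        ("Mitigations Applied", bullets(triage.get("mitigations", []))),
        ("Follow-ups", bullets(triage.get("followups", []))),
        ("Prevention", bullets(STANDARD_PREVENTION)),
    ]
    parts = [f"# Postmortem: {incident.get('title', incident.get('id', ''))}"]
    for heading, body in sections:
        parts.append(f"## {heading}\n\n{body}")
    return "\n\n".join(parts) + "\n"
```

Empty sections still render with a placeholder rather than being dropped. That keeps the document shape stable across incidents, which is what makes the output easy to diff and review.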

Error Handling

  • Incident not found → graph_status: "failed"
  • Gateway errors during triage generation → non-fatal; the draft is built from whatever partial data was collected
  • Follow-up append errors → non-fatal; the graph still succeeds
  • All tool calls go through the gateway, so RBAC and auditing are enforced on every call
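The policy above can be sketched as a wrapper that records errors from optional steps instead of raising. Only a missing incident fails the run; triage and follow-up errors are tolerated. `GatewayError`, `run_step`, `run_graph`, and the `errors` list are hypothetical names. Only the "failed" value of `graph_status` comes from this page, and "succeeded" is an assumed counterpart.

```python
from typing import Any, Callable, Dict


class GatewayError(Exception):
    """Hypothetical error raised when a gateway tool call fails."""


def run_step(state: Dict[str, Any], step: Callable[[Dict[str, Any]], None],
             fatal: bool) -> bool:
    """Run one step; fatal failures mark the run failed, others are only recorded."""
    try:
        step(state)
        return True
    except GatewayError as exc:
        state.setdefault("errors", []).append(f"{step.__name__}: {exc}")
        if fatal:
            state["graph_status"] = "failed"
        return False


def fetch_incident(state: Dict[str, Any]) -> None:
    # Simulated lookup: only one incident ID exists in this sketch.
    if state["incident_id"] != "inc_known":
        raise GatewayError("incident not found")
    state["incident"] = {"id": state["incident_id"]}


def generate_triage(state: Dict[str, Any]) -> None:
    raise GatewayError("observability tool timed out")  # simulated outage


def run_graph(incident_id: str) -> Dict[str, Any]:
    state: Dict[str, Any] = {"incident_id": incident_id}
    if not run_step(state, fetch_incident, fatal=True):
        return state  # nothing to draft without the incident
    run_step(state, generate_triage, fatal=False)  # continue with partial data
    state["graph_status"] = "succeeded"  # assumed name for the success status
    return state
```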

Correlation

Every tool call includes the graph_run_id in its metadata, so each call can be traced back to the graph run that made it.
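One way to guarantee this is to route every tool call through a single helper that stamps the run ID onto the request. The envelope fields (`tool`, `action`, `params`, `metadata`), the helper name, and the use of a UUID for the run ID are hypothetical. Only the `graph_run_id` key comes from this page.

```python
import uuid
from typing import Any, Dict


def make_tool_call(graph_run_id: str, tool: str, action: str,
                   params: Dict[str, Any]) -> Dict[str, Any]:
    """Build a tool-call envelope that carries the run's correlation ID."""
    return {
        "tool": tool,
        "action": action,
        "params": params,
        # Every call from the same run shares this ID, so audit logs can be joined on it.
        "metadata": {"graph_run_id": graph_run_id},
    }


run_id = str(uuid.uuid4())  # hypothetical: the real run ID format is not documented here
calls = [
    make_tool_call(run_id, "oncall_tool", "incident_get", {"incident_id": "inc_1"}),
    make_tool_call(run_id, "oncall_tool", "incident_attach_artifact",
                   {"incident_id": "inc_1", "name": "postmortem_draft.md"}),
]
```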