# Oncall/Runbook Tool - Documentation

## Overview

The Oncall Tool provides operational information: a services catalog, health checks, recent deployments, runbooks, and incident tracking. It is read-only for most agents, with gated write access for incidents.

## Integration

### Tool Definition

The tool is registered in `services/router/tool_manager.py`:

```python
{
    "type": "function",
    "function": {
        "name": "oncall_tool",
        "description": "📋 Operational information...",
        "parameters": {...}  # parameter schema elided
    }
}
```

### RBAC Configuration

The tool is added to `FULL_STANDARD_STACK` in `services/router/agent_tools_config.py`.

## Actions

### 1. services_list

List all services found in docker-compose files and service catalogs.

```json
{ "action": "services_list" }
```

**Response:**

```json
{
  "services": [
    {"name": "router", "source": "docker-compose.yml", "type": "service", "criticality": "medium"},
    {"name": "gateway", "source": "docker-compose.yml", "type": "service", "criticality": "high"}
  ],
  "count": 2
}
```

### 2. service_health

Check the health endpoint of a service.

```json
{
  "action": "service_health",
  "params": {
    "service_name": "router",
    "health_endpoint": "http://router-service:8000/health"
  }
}
```

**Security:** Only allowlisted internal hosts can be checked.

**Allowlist:** `localhost`, `127.0.0.1`, `router-service`, `gateway-service`, `memory-service`, `swapper-service`, `crewai-service`

**Response:**

```json
{
  "service": "router",
  "endpoint": "http://router-service:8000/health",
  "status": "healthy",
  "status_code": 200,
  "latency_ms": 15
}
```

### 3. service_status

Get a service's status and version information.

```json
{
  "action": "service_status",
  "params": { "service_name": "router" }
}
```

### 4. deployments_recent

Get recent deployments from the deployment log file or from git history.

```json
{ "action": "deployments_recent" }
```

**Sources (in priority order):**

1. `ops/deployments.jsonl`
2. Git commit history (fallback)

**Response:**

```json
{
  "deployments": [
    {"ts": "2024-01-15T10:00:00", "service": "router", "version": "1.2.0"},
    {"type": "git_commit", "commit": "abc123 Fix bug"}
  ],
  "count": 2
}
```

### 5. runbook_search

Search for runbooks matching a query.

```json
{
  "action": "runbook_search",
  "params": { "query": "deployment" }
}
```

**Search directories:** `ops/`, `runbooks/`, `docs/runbooks/`, `docs/ops/`

**Response:**

```json
{
  "results": [
    {"path": "ops/deploy.md", "file": "deploy.md"}
  ],
  "query": "deployment",
  "count": 1
}
```

### 6. runbook_read

Read a specific runbook.

```json
{
  "action": "runbook_read",
  "params": { "runbook_path": "ops/deploy.md" }
}
```

**Security:**

- Reads only from allowlisted directories
- Path traversal is blocked
- Secrets are masked in the returned content
- Maximum 200 KB per read

**Response:**

```json
{
  "path": "ops/deploy.md",
  "content": "# Deployment Runbook\n\n...",
  "size": 1234
}
```

### 7. incident_log_list

List incidents, optionally filtered by severity.

```json
{
  "action": "incident_log_list",
  "params": { "severity": "sev1", "limit": 20 }
}
```

**Response:**

```json
{
  "incidents": [
    {
      "ts": "2024-01-15T10:00:00",
      "severity": "sev1",
      "title": "Router down",
      "service": "router"
    }
  ],
  "count": 1
}
```

### 8. incident_log_append

Add a new incident (gated: requires an entitlement).

```json
{
  "action": "incident_log_append",
  "params": {
    "service_name": "router",
    "incident_title": "High latency",
    "incident_severity": "sev2",
    "incident_details": "Router experiencing 500ms latency",
    "incident_tags": ["performance", "router"]
  }
}
```

**RBAC:** Only `sofiia`, `helion`, and `admin` can add incidents.
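The entitlement gate can be pictured as a per-agent lookup. The following is a minimal sketch, assuming each agent holds a set of entitlement strings and unknown agents receive only the default read entitlement. The names `AGENT_ENTITLEMENTS`, `has_entitlement`, and `check_incident_write` are hypothetical; only the entitlement strings and the agent names come from this document.

```python
# Hypothetical sketch of the incident-write gate; not the tool's actual code.
READ_ENTITLEMENT = "tools.oncall.read"
INCIDENT_WRITE_ENTITLEMENT = "tools.oncall.incident_write"

# Assumed mapping: only sofiia, helion, and admin hold the write entitlement.
AGENT_ENTITLEMENTS = {
    "sofiia": {READ_ENTITLEMENT, INCIDENT_WRITE_ENTITLEMENT},
    "helion": {READ_ENTITLEMENT, INCIDENT_WRITE_ENTITLEMENT},
    "admin": {READ_ENTITLEMENT, INCIDENT_WRITE_ENTITLEMENT},
}
# Every other agent gets only the default read entitlement.
DEFAULT_ENTITLEMENTS = {READ_ENTITLEMENT}


def has_entitlement(agent_id: str, entitlement: str) -> bool:
    """Return True if the agent holds the entitlement (unknown agents get defaults)."""
    return entitlement in AGENT_ENTITLEMENTS.get(agent_id, DEFAULT_ENTITLEMENTS)


def check_incident_write(agent_id: str) -> None:
    """Raise PermissionError unless the agent may call incident_log_append."""
    if not has_entitlement(agent_id, INCIDENT_WRITE_ENTITLEMENT):
        raise PermissionError(f"{agent_id} lacks {INCIDENT_WRITE_ENTITLEMENT}")
```

With this shape, `check_incident_write("sofiia")` returns silently, while a regular agent raises `PermissionError` before anything is written.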
**Storage:** `ops/incidents.jsonl`

**Response:**

```json
{ "incident_id": "2024-01-15T10:00:00", "status": "logged" }
```

## Security Features

### Health Check Allowlist

Only internal service endpoints can be checked:

- `localhost`, `127.0.0.1`
- Service names: `router-service`, `gateway-service`, `memory-service`, `swapper-service`, `crewai-service`

### Runbook Security

- Reads only from allowlisted directories: `ops/`, `runbooks/`, `docs/runbooks/`, `docs/ops/`
- Path traversal is blocked
- Secrets are automatically masked

### RBAC

- Read actions: `tools.oncall.read` (granted to all agents by default)
- Incident writes: `tools.oncall.incident_write` (only `sofiia`, `helion`, `admin`)

## Data Files

The following data files are created empty:

- `ops/incidents.jsonl` - incident log
- `ops/deployments.jsonl` - deployment log

## Example Usage

### Check Service Health

```
"Check the health of the router service"
```

### Find Runbook

```
"Find a runbook about deployment"
```

### Read Deployment Runbook

```
"Open runbook/deploy.md"
```

### View Recent Deployments

```
"Show recent deployments"
```

### Log Incident

```
"Log an incident: router high latency, sev2"
```

## Testing

```bash
pytest tools/oncall_tool/tests/test_oncall_tool.py -v
```

Test coverage:

- `services_list` parses docker-compose files
- `runbook_search` finds results
- `runbook_read` blocks path traversal
- `runbook_read` masks secrets
- `incident_log_append` is allowed for `sofiia`
- `incident_log_append` is blocked for regular agents
- `service_health` blocks non-allowlisted hosts
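The security checks exercised by these tests (host allowlist, path-traversal blocking, secret masking, the 200 KB read limit) can be sketched roughly as follows. This is a minimal illustration, not the tool's implementation: the helper names, the secret-matching regex, the `***` mask, and treating `size` as the number of bytes read are all assumptions. Only the hosts, directories, and size limit come from this document. It needs Python 3.9+ for `Path.is_relative_to`.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

# Allowlists taken from this document.
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "router-service", "gateway-service",
                 "memory-service", "swapper-service", "crewai-service"}
RUNBOOK_DIRS = ("ops", "runbooks", "docs/runbooks", "docs/ops")
MAX_READ_BYTES = 200 * 1024  # 200 KB per read

# Hypothetical masking rule: hide values of "key=value" / "key: value" pairs
# whose key looks like a secret.
SECRET_RE = re.compile(r"(?i)\b(password|secret|token|api[_-]?key)(\s*[:=]\s*)(\S+)")


def is_allowed_health_url(url: str) -> bool:
    """True only if the URL's hostname is on the internal allowlist."""
    return urlparse(url).hostname in ALLOWED_HOSTS


def resolve_runbook(repo_root: Path, runbook_path: str) -> Path:
    """Resolve a path (following '..') and reject it unless it stays in an allowlisted dir."""
    root = repo_root.resolve()
    target = (root / runbook_path).resolve()
    for d in RUNBOOK_DIRS:
        if target.is_relative_to(root / d):
            return target
    raise PermissionError(f"path not allowed: {runbook_path}")


def mask_secrets(text: str) -> str:
    """Replace secret-looking values with ***, keeping the key and separator."""
    return SECRET_RE.sub(lambda m: m.group(1) + m.group(2) + "***", text)


def read_runbook(repo_root: Path, runbook_path: str) -> dict:
    """Read at most MAX_READ_BYTES from an allowlisted runbook and mask secrets."""
    path = resolve_runbook(repo_root, runbook_path)
    with path.open("rb") as f:
        raw = f.read(MAX_READ_BYTES)  # content beyond the limit is not returned
    content = mask_secrets(raw.decode("utf-8", errors="replace"))
    return {"path": runbook_path, "content": content, "size": len(raw)}
```

Resolving the path before checking it is what defeats inputs like `ops/../../etc/passwd`, because the check runs on the final location rather than on the raw string.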