Incident Management Skill
Use this skill as the gateway for incident triage, SLO monitoring, and notification verification. It orchestrates the full triage workflow - from detection through resolution - and cross-references cx-alerts for deep alert management and cx-telemetry-querying for root cause investigation.
---
CLI Commands
| Command | Subcommands | Purpose |
|---|---|---|
| cx incidents | list, get, acknowledge, resolve, close, assign, unassign, events, aggregations | Manage and triage incidents |
| cx slos | list, get, create, update, delete | Monitor and manage SLO definitions |
| cx alerts | list, get | Check which alerts are firing (see cx-alerts skill for full alert management) |
| cx notifications connectors | list, get | Verify notification connector configuration |
| cx notifications routers | list, get | Verify notification routing rules |
| cx notifications presets | list, get | Check notification preset templates |
| cx notifications test | connector, destination, preset, routing-condition, template-render | Test notification delivery |
Key flags:
- cx incidents list supports repeatable filters: --status (TRIGGERED, ACKNOWLEDGED, RESOLVED), --severity (INFO, WARNING, ERROR, CRITICAL), --state (TRIGGERED, RESOLVED), --assignee, --application-name, --subsystem-name, --contextual-label key=value, --query, --muting muted|unmuted, --start / --end, and --duration-start / --duration-end
- cx incidents list returns at most 100 incidents per profile by default. Use --limit <n> for a bounded per-profile result set, --page-size <n> / --page-token <token> for manual pagination, or --all only when you explicitly need every page.
- All commands support -o json for structured output and -p <profile> for profile selection
- cx slos create / update use --from-file <path> (or - for stdin)
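As an illustration of repeatable filters, the wrapper below repeats --status and --severity to cover several values in one bounded call. It is a minimal sketch: the function name is hypothetical, and it assumes that repeating a flag matches any of the given values, which this document does not spell out.

```shell
# Hypothetical wrapper: open incidents that are CRITICAL or ERROR,
# capped at 50 results per profile. Extra arguments (for example
# -p <profile>) are passed through unchanged.
list_open_urgent_incidents() {
    cx incidents list \
        --status TRIGGERED --status ACKNOWLEDGED \
        --severity CRITICAL --severity ERROR \
        --limit 50 -o json "$@"
}
```

Keeping --limit in the wrapper avoids reaching for --all during routine triage.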
---
Incident Triage Workflow
Step 1: Check Active Incidents
cx incidents list -o json
cx incidents list --status TRIGGERED -o json
cx incidents list --severity CRITICAL -o json
cx incidents list --status TRIGGERED --start now-24h --limit 50 -o json
Get an overview of what's happening. Filter by severity for immediate priorities:
cx incidents list --severity CRITICAL --limit 50 -o json | jq '[.[] | {id, name, state, severity, created_at}]'
Step 2: Get Incident Details
cx incidents get <incident-id> -o json
cx incidents events --incident-id <incident-id> -o json
Review the incident timeline and related events to understand scope and progression.
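When several incidents need review, the two commands above can be looped so that details and timelines are saved side by side. The helper name and file layout below are hypothetical, and the sketch assumes cx exits non-zero when a lookup fails.

```shell
# Hypothetical helper: for each incident ID, save the details and the event
# timeline as JSON files in the directory given as the first argument.
# Usage: snapshot_incidents ./triage <id1> <id2>
snapshot_incidents() {
    out_dir=$1
    shift
    mkdir -p "$out_dir" || return 1
    for id in "$@"; do
        cx incidents get "$id" -o json > "$out_dir/$id.json" || return 1
        cx incidents events --incident-id "$id" -o json \
            > "$out_dir/$id.events.json" || return 1
    done
}
```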
Step 3: Check Related Alerts
cx alerts list -o json
Find which alerts are currently firing. For deep alert inspection, switch to the cx-alerts skill.
cx alerts list -o json | jq '[.[] | select(.is_active == true) | {id, name, severity, last_triggered}]'
Step 4: Review SLO Status
cx slos list -o json
cx slos get <slo-id> -o json
Check whether any SLO is breaching or has exhausted its error budget:
cx slos list -o json | jq '[.[] | {name, status, remaining_budget_percentage}]'
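To surface SLOs that are running low before they breach, the list output can be filtered by remaining budget. This is a sketch under assumptions: the helper name and the default threshold are hypothetical, and it relies on the list being a JSON array whose entries carry a numeric remaining_budget_percentage field, as the jq filter above suggests.

```shell
# Hypothetical helper: reads the output of `cx slos list -o json` on stdin
# and keeps only SLOs whose remaining budget is below the threshold.
# Entries without a numeric remaining_budget_percentage are skipped.
slos_below_budget() {
    threshold=${1:-20}   # percent; 20 is an arbitrary default
    jq --argjson min "$threshold" \
        '[.[]
          | select((.remaining_budget_percentage | type) == "number"
                   and .remaining_budget_percentage < $min)
          | {name, status, remaining_budget_percentage}]'
}

# Example: cx slos list -o json | slos_below_budget 25
```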
Step 5: Verify Notifications
cx notifications connectors list -o json
cx notifications routers list -o json
cx notifications presets list -o json
Confirm that the connectors, routers, and presets needed to reach the right people through the correct channels are in place.
Step 6: Pivot to Root Cause
Switch to the cx-telemetry-querying skill to investigate the underlying cause using logs, traces, and metrics.
---
Incident Actions
Acknowledge
cx incidents acknowledge <incident-id>
cx incidents acknowledge <id1> <id2> <id3>
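Because acknowledge accepts several IDs, it can be fed from a filtered list. The helper below is a hypothetical sketch that reads one ID per line from stdin. The jq expression in the usage comment assumes the list output is a JSON array with an id field, as in Step 1.

```shell
# Hypothetical helper: acknowledge every incident ID read from stdin
# (one per line) in a single call; does nothing on empty input.
ack_from_stdin() {
    ids=$(cat)
    if [ -z "$ids" ]; then
        echo "no incidents to acknowledge" >&2
        return 0
    fi
    # Unquoted on purpose: each ID becomes its own argument.
    cx incidents acknowledge $ids
}

# Example:
#   cx incidents list --status TRIGGERED --severity CRITICAL -o json \
#     | jq -r '.[].id' | ack_from_stdin
```

Review the filtered list before piping it in, since acknowledging signals ownership of every incident it contains.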
Resolve
cx incidents resolve <incident-id>
cx incidents resolve <id1> <id2> <id3>
Assign
cx incidents assign <incident-id> --user-id <user-id>
Close
cx incidents close <incident-id>
---
SLO Management
Creating SLOs
Template from an existing SLO:
cx slos get <existing-slo-id> -o json > slo-template.json
# Edit slo-template.json with new service/threshold
cx slos create --from-file slo-template.json
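The same template flow can run without a temporary file by using --from-file - to read from stdin. This is only a sketch: the helper name is hypothetical, it assumes the SLO JSON has a top-level name field, and it assumes the output of cx slos get is accepted as input by cx slos create. Server-assigned fields such as IDs may need to be removed first.

```shell
# Hypothetical helper: copy an existing SLO under a new name.
# Usage: clone_slo <existing-slo-id> "<new name>"
clone_slo() {
    source_id=$1
    new_name=$2
    cx slos get "$source_id" -o json \
        | jq --arg name "$new_name" '.name = $name' \
        | cx slos create --from-file -
}
```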
Monitoring SLO Health
# All SLOs with their status
cx slos list -o json | jq '[.[] | {name, status, target_percentage, remaining_budget}]'
# SLOs that are breaching
cx slos list -o json | jq '[.[] | select(.status != "OK")]'
---
Notification Debugging
When notifications aren't reaching the right people:
1. Check Connectors
cx notifications connectors list -o json | jq '[.[] | {id, name, type}]'
Verify the expected channels (Slack, PagerDuty, email) exist and are configured.
2. Check Routers
cx notifications routers list -o json | jq '[.[] | {id, name, entity_type}]'
Verify routing rules map the right alert types to the right connectors.
3. Test Notification Delivery
cx notifications test connector --from-file test-connector.json
cx notifications test destination --from-file test-destination.json
cx notifications test preset --from-file test-preset.json
cx notifications test routing-condition --from-file test-condition.json
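The three checks can be chained so that the first broken stage is named. The helper below is a hypothetical sketch that assumes cx exits non-zero on failure. It confirms only that each command succeeds, so the router-to-connector mapping still needs a manual look.

```shell
# Hypothetical helper: run the checks in order and stop at the first failure.
# Argument: path to a connector test definition file.
verify_notification_chain() {
    test_file=$1
    cx notifications connectors list -o json > /dev/null \
        || { echo "FAIL: connectors list" >&2; return 1; }
    cx notifications routers list -o json > /dev/null \
        || { echo "FAIL: routers list" >&2; return 1; }
    cx notifications test connector --from-file "$test_file" > /dev/null \
        || { echo "FAIL: test delivery" >&2; return 1; }
    echo "OK: connectors and routers listed, test delivery succeeded"
}
```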
---
Incident Aggregations
Get a high-level view of incident patterns:
cx incidents aggregations -o json
Use this to understand incident frequency, MTTR trends, and severity distribution.
---
Key Principles
- Triage before deep-dive - check incidents, alerts, and SLOs before querying telemetry data
- Check SLO burn rate, not just status - a slowly burning SLO needs attention before it breaches
- Verify notification chain end-to-end - connector exists → router maps correctly → test delivery works
- Cross-reference with telemetry - use the cx-telemetry-querying skill for root cause after triage
- Acknowledge promptly - acknowledge incidents to signal ownership and stop re-notifications
- Use incident events for timeline - cx incidents events shows the full incident lifecycle
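To make the burn-rate principle concrete: burn rate can be taken as the fraction of error budget consumed divided by the fraction of the SLO window elapsed, so a value above 1 means the budget would run out before the window ends. The helper below is a hypothetical sketch of that arithmetic. Its inputs are supplied by hand and it does not read any cx output.

```shell
# Hypothetical helper. Arguments:
#   $1  remaining error budget, in percent (0-100)
#   $2  time elapsed in the SLO window
#   $3  total length of the SLO window (same unit as $2)
# Prints (budget consumed fraction) / (window elapsed fraction).
slo_burn_rate() {
    awk -v remaining="$1" -v elapsed="$2" -v window="$3" 'BEGIN {
        if (elapsed <= 0 || window <= 0) {
            print "elapsed and window must be positive" > "/dev/stderr"
            exit 1
        }
        consumed = (100 - remaining) / 100
        printf "%.2f\n", consumed / (elapsed / window)
    }'
}
```

For example, slo_burn_rate 40 10 30 describes an SLO with 40% of its budget left 10 days into a 30-day window. The result is above 1, so it needs attention even if its status still reads as healthy.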
---
Related Skills
- cx-alerts - deep alert management: creating, updating, and inspecting alert definitions
- cx-telemetry-querying - root cause investigation using logs, metrics, traces, and RUM
- cx-observability-setup - configure notification channels and routing for alerts

