Incident Response#
This runbook covers investigating authorization incidents in the Intended runtime: analyzing audit trails, inspecting decision tokens, activating the emergency kill switch, and conducting post-mortems.
Prerequisites#
- The Intended CLI installed and authenticated with the operator or platform-admin role
- Access to the audit log subsystem
- Familiarity with the Trust Model
Danger
During an active incident, prioritize containment over investigation. If AI execution appears unauthorized or dangerous, activate the kill switch first, then investigate.
Incident Severity Levels#
| Severity | Description | Response Time | Example |
|---|---|---|---|
| Critical | Unauthorized AI execution in production | Immediate | Policy bypass detected |
| High | Authorization failures blocking critical services | < 15 min | Mass denials after deploy |
| Medium | Unexpected decision patterns | < 1 hour | Drift-induced allow/deny flip |
| Low | Audit anomalies, non-blocking | < 4 hours | Missing audit entries |
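As a rough illustration, the response-time targets in the table can be turned into concrete deadlines for an on-call tool. This is a minimal sketch, not part of the Intended CLI: the `RESPONSE_TARGETS` mapping and `response_deadline` helper are hypothetical names, and "Immediate" is modelled as a zero-length window by assumption.

```python
from datetime import datetime, timedelta, timezone

# Response-time targets copied from the severity table.
# Assumption: "Immediate" is treated as a zero-length window.
RESPONSE_TARGETS = {
    "critical": timedelta(0),
    "high": timedelta(minutes=15),
    "medium": timedelta(hours=1),
    "low": timedelta(hours=4),
}


def response_deadline(severity: str, detected_at: datetime) -> datetime:
    """Return the latest time by which a responder should have engaged."""
    return detected_at + RESPONSE_TARGETS[severity.lower()]


detected = datetime(2026, 3, 8, 10, 32, 1, tzinfo=timezone.utc)
print(response_deadline("High", detected).isoformat())
```

An unknown severity raises `KeyError`, which is probably preferable to silently picking a default during an incident.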
Step 1: Audit Trail Analysis#
Begin every investigation by querying the audit trail for the affected time window.
$ meritt audit query
Query the authorization audit trail.
| Option | Type | Description |
|---|---|---|
| --start | string * | Start time (ISO 8601 or relative: 1h, 6h, 24h) |
| --end | string | End time (default: now) |
| --identity (-i) | string | Filter by caller identity |
| --action (-a) | string | Filter by action (deploy, invoke, write, etc.) |
| --result (-r) | string | Filter by result: allow, deny, error |
| --environment (-e) | string | Filter by environment |
| --format (-f) | string | Output format: table, json, csv |
bash
$ meritt audit query \
--start 2h \
--environment production \
--result deny \
--format table
TIMESTAMP IDENTITY ACTION RESOURCE RESULT POLICY
2026-03-08T10:32:01Z svc:data-pipeline-runner write env:prod/service-b deny restrict-production-writes
2026-03-08T10:31:44Z svc:data-pipeline-runner write env:prod/service-b deny restrict-production-writes
2026-03-08T10:28:12Z svc:batch-processor deploy env:prod/batch-service deny restrict-production-deploys
2026-03-08T10:27:58Z user:alice deploy env:prod/service-a deny restrict-production-deploys
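For quick triage it can help to tally the table output by identity. The sketch below parses the sample rows shown above; it assumes that columns are separated by whitespace and that no field contains spaces, which holds for this sample but is not a documented guarantee. `FIELDS` and `parse_rows` are illustrative names.

```python
from collections import Counter

# Sample rows copied from the table output above (header line omitted).
SAMPLE = """\
2026-03-08T10:32:01Z svc:data-pipeline-runner write env:prod/service-b deny restrict-production-writes
2026-03-08T10:31:44Z svc:data-pipeline-runner write env:prod/service-b deny restrict-production-writes
2026-03-08T10:28:12Z svc:batch-processor deploy env:prod/batch-service deny restrict-production-deploys
2026-03-08T10:27:58Z user:alice deploy env:prod/service-a deny restrict-production-deploys
"""

FIELDS = ("timestamp", "identity", "action", "resource", "result", "policy")


def parse_rows(text: str) -> list:
    """Turn whitespace-separated audit rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and rows with an unexpected shape.
        if len(parts) != len(FIELDS) or parts[0] == "TIMESTAMP":
            continue
        rows.append(dict(zip(FIELDS, parts)))
    return rows


rows = parse_rows(SAMPLE)
denials_by_identity = Counter(r["identity"] for r in rows if r["result"] == "deny")
print(denials_by_identity.most_common())
```

For anything beyond a quick look, the json output format is likely a sturdier input than the table, though its per-entry schema is not shown here.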
Identify Patterns#
Look for clusters of denials from the same identity, policy, or time window:
bash
$ meritt audit query \
--start 6h \
--environment production \
--result deny \
--format json \
--group-by policy
{
"restrict-production-writes": {
"count": 147,
"firstSeen": "2026-03-08T08:15:00Z",
"lastSeen": "2026-03-08T10:32:01Z",
"uniqueIdentities": 3
},
"restrict-production-deploys": {
"count": 12,
"firstSeen": "2026-03-08T10:27:58Z",
"lastSeen": "2026-03-08T10:28:12Z",
"uniqueIdentities": 2
}
}
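Raw counts can hide short, intense bursts. One way to compare policies is to normalize each count by the span between firstSeen and lastSeen. The following is a minimal sketch over the grouped JSON shown above; the one-minute floor on the window is an arbitrary assumption to keep very short bursts from producing inflated rates, and `denials_per_minute` is a hypothetical helper.

```python
import json
from datetime import datetime

# Grouped output copied from the example above.
GROUPED = json.loads("""
{
  "restrict-production-writes": {
    "count": 147,
    "firstSeen": "2026-03-08T08:15:00Z",
    "lastSeen": "2026-03-08T10:32:01Z",
    "uniqueIdentities": 3
  },
  "restrict-production-deploys": {
    "count": 12,
    "firstSeen": "2026-03-08T10:27:58Z",
    "lastSeen": "2026-03-08T10:28:12Z",
    "uniqueIdentities": 2
  }
}
""")


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def denials_per_minute(entry: dict) -> float:
    span = (parse_ts(entry["lastSeen"]) - parse_ts(entry["firstSeen"])).total_seconds()
    # Assumption: treat any burst as lasting at least one minute.
    minutes = max(span / 60.0, 1.0)
    return entry["count"] / minutes


ranked = sorted(GROUPED, key=lambda name: denials_per_minute(GROUPED[name]), reverse=True)
for name in ranked:
    print(f"{name}: {denials_per_minute(GROUPED[name]):.2f} denials/min")
```

In this sample the deploy policy ranks first despite its lower total, because its 12 denials arrived within 14 seconds.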
Tip
A sudden spike in denials from a single policy shortly after a deployment almost always indicates a policy regression. Cross-reference with meritt deploy history to confirm.
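The cross-reference described in the tip can be approximated by checking which policies first started denying shortly after a deployment. This sketch assumes the deploy timestamp is supplied by hand, since the output format of meritt deploy history is not shown here; `started_after_deploy` and the 15-minute window are illustrative choices.

```python
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def started_after_deploy(grouped: dict, deployed_at: str, window: timedelta) -> list:
    """Names of policies whose first denial falls within `window` after the deploy."""
    deploy_time = parse_ts(deployed_at)
    suspects = []
    for policy, entry in grouped.items():
        delay = parse_ts(entry["firstSeen"]) - deploy_time
        if timedelta(0) <= delay <= window:
            suspects.append(policy)
    return suspects


# firstSeen values taken from the grouped query output.
grouped = {
    "restrict-production-writes": {"firstSeen": "2026-03-08T08:15:00Z"},
    "restrict-production-deploys": {"firstSeen": "2026-03-08T10:27:58Z"},
}

# Hypothetical deploy time, used only for illustration.
print(started_after_deploy(grouped, "2026-03-08T10:25:00Z", timedelta(minutes=15)))
```

A match is a lead rather than proof: a policy can begin denying soon after an unrelated deployment, so the decision tokens should still be inspected.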
Step 2: Decision Token Inspection#
Every authorization decision produces a cryptographically signed decision token. Inspect it to see the full evaluation chain.
$ meritt token inspect <decision-token>
Inspect a decision token to see the full authorization evaluation.
| Option | Type | Description |
|---|---|---|
| --verify | boolean | Verify the cryptographic signature (default: true) |
| --verbose (-v) | boolean | Show full evaluation trace including intermediate steps |
bash
$ meritt token inspect dtk_m8x2p4q7 --verbose
DECISION TOKEN INSPECTION
─────────────────────────────────────────────────
Token ID: dtk_m8x2p4q7
Signature: VALID (ES256, signed by runtime-prod-01)
Issued at: 2026-03-08T10:32:01Z
REQUEST:
Identity: svc:data-pipeline-runner
Action: write
Resource: env:prod/service-b
Trust level: 0.85
EVALUATION TRACE:
1. restrict-production-writes/rules[0]
Match: action=write, scope=production → MATCHED
Condition: approval(required=1, from=role:data-lead)
Status: NOT SATISFIED (no approval on record)
Result: DENY
2. restrict-production-writes/fallback
Result: DENY (fallback not reached, rule[0] matched)
FINAL DECISION: DENY
REASON: Approval condition not satisfied
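The trace above reads like a first-match evaluation: the first rule whose match clause fits the request decides the outcome, and the fallback applies only when no rule matches. The sketch below models that reading. It is an assumption drawn from the sample output, not the runtime's actual implementation; `Rule`, `Request`, and `evaluate` are hypothetical, and trust level is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    action: str
    scope: str
    required_approvals: int
    approver_role: str


@dataclass
class Request:
    identity: str
    action: str
    scope: str
    approvals: list = field(default_factory=list)  # roles that have approved


def evaluate(rules, request, fallback="DENY"):
    """First-match evaluation: the first matching rule decides the result."""
    for index, rule in enumerate(rules):
        if rule.action != request.action or rule.scope != request.scope:
            continue
        granted = sum(1 for role in request.approvals if role == rule.approver_role)
        if granted >= rule.required_approvals:
            return "ALLOW", f"rules[{index}] satisfied"
        return "DENY", "Approval condition not satisfied"
    # Reached only when no rule matched the request.
    return fallback, "no rule matched; fallback applied"


rules = [Rule("write", "production", 1, "role:data-lead")]
request = Request("svc:data-pipeline-runner", "write", "production")
print(evaluate(rules, request))
```

Under this model, recording one approval from role:data-lead would flip the same request to ALLOW, which is the kind of hypothesis worth confirming against a fresh token.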
Trace a Chain of Decisions#
For complex incidents involving multiple dependent decisions, trace the full chain:
bash
$ meritt audit trace \
--identity svc:data-pipeline-runner \
--start 1h \
--environment production
DECISION CHAIN for svc:data-pipeline-runner
─────────────────────────────────────────────────
10:28:01 read env:prod/config-store ALLOW (trust-level: 0.85)
10:28:03 invoke env:prod/model-a ALLOW (trust-level: 0.85)
10:32:01 write env:prod/service-b DENY ← first failure
10:32:15 write env:prod/service-b DENY (retry)
10:32:30 write env:prod/service-b DENY (retry)
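When a chain is long, the first denial and the number of immediate retries are usually the most useful facts to extract. A minimal sketch, assuming each entry starts with time, action, resource, and result separated by whitespace, as in the sample above; `first_failure` is an illustrative helper.

```python
# Entries copied from the decision chain above (banner lines omitted).
CHAIN = """\
10:28:01 read env:prod/config-store ALLOW (trust-level: 0.85)
10:28:03 invoke env:prod/model-a ALLOW (trust-level: 0.85)
10:32:01 write env:prod/service-b DENY ← first failure
10:32:15 write env:prod/service-b DENY (retry)
10:32:30 write env:prod/service-b DENY (retry)
"""


def first_failure(chain_text: str):
    """Return (time, action, resource, retries) for the first DENY, or None."""
    entries = [
        line.split()[:4]
        for line in chain_text.splitlines()
        if len(line.split()) >= 4
    ]
    for position, (time, action, resource, result) in enumerate(entries):
        if result != "DENY":
            continue
        # Count consecutive identical denials that follow the first one.
        retries = 0
        for later in entries[position + 1:]:
            if later[1:] == [action, resource, "DENY"]:
                retries += 1
            else:
                break
        return time, action, resource, retries
    return None


print(first_failure(CHAIN))
```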
Step 3: Emergency Kill Switch#
The kill switch immediately suspends all AI execution authorization for a target scope. Use it when you observe unauthorized or dangerous AI behavior.
Danger
The kill switch is a last-resort control. It will deny ALL authorization decisions in the affected scope, including legitimate operations. Use it only when the risk of continued execution outweighs the impact of a full stop.
$ meritt emergency kill
Activate the emergency kill switch to suspend all authorization.
| Option | Type | Description |
|---|---|---|
| --scope (-s) | string * | Kill scope: environment, service, or identity pattern |
| --reason | string * | Reason for activation (recorded in audit trail) |
| --duration (-d) | string | Auto-expire duration (default: manual lift required) |
| --notify | string | Notification channels: slack, pagerduty, email |
Activate the kill switch
bash
$ meritt emergency kill \
--scope "env:production" \
--reason "Suspected unauthorized AI execution via policy bypass" \
--notify slack,pagerduty
EMERGENCY KILL SWITCH ACTIVATED
─────────────────────────────────────────────────
Scope: env:production (all services, all identities)
Activated by: alice@example.com
Activated at: 2026-03-08T10:45:00Z
Reason: Suspected unauthorized AI execution via policy bypass
Duration: indefinite (manual lift required)
Notifications sent: #incident-response (Slack), PagerDuty
Verify kill switch is active
bash
$ meritt emergency status
ACTIVE KILL SWITCHES:
Scope: env:production
Since: 2026-03-08T10:45:00Z (12 minutes ago)
Activated: alice@example.com
Decisions blocked: 234 since activation
Lift the kill switch
Once the incident is contained and the root cause addressed:
bash
$ meritt emergency lift \
--scope "env:production" \
--reason "Root cause identified and remediated. Policy v3 restored."
Kill switch lifted for env:production
Lifted by: alice@example.com
Lifted at: 2026-03-08T11:30:00Z
Duration: 45 minutes
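The activate, verify, and lift steps imply a small state machine: a switch stays active until it is lifted manually or, if a duration was given, until it expires. The sketch below models that lifecycle under those assumptions. `KillSwitch` is a hypothetical class for illustration and says nothing about how the runtime stores this state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class KillSwitch:
    scope: str
    activated_at: datetime
    duration: Optional[timedelta] = None   # None: manual lift required
    lifted_at: Optional[datetime] = None

    def is_active(self, now: datetime) -> bool:
        if self.lifted_at is not None and now >= self.lifted_at:
            return False
        if self.duration is not None and now >= self.activated_at + self.duration:
            return False
        return now >= self.activated_at

    def lift(self, now: datetime) -> timedelta:
        """Record the lift and return how long the switch was in effect."""
        self.lifted_at = now
        return now - self.activated_at


utc = timezone.utc
switch = KillSwitch("env:production", datetime(2026, 3, 8, 10, 45, tzinfo=utc))
print(switch.is_active(datetime(2026, 3, 8, 10, 57, tzinfo=utc)))
print(switch.lift(datetime(2026, 3, 8, 11, 30, tzinfo=utc)))
```

With the timestamps from the example, the elapsed time comes to 45 minutes, matching the lift output.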
Step 4: Post-Mortem#
After containment and resolution, conduct a structured post-mortem.
Generate an incident report
bash
$ meritt incident report \
--incident inc_r4t7w2 \
--include audit-trail,decision-tokens,deploy-history \
--format markdown \
--output reports/inc_r4t7w2-postmortem.md
Incident report generated: reports/inc_r4t7w2-postmortem.md
Audit entries included: 1,247
Decision tokens included: 42
Deploy events included: 3
Identify the root cause
Common root causes for authorization incidents:
| Root Cause | Indicators | Resolution |
|---|---|---|
| Policy regression | Deny spike after deployment | Roll back, fix policy, redeploy |
| Configuration drift | Runtime differs from repository | Reconcile and redeploy from source |
| Trust level degradation | Decisions flip without policy change | Investigate trust scoring inputs |
| Credential compromise | Unauthorized identity in audit trail | Rotate credentials, tighten scope |
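The table can double as a lookup during triage: record which indicators were observed and list the root causes they point to. A minimal sketch follows; the indicator keys are hypothetical labels invented for this example, while the causes and resolutions are taken from the table.

```python
# (indicator key, root cause, resolution); the keys are illustrative labels.
ROOT_CAUSES = [
    ("deny_spike_after_deploy", "Policy regression",
     "Roll back, fix policy, redeploy"),
    ("runtime_differs_from_repo", "Configuration drift",
     "Reconcile and redeploy from source"),
    ("decisions_flip_without_policy_change", "Trust level degradation",
     "Investigate trust scoring inputs"),
    ("unauthorized_identity_in_audit_trail", "Credential compromise",
     "Rotate credentials, tighten scope"),
]


def candidate_root_causes(observations: set) -> list:
    """Return (cause, resolution) pairs whose indicator was observed."""
    return [
        (cause, resolution)
        for indicator, cause, resolution in ROOT_CAUSES
        if indicator in observations
    ]


for cause, resolution in candidate_root_causes({"deny_spike_after_deploy"}):
    print(f"{cause}: {resolution}")
```

Several indicators can be present at once, so the function returns every match instead of a single verdict.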
Define remediation actions
bash
$ meritt incident update inc_r4t7w2 \
--status resolved \
--root-cause "Policy v4 introduced approval condition not satisfied by svc:data-pipeline-runner" \
--remediation "Rolled back to v3. Updated v5 with service account exception. Added blast radius check to CI."
Publish and review
Share the post-mortem with the team. Intended tracks incident history for compliance and continuous improvement:
bash
$ meritt incident list --status resolved --limit 5
ID TITLE SEVERITY DURATION RESOLVED
inc_r4t7w2 Policy v4 rollback — deny rate medium 45m 2026-03-08
inc_q3k8m1 Staging drift — allow bypass high 2h 15m 2026-03-01
inc_p2j7n9 Kill switch — batch processor crit 12m 2026-02-22
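The DURATION column lends itself to simple trend metrics such as mean time to resolution. This sketch parses the duration strings from the listing above, assuming only hour and minute units appear (other units are not shown in the sample); `parse_duration` is an illustrative helper.

```python
import re
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse strings such as "45m" or "2h 15m" from the DURATION column."""
    total = timedelta()
    for amount, unit in re.findall(r"(\d+)\s*([hm])", text):
        if unit == "h":
            total += timedelta(hours=int(amount))
        else:
            total += timedelta(minutes=int(amount))
    return total


# Durations copied from the incident list above.
durations = [parse_duration(d) for d in ("45m", "2h 15m", "12m")]
mean = sum(durations, timedelta()) / len(durations)
print(mean)  # 1:04:00
```

Three incidents are far too few for a meaningful average; the point is only that the listing can feed such a metric over time.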
Incident Response Checklist#
Use this checklist during any authorization incident:
- [ ] Assess severity and notify the on-call team
- [ ] If critical: activate the kill switch immediately
- [ ] Query audit trail for the affected time window
- [ ] Inspect decision tokens for anomalous evaluations
- [ ] Cross-reference with recent deployments
- [ ] Contain: roll back or activate the kill switch
- [ ] Investigate: trace the root cause
- [ ] Remediate: fix policy, rotate credentials, or resolve drift
- [ ] Post-mortem: document findings and actions
- [ ] Follow-up: verify remediation and close the incident
Next Steps#