Go-Live Runbook

Production rollout checklist for an Intended deployment: required controls, validation steps, and incident readiness.

Overview

This runbook provides the complete production rollout checklist for Intended. Follow these steps in order before enabling production traffic.

Danger

Do not skip validation steps. The fail-closed security model means misconfiguration results in denied traffic, not silent failures.

Pre-Deployment Validation

Verify infrastructure health

Confirm all runtime services are healthy:

bash
meritt health check --verbose --environment production

Expected output: all services report healthy. If any service reports degraded or unavailable, resolve the issue before proceeding.
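The "do not proceed on failure" rule can be enforced in a wrapper script instead of by eye. The following is a minimal sketch, assuming that `meritt` exits with a non-zero status when a check fails, which this runbook does not state. `run_gate` is a hypothetical helper, not part of the CLI.

```bash
#!/usr/bin/env bash
# run_gate: hypothetical helper that runs one validation command and
# reports whether the rollout may continue.
# ASSUMPTION: the wrapped command exits non-zero on failure. The runbook
# does not document meritt exit codes, so confirm this before relying on it.
run_gate() {
  local label="$1"
  shift
  if "$@"; then
    echo "PASS: ${label}"
  else
    local status=$?
    echo "FAIL: ${label} (exit ${status}); resolve before proceeding" >&2
    return 1
  fi
}

# Example (not executed here):
#   run_gate "health" meritt health check --verbose --environment production || exit 1
```

Wrapping each validation command this way stops the script at the first failed step, which matches the requirement to follow the steps in order.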

Validate policy set

Ensure the production policy set is validated and deployed:

bash
meritt policy validate --all --environment production
meritt policy list --environment production --status active

Confirm:

  • All policies pass validation
  • Active policy count matches the expected count
  • No stale or orphaned policies
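The count comparison can be scripted. This is only a sketch: it assumes that the output of `meritt policy list` contains exactly one policy per line with no header or footer rows, which this runbook does not show. `count_matches` is a hypothetical helper, and the filtering must be adapted to the real output format.

```bash
#!/usr/bin/env bash
# count_matches: hypothetical helper that counts non-blank lines on stdin
# and compares the total with an expected number.
# ASSUMPTION: one policy per line, no header or footer rows. Adjust the
# filtering to match what the CLI really prints.
count_matches() {
  local expected="$1"
  local actual
  # grep -c exits 1 when it counts zero lines, so ignore its status.
  actual=$(grep -c '[^[:space:]]' || true)
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: ${actual} active policies"
  else
    echo "MISMATCH: expected ${expected}, found ${actual}" >&2
    return 1
  fi
}

# Example (not executed here; 12 is a placeholder):
#   meritt policy list --environment production --status active | count_matches 12
```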

Run operational readiness checks

bash
meritt readiness check --environment production --verbose

This runs the full readiness suite:

  • Service connectivity
  • Key material availability
  • Audit pipeline lag
  • Policy engine response time
  • Token signing latency

Verify token signing

Issue a test token and verify it:

bash
meritt token test --environment production

This submits a synthetic intent, receives a decision token, and verifies its signature locally.

Confirm audit pipeline

Verify audit events are being recorded:

bash
curl -H "Authorization: Bearer $Intended_API_KEY" \
  "https://api.intended.so/tenants/tenant_acme_prod/audit?limit=1"

Confirm the response returns recent events with correct timestamps.
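Whether an event counts as "recent" can be checked mechanically once its timestamp has been read from the response. The sketch below assumes the timestamp is an ISO 8601 string and that GNU `date` is available. The response schema is not shown in this runbook, so extracting the timestamp is left to the reader, and `is_recent` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# is_recent: hypothetical helper; succeeds when TIMESTAMP is at most
# MAX_AGE seconds in the past. Requires GNU date (for the -d option).
# Note: future-dated timestamps are not rejected by this check.
is_recent() {
  local timestamp="$1"
  local max_age="$2"
  local event_epoch now_epoch
  # Return 2 if the timestamp cannot be parsed at all.
  event_epoch=$(date -u -d "$timestamp" +%s) || return 2
  now_epoch=$(date -u +%s)
  [ $(( now_epoch - event_epoch )) -le "$max_age" ]
}

# Example (not executed here; 300 seconds is a placeholder tolerance):
#   is_recent "$latest_event_timestamp" 300 || echo "audit pipeline may be stale"
```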

Required Controls

Before routing production traffic, confirm these controls are in place:

Emergency Controls

  • [ ] Tenant-wide kill switch tested and accessible
  • [ ] Emergency token revocation procedure documented and rehearsed
  • [ ] Circuit breaker thresholds configured for production load

Access Controls

  • [ ] Production API keys created with minimum required scopes
  • [ ] Development/staging keys do not have production access
  • [ ] Role assignments reviewed for least-privilege compliance

Monitoring

  • [ ] Health endpoint monitoring configured (healthz, readyz)
  • [ ] Alerting configured for evaluation latency thresholds
  • [ ] Alerting configured for error rate spikes
  • [ ] Audit pipeline lag alerting configured

Rollback Plan

  • [ ] Previous policy version identified for rollback
  • [ ] Rollback procedure tested in staging
  • [ ] Rollback authorization chain identified (who can approve)

Production Rollout Order

Enable at low traffic

Route a small percentage of traffic (5-10%) to the Intended evaluation pipeline. Monitor:

  • Decision latency (p50, p95, p99)
  • Error rate
  • Allow/deny ratio
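If raw latency samples are available, the percentiles listed above can be computed directly. This is a minimal sketch using the nearest-rank method, assuming one latency value per line on standard input and an integer percentile. How samples are exported from Intended is not covered in this runbook, and `percentile` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# percentile: hypothetical helper; prints the P-th percentile (nearest-rank
# method) of the numbers read from stdin, one per line. P must be an integer.
percentile() {
  local p="$1"
  sort -n | awk -v p="$p" '
    { values[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print values[rank]
    }'
}

# Example (not executed here; latencies.txt is a placeholder file name):
#   for p in 50 95 99; do echo "p${p}: $(percentile "$p" < latencies.txt)"; done
```

Nearest-rank always returns a value that was actually observed, which keeps the result easy to trace back to a specific request when a tail latency looks wrong.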

Validate steady state

After 30 minutes at low traffic:

  • Confirm latency is within expected bounds
  • Confirm error rate is below threshold
  • Review a sample of deny decisions for correctness
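The "expected bounds" and "threshold" above are values your team sets; this runbook does not prescribe them. Once they are chosen, the comparison can be scripted. A small sketch, where `within_limit` is a hypothetical helper and every number in the example is a placeholder:

```bash
#!/usr/bin/env bash
# within_limit: hypothetical helper; succeeds when VALUE <= LIMIT.
# awk is used so that decimal values such as 0.4 compare correctly.
within_limit() {
  awk -v v="$1" -v l="$2" 'BEGIN { exit !(v + 0 <= l + 0) }'
}

# Example with placeholder numbers (not values from this runbook):
#   within_limit "$observed_p99_ms" 200 || echo "p99 latency out of bounds"
#   within_limit "$observed_error_pct" 0.5 || echo "error rate above threshold"
```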

Increase to full traffic

Gradually increase to 100% over 2-4 hours:

  • 5% → 25% → 50% → 100%
  • Monitor at each step before increasing
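The staged increase can be expressed as a loop that refuses to advance past an unhealthy step. This is a sketch under stated assumptions: how traffic is routed to Intended and how health is measured are specific to your environment, so both are passed in as commands. `ramp_traffic` and the two commands it takes are hypothetical names.

```bash
#!/usr/bin/env bash
# ramp_traffic: hypothetical driver for the staged increase.
#   $1    command that sets the traffic percentage (receives the percentage)
#   $2    command that monitors the step and succeeds only if it is healthy
#   $3..  percentages to step through, in order
# Both commands are placeholders the operator must supply.
ramp_traffic() {
  local set_cmd="$1" check_cmd="$2"
  shift 2
  local pct
  for pct in "$@"; do
    "$set_cmd" "$pct" || return 1
    if ! "$check_cmd" "$pct"; then
      echo "HALT at ${pct}%: health check failed, do not increase further" >&2
      return 1
    fi
    echo "OK at ${pct}%"
  done
}

# Example (not executed here; both function names are placeholders):
#   ramp_traffic set_traffic_percent monitor_step 5 25 50 100
```

The monitoring command should include the observation period for each step, so that the loop spends the intended 2-4 hours overall rather than advancing immediately.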

Post-launch validation

After 24 hours at full traffic:

  • Run the full readiness check suite again
  • Review audit log volume and completeness
  • Confirm no unexpected deny patterns
  • Archive the go-live evidence for compliance

Incident Readiness

Before go-live, ensure the following are in place:

  • On-call rotation — at least one operator available 24/7 for the first week
  • Incident response runbook — reviewed and accessible (Incident Response)
  • Communication channel — dedicated channel for platform incidents
  • Escalation path — defined escalation from operator → engineering → leadership

Rollback Procedure

If issues are detected after go-live:

bash
# Activate kill switch (stops all evaluations, defaults to deny)
meritt emergency kill-switch activate --tenant $TENANT_ID --reason "go-live rollback"

# Roll back to previous policy version
meritt policy rollback --to-version $PREVIOUS_VERSION --environment production

# Verify rollback
meritt policy list --environment production --status active

# Deactivate kill switch when stable
meritt emergency kill-switch deactivate --tenant $TENANT_ID
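The commands above depend on `$TENANT_ID` and `$PREVIOUS_VERSION`. During an incident, an unset variable would silently expand to an empty string and produce a malformed command, so it may be worth guarding against that first. A minimal sketch, where `require_vars` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# require_vars: hypothetical guard; fails if any named variable is unset
# or empty, and names every missing variable on stderr.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable
    # whose name is stored in $name, or empty if it is unset.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (not executed here): run before the rollback commands.
#   require_vars TENANT_ID PREVIOUS_VERSION || exit 1
```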

Next Steps