Control Center Operations
Monitor and manage the authority runtime through the control center dashboard.
Implemented Capability
The control center dashboard is live in the console at /account/authority/control. It aggregates runtime pressure, graph health, anomalies, and auditable intervention actions.
Overview
The control center provides a unified operational dashboard for the Intended authority runtime. It surfaces real-time metrics, policy deployment status, decision throughput, and system health in a single view.
Dashboard Sections
Policy Status
The policy status panel shows:
- Active policy count and version
- Last deployment timestamp
- Pending approvals
- Policy evaluation rate (decisions/second)
Decision Throughput
Real-time metrics for intent evaluations:
- Total evaluations in the current window
- Allow/deny/require-approval breakdown
- Average evaluation latency (p50, p95, p99)
- Error rate
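As a rough illustration of how the figures in this panel relate to raw evaluation records, the sketch below summarizes one window of evaluations. The `Evaluation` record, the outcome labels, and the nearest-rank percentile method are assumptions made for the example. They are not taken from the Intended runtime, which may compute these values differently.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Evaluation:
    # Hypothetical record shape; outcome labels are assumed for illustration.
    outcome: str        # "allow", "deny", "require-approval", or "error"
    latency_ms: float


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list (pct is an integer 1..100)."""
    rank = max(1, (pct * len(sorted_values) + 99) // 100)  # integer ceiling
    return sorted_values[rank - 1]


def summarize(evaluations):
    """Summarize one metrics window: totals, outcome breakdown, latency, error rate."""
    total = len(evaluations)
    if total == 0:
        return {"total": 0, "breakdown": {}, "latency_ms": {}, "error_rate": 0.0}
    latencies = sorted(e.latency_ms for e in evaluations)
    outcomes = Counter(e.outcome for e in evaluations)
    return {
        "total": total,
        "breakdown": {k: outcomes.get(k, 0) for k in ("allow", "deny", "require-approval")},
        "latency_ms": {f"p{q}": percentile(latencies, q) for q in (50, 95, 99)},
        "error_rate": outcomes.get("error", 0) / total,
    }


# Example window: 100 evaluations with latencies 1..100 ms.
labels = ["allow"] * 70 + ["deny"] * 20 + ["require-approval"] * 8 + ["error"] * 2
window = [Evaluation(outcome=o, latency_ms=float(i + 1)) for i, o in enumerate(labels)]
summary = summarize(window)
```

Nearest-rank is used here because it always returns a latency that was actually observed. An interpolating percentile would give slightly different values for small windows.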
System Health
Health indicators for runtime components:
- Policy engine availability
- Token signing service status
- Audit pipeline lag
- Connector health per integration
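One way to reason about these indicators together is to roll them up into a single worst-case status. The sketch below is a minimal example of that idea. The status names (`healthy`, `degraded`, `down`), the lag thresholds, and the connector names are all hypothetical and do not reflect how the control center itself computes health.

```python
# Hypothetical status vocabulary, ordered from best to worst.
SEVERITY = {"healthy": 0, "degraded": 1, "down": 2}


def lag_status(lag_seconds, degraded_after=60.0, down_after=300.0):
    """Map audit pipeline lag to a status using assumed thresholds (in seconds)."""
    if lag_seconds >= down_after:
        return "down"
    if lag_seconds >= degraded_after:
        return "degraded"
    return "healthy"


def overall_health(policy_engine, token_signing, audit_lag_seconds, connectors):
    """Return (worst status, sorted names of components that are not healthy)."""
    statuses = {
        "policy_engine": policy_engine,
        "token_signing": token_signing,
        "audit_pipeline": lag_status(audit_lag_seconds),
    }
    # One entry per integration, e.g. "connector:ticketing".
    for name, status in connectors.items():
        statuses[f"connector:{name}"] = status
    worst = max(statuses.values(), key=lambda s: SEVERITY[s])
    failing = sorted(n for n, s in statuses.items() if s != "healthy")
    return worst, failing


worst, failing = overall_health(
    policy_engine="healthy",
    token_signing="healthy",
    audit_lag_seconds=90.0,
    connectors={"crm": "healthy", "ticketing": "down"},
)
```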
Alerting Configuration
Alert policies are still managed through the API. The control center surfaces live operational pressure and links directly to interventions and drilldowns.
Planned Alert Types
- Evaluation latency threshold — triggers when p95 latency exceeds a configured threshold
- Error rate spike — triggers when error rate exceeds baseline by a configurable percentage
- Policy deployment failure — triggers on failed deployment attempts
- Token signing degradation — triggers when signing service response time degrades
- Audit pipeline lag — triggers when audit events fall behind by a configurable duration
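The trigger conditions above can be sketched as simple comparisons against a metrics snapshot. This is an illustrative sketch only: the field names, default thresholds, and alert identifiers are assumptions, and because these alert types are planned, the eventual behavior may differ.

```python
from dataclasses import dataclass


@dataclass
class AlertThresholds:
    # All defaults are placeholder values chosen for the example.
    p95_latency_ms: float = 250.0
    error_rate_increase_pct: float = 50.0   # allowed increase over baseline
    signing_latency_ms: float = 100.0
    audit_lag_seconds: float = 120.0


@dataclass
class Snapshot:
    # Hypothetical metrics snapshot; not an actual API response shape.
    p95_latency_ms: float
    error_rate: float
    baseline_error_rate: float
    failed_deployments: int
    signing_latency_ms: float
    audit_lag_seconds: float


def triggered_alerts(s, t):
    """Return the identifiers of alert types whose condition holds for the snapshot."""
    alerts = []
    if s.p95_latency_ms > t.p95_latency_ms:
        alerts.append("evaluation-latency-threshold")
    # Spike: error rate exceeds baseline by more than the configured percentage.
    # With a zero baseline, any non-zero error rate counts as a spike.
    spike_limit = s.baseline_error_rate * (1 + t.error_rate_increase_pct / 100)
    if s.error_rate > spike_limit:
        alerts.append("error-rate-spike")
    if s.failed_deployments > 0:
        alerts.append("policy-deployment-failure")
    if s.signing_latency_ms > t.signing_latency_ms:
        alerts.append("token-signing-degradation")
    if s.audit_lag_seconds > t.audit_lag_seconds:
        alerts.append("audit-pipeline-lag")
    return alerts


snapshot = Snapshot(
    p95_latency_ms=300.0,
    error_rate=0.03,
    baseline_error_rate=0.01,
    failed_deployments=0,
    signing_latency_ms=40.0,
    audit_lag_seconds=30.0,
)
fired = triggered_alerts(snapshot, AlertThresholds())
```

Keeping thresholds in a separate object from the snapshot mirrors the "configurable" wording in the list: the same snapshot can be checked against different threshold sets per environment.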
Supporting Monitoring Alternatives
The control center is the primary monitoring surface. These alternatives remain available for automation and external observability pipelines:
CLI Health Check
API Health Endpoints
Metrics API Response
Next Steps
- Incident Response — investigate issues using available tooling
- Operational Readiness — validate runtime health before go-live
- Deploy and Rollback — manage policy deployments