2026-03-17
Fail-Closed vs. Fail-Open: Why Your AI Authorization Model Matters
Intended Team · Founding Team
The Question Nobody Asks Until It Is Too Late
Your AI agents are humming along. Thousands of decisions per hour, all evaluated against your policies, all logged in your audit trail. Then the authorization system goes down. Maybe it is a network partition. Maybe it is a deployment gone wrong. Maybe it is a sudden traffic spike that overwhelms the evaluation service.
What happens next?
In a fail-open system, the answer is: everything keeps running. AI agents continue executing actions without authorization. No policy evaluation. No risk scoring. No audit trail. The system assumes that the inability to check means the action should proceed. This is the default in most permission systems because it optimizes for availability -- human users should not be locked out of their tools because of a transient infrastructure issue.
In a fail-closed system, the answer is: everything stops. AI agents cannot execute actions without authorization. No evaluation means no execution. The system assumes that the inability to check means the action should not proceed. This is a fundamentally different philosophy -- it optimizes for safety over availability.
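The entire difference between the two models reduces to a single default in the decision path: what to return when the check itself fails. A minimal sketch in Python, where `decide`, `evaluate_policy`, and `AuthorizationUnavailable` are hypothetical names, not Intended's API:

```python
class AuthorizationUnavailable(Exception):
    """Raised when the authorization service cannot be reached."""


def decide(action, evaluate_policy, fail_mode="closed"):
    """Return True if the action may proceed."""
    try:
        # Normal path: the policy engine answers the question.
        return evaluate_policy(action)
    except AuthorizationUnavailable:
        # Failure path: the only difference between the two models
        # is this default. Fail-open assumes yes; fail-closed, no.
        return fail_mode == "open"


def unreachable(action):
    raise AuthorizationUnavailable


print(decide("deploy", unreachable, fail_mode="open"))    # True
print(decide("deploy", unreachable, fail_mode="closed"))  # False
```

The dangerous part is that the fail-open default is usually implicit: an unhandled exception swallowed somewhere, rather than a deliberate line of code like the one above.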
For human users, fail-open is often the right trade-off. A developer locked out of their IDE for five minutes is a productivity loss. The developer is not going to execute a thousand unauthorized actions during that window. They will wait, retry, or work on something else.
For AI agents, fail-open is dangerous. An AI agent that loses its authorization check will not wait. It will continue executing at full speed. Five minutes of uncontrolled execution from a fleet of AI agents can mean thousands of unauthorized actions -- deployments, data changes, financial transactions, access modifications -- with no policy evaluation, no risk scoring, and no audit trail.
The Anatomy of a Fail-Open Incident
Consider a representative scenario. An enterprise runs 20 AI agents across their software development and infrastructure operations. Each agent processes roughly 100 actions per hour. The authorization service experiences a 15-minute outage due to database connection pool exhaustion.
In a fail-open system, during those 15 minutes, the 20 agents execute approximately 500 actions without authorization. Among those actions: three production deployments, two infrastructure scaling events, one database migration, and a dozen access changes. None were evaluated against policies. None were risk-scored. None produced authority tokens. None were recorded in the audit chain.
The outage is resolved. The authorization service comes back online. But the damage is done -- 500 unaudited actions are now in production, and the compliance team has a 15-minute gap in their authority record. Were those actions appropriate? Nobody knows. The evidence does not exist.
In a fail-closed system, those 500 actions would have been blocked. The AI agents would have received denial responses and queued their actions for retry. When the authorization service recovered, the queue would drain, each action would be properly evaluated, and the audit chain would remain unbroken.
Why Fail-Closed Is Hard
If fail-closed is clearly safer, why does anyone build fail-open systems? Because fail-closed is harder to operate. It requires:
High availability of the authorization service
If the authorization service is a single point of failure, fail-closed means your entire AI agent fleet stops when it goes down. Intended addresses this with multi-region deployment, local policy caches, and a decision evaluation path that operates with sub-5ms latency at the 99th percentile. The system is designed so that the authorization check is never the bottleneck.
Graceful degradation patterns
Not all actions carry equal urgency. A routine log rotation can wait for the authorization service to recover. A critical incident response cannot. Intended supports break-glass policies that allow specific, pre-authorized action categories to proceed during degraded operations -- but with elevated logging, mandatory post-hoc review, and automatic audit chain reconciliation.
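A break-glass allowlist of this kind can be sketched as follows; the category names and response fields are illustrative assumptions, not Intended's policy format:

```python
# Pre-approved action categories that may proceed while the
# authorization service is degraded (hypothetical examples).
BREAK_GLASS_CATEGORIES = {"incident_response", "rollback"}


def degraded_mode_decision(category):
    """Decide an action while the authorization service is down."""
    if category in BREAK_GLASS_CATEGORIES:
        # Allowed, but with elevated logging and a mandatory
        # post-hoc review so the audit chain can be reconciled.
        return {"allow": True, "elevated_logging": True, "post_hoc_review": True}
    # Everything else fails closed and waits for recovery.
    return {"allow": False, "reason": "FAIL_CLOSED"}


print(degraded_mode_decision("incident_response")["allow"])  # True
print(degraded_mode_decision("log_rotation")["allow"])       # False
```

The key design choice is that break-glass is an allowlist defined in advance, not a switch that disables enforcement wholesale.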
Queue and retry infrastructure
When actions are blocked, they need somewhere to go. Intended maintains an intent queue that holds pending actions during authorization service disruptions. When the service recovers, the queue drains in priority order, with each action receiving full policy evaluation and risk scoring.
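A priority-ordered intent queue of this kind can be sketched with a standard heap; the priority scheme and names here are assumptions, not Intended's implementation:

```python
import heapq


class IntentQueue:
    """Holds blocked intents during an outage; drains by priority."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker preserves submission order

    def enqueue(self, intent, priority):
        # Lower number = higher priority.
        heapq.heappush(self._heap, (priority, self._counter, intent))
        self._counter += 1

    def drain(self, evaluate):
        """Evaluate every queued intent, highest priority first."""
        results = []
        while self._heap:
            _, _, intent = heapq.heappop(self._heap)
            # Each queued action still gets a full evaluation --
            # queuing defers the check, it never skips it.
            results.append((intent, evaluate(intent)))
        return results


q = IntentQueue()
q.enqueue("rotate-logs", priority=5)
q.enqueue("incident-fix", priority=1)
print([i for i, _ in q.drain(lambda intent: "ALLOW")])
# ['incident-fix', 'rotate-logs']
```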
Clear communication to operators
When AI agents are blocked, humans need to know. Intended fires escalation notifications when the fail-closed circuit trips, giving operators visibility into what is being blocked and why. The console shows queued actions in real time, so operators can intervene if a critical action needs manual approval during an outage.
The Architecture of Fail-Closed
Intended implements fail-closed at three boundaries:
The intent boundary
Every action starts as an intent submission. If the intent cannot be classified -- because the classification service is unavailable or the action does not match any known category -- the intent is rejected. Unknown actions do not proceed. This is the first gate, and it is fail-closed by default.
The evaluation boundary
Every classified intent is evaluated against policies. If the policy engine cannot reach the policy store, or if the evaluation exceeds the timeout threshold, the decision is DENY. There is no implicit allow. A policy engine that cannot evaluate returns a denial with a specific reason code (EVALUATION_UNAVAILABLE) that distinguishes it from a policy-based denial.
The token boundary
Every approved action receives a signed authority token. If the token signing service is unavailable -- the key service is down, the HSM is unreachable, or the signing operation fails -- the action is blocked. An approved decision without a valid token is not actionable. The execution layer requires a cryptographically valid token to proceed. No token, no execution.
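The three boundaries compose into a single fail-closed pipeline: every exit path that is not a fully signed approval is a denial. A sketch, assuming hypothetical `classify`, `evaluate`, and `sign_token` callables (only the `EVALUATION_UNAVAILABLE` reason code comes from the description above):

```python
class Denied(Exception):
    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


def authorize(action, classify, evaluate, sign_token):
    # 1. Intent boundary: an action that cannot be classified
    #    is rejected. Unknown actions do not proceed.
    category = classify(action)
    if category is None:
        raise Denied("UNKNOWN_INTENT")

    # 2. Evaluation boundary: an unreachable or timed-out policy
    #    engine means DENY, with a reason code that distinguishes
    #    infrastructure failure from a policy-based denial.
    try:
        allowed = evaluate(category)
    except TimeoutError:
        raise Denied("EVALUATION_UNAVAILABLE")
    if not allowed:
        raise Denied("POLICY_DENIED")

    # 3. Token boundary: an approval without a signed token is
    #    not actionable. No token, no execution.
    token = sign_token(action)
    if token is None:
        raise Denied("TOKEN_UNAVAILABLE")
    return token


token = authorize("deploy",
                  classify=lambda a: "deployment",
                  evaluate=lambda c: True,
                  sign_token=lambda a: "signed-token")
print(token)  # signed-token
```

Note that the happy path is the only path that returns; every failure mode, at every boundary, raises.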
What About Performance?
The common objection to fail-closed is performance. If every action requires an authorization check, and every check must complete before the action can proceed, does that not add latency to every operation?
It does. But the latency is small and predictable. Intended evaluates decisions in under 5ms at the 99th percentile. For most AI agent operations, that latency is invisible -- the downstream action (an API call, a database write, a deployment command) takes orders of magnitude longer than the authorization check.
For high-throughput scenarios where even 5ms matters, Intended supports pre-authorization patterns. An AI agent can submit a batch of intents and receive authority tokens before beginning execution. The tokens are valid for a configurable window, allowing the agent to execute pre-authorized actions without per-action latency.
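The pre-authorization pattern can be sketched as follows; the `issue_token` helper and the 60-second validity window are illustrative assumptions, not Intended's defaults:

```python
import time

TOKEN_TTL_SECONDS = 60  # illustrative validity window


def pre_authorize(intents, issue_token):
    """Request tokens for a batch of intents before execution begins."""
    now = time.time()
    return {
        intent: {"token": issue_token(intent), "expires": now + TOKEN_TTL_SECONDS}
        for intent in intents
    }


def execute(intent, grants):
    """Execute with zero per-action authorization latency."""
    grant = grants.get(intent)
    if grant is None or time.time() >= grant["expires"]:
        # Still fail-closed: no valid token, no execution.
        raise PermissionError("no valid pre-authorized token")
    return f"executed {intent} with {grant['token']}"


grants = pre_authorize(["deploy-a", "deploy-b"], lambda i: f"tok-{i}")
print(execute("deploy-a", grants))  # executed deploy-a with tok-deploy-a
```

The trade-off is explicit: a bounded validity window buys throughput, while expiry keeps the fail-closed guarantee intact for anything outside the pre-authorized batch.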
The Compliance Argument
Beyond safety, fail-closed has a compliance advantage. Regulatory frameworks -- SOC 2, HIPAA, FedRAMP, GDPR -- require organizations to demonstrate that access controls are consistently enforced. A fail-open system has gaps by design. When the authorization service is unavailable, actions proceed without control. Those gaps are audit findings.
A fail-closed system has no gaps. Every action is either authorized and proven, or blocked and logged. The audit chain is continuous. There are no windows of uncontrolled execution to explain to auditors.
For enterprises operating in regulated industries, fail-closed is not a preference. It is a requirement. Intended's architecture ensures that the audit chain is never broken, even during infrastructure disruptions.
Making the Choice
If your AI agents operate in an environment where unauthorized actions have consequences -- financial, operational, legal, or reputational -- fail-closed is the only responsible default. The question is not whether you can afford the operational complexity of fail-closed. The question is whether you can afford the risk of fail-open.
Intended is fail-closed at every boundary because AI agents do not have judgment. They do not pause when something feels wrong. They do not escalate when they are uncertain. They execute. And a system that governs execution must ensure that no execution occurs without authorization -- especially when the authorization system itself is under stress.
See how Intended implements fail-closed authorization across every boundary. Start with the free tier and test it against your own failure scenarios.