
2026-03-07

How Intended Processes a Decision in Under 50ms

Intended Team · Engineering


When an AI agent makes a tool call through Intended, the authority decision completes in under 50 milliseconds at p99. That number matters because governance that slows down agents does not get adopted. If the authority check takes 500ms, teams will find ways to bypass it. If it takes 50ms, it disappears into the noise of network latency and tool execution time.

This post walks through the decision pipeline stage by stage and explains the architecture choices that make sub-50ms possible.

The Pipeline

Every authority decision flows through five stages: intake, classification, risk scoring, policy evaluation, and token issuance. Each stage is optimized independently, and the pipeline is designed so that no stage blocks on external I/O in the hot path.

Stage 1: Intake (2-3ms)

The intake stage receives the intent submission from the SDK or gateway. The payload contains the agent identity, the action being attempted, the target resource, the action parameters, and optional context metadata.

The intake handler validates the payload schema, authenticates the API key against a local cache of key hashes, and assigns a decision ID. Authentication is a hash comparison, not a database lookup. Key hashes are synced to each evaluation node every 30 seconds. A cache miss falls back to the database, but that path is rare because API keys change infrequently.

The intake handler runs on the same node that will handle the rest of the pipeline. There is no internal routing or queue between stages. The decision is processed in a single function call chain on a single thread.
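The hash-comparison authentication can be sketched in a few lines. This is an illustrative sketch, not Intended's implementation: the `KeyCache` class, the key format, and the SHA-256 choice are assumptions; the design point it demonstrates is from the post, that the hot path does a set lookup against locally synced hashes rather than a database query.

```python
import hashlib
import secrets

class KeyCache:
    """Local cache of API key hashes, refreshed out of band (hypothetical sketch)."""
    def __init__(self):
        self._hashes = set()

    def load(self, hashes):
        # Called by the periodic sync (every 30 seconds in the post's design).
        self._hashes = set(hashes)

    def authenticate(self, api_key: str) -> bool:
        # Pure hash comparison: no database lookup on the hot path.
        digest = hashlib.sha256(api_key.encode()).hexdigest()
        return digest in self._hashes

cache = KeyCache()
key = "ik_live_example"
cache.load([hashlib.sha256(key.encode()).hexdigest()])

assert cache.authenticate(key)
assert not cache.authenticate("ik_live_wrong")
decision_id = secrets.token_hex(8)  # intake also assigns a decision ID
```

A cache miss would fall through to the database, but as the post notes, that path is rare because API keys change infrequently.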

Stage 2: Classification (3-5ms)

The classification stage maps the raw action to a canonical intent in the Intended Intent Registry (MIR). The mapping uses a trie-based lookup structure that resolves tool names and action patterns to MIR categories.

For built-in connectors like GitHub, Jira, Salesforce, and AWS, the mappings are pre-compiled and loaded into memory at startup. A tool call like "github.createPullRequest" resolves directly to MIR-100 (Software Development), subcategory "pull-request-create." The lookup is a trie traversal, not a string comparison against a list.

For custom tools, organizations define mappings in their configuration. These mappings are compiled into the same trie structure when the configuration is loaded. Custom mappings have the same lookup performance as built-in mappings.
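A minimal version of the trie lookup might look like this. The node layout, the `"$category"` sentinel key, and the wildcard fallback are assumptions for illustration; the `MIR-100` example mapping is from the post.

```python
class IntentTrie:
    """Dot-separated trie mapping tool names to intent categories (illustrative)."""
    def __init__(self):
        self.root = {}

    def insert(self, pattern: str, category: str):
        node = self.root
        for part in pattern.split("."):
            node = node.setdefault(part, {})
        node["$category"] = category

    def lookup(self, tool_name: str):
        # Trie traversal: one dict hop per name segment, not a scan over a list.
        node = self.root
        for part in tool_name.split("."):
            if part in node:
                node = node[part]
            elif "*" in node:  # hypothetical wildcard fallback for action patterns
                node = node["*"]
            else:
                return None
        return node.get("$category")

trie = IntentTrie()
trie.insert("github.createPullRequest", "MIR-100/pull-request-create")
trie.insert("github.*", "MIR-100/generic")

assert trie.lookup("github.createPullRequest") == "MIR-100/pull-request-create"
assert trie.lookup("github.closeIssue") == "MIR-100/generic"
assert trie.lookup("jira.createIssue") is None
```

Because custom mappings compile into the same structure, their lookups cost the same handful of dictionary hops as the built-ins.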

The classification stage also extracts risk-relevant parameters from the action payload. For a financial transaction, this includes the amount, currency, and recipient. For a deployment, this includes the target environment and change type. These extracted parameters feed into risk scoring.

Stage 3: Risk Scoring (5-8ms)

The risk scoring stage evaluates the intent across eight dimensions: financial impact, data sensitivity, operational risk, compliance exposure, reversibility, blast radius, velocity, and privilege level. Each dimension produces a score between 0 and 1.

The scoring functions are pure computations with no external dependencies. Financial impact is computed from the transaction amount relative to configured thresholds. Velocity is computed from a counter stored in a local in-memory data structure, not a database. The counter tracks actions per agent per intent category over sliding time windows.

The velocity tracking deserves special attention because it is the most latency-sensitive component. Each evaluation node maintains a local sliding-window counter per agent-intent pair, implemented as a ring buffer with second-granularity buckets. Querying "how many times has this agent called this intent type in the last 60 seconds" is a sum over 60 integer buckets. There is no lock contention because each evaluation thread has its own counter set.
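The ring-buffer counter described above can be sketched as follows. Class and method names are hypothetical; the structure (fixed buckets indexed by second, stale-bucket reuse, a sum over the window to answer a query) is the mechanism the post describes.

```python
class VelocityCounter:
    """Sliding-window counter: ring buffer of per-second buckets (sketch)."""
    def __init__(self, window_seconds: int = 60):
        self.window = window_seconds
        self.buckets = [0] * window_seconds
        self.stamps = [-1] * window_seconds  # which second each bucket currently holds

    def record(self, now: int):
        i = now % self.window
        if self.stamps[i] != now:   # bucket holds an old second; reuse it
            self.buckets[i] = 0
            self.stamps[i] = now
        self.buckets[i] += 1

    def count(self, now: int) -> int:
        # Sum only buckets whose second falls inside the window.
        return sum(c for c, s in zip(self.buckets, self.stamps)
                   if now - s < self.window)

vc = VelocityCounter(60)
for t in (100, 100, 130, 159):
    vc.record(t)
assert vc.count(159) == 4
assert vc.count(161) == 2   # the two events at t=100 have aged out
```

Since each evaluation thread owns its counters, `record` and `count` need no locks, which is what keeps this step off the latency tail.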

Cross-node velocity aggregation happens asynchronously. Each node publishes its counters to a shared stream every 5 seconds. The aggregated counts lag by up to 5 seconds, which means velocity detection has a brief blind spot when an agent's requests are distributed across nodes. For most threat models, this is acceptable. For strict velocity enforcement, organizations can pin agents to specific nodes using consistent hashing.
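The node-pinning option can be illustrated with a standard consistent-hash ring. This is a generic sketch, not Intended's routing code: the vnode count and MD5 hash are arbitrary choices; the property shown is that a given agent always maps to the same node, so its velocity counters stay local.

```python
import bisect
import hashlib

def _h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Pin each agent to one evaluation node via consistent hashing (sketch)."""
    def __init__(self, nodes, vnodes: int = 64):
        # Virtual nodes smooth out the distribution across physical nodes.
        self.ring = sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def node_for(self, agent_id: str) -> str:
        i = bisect.bisect(self.keys, _h(agent_id)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
# The same agent always lands on the same node, closing the 5-second blind spot.
assert ring.node_for("agent-42") == ring.node_for("agent-42")
assert ring.node_for("agent-42") in {"node-a", "node-b", "node-c"}
```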

The eight dimension scores are combined into a composite risk score using weights defined in the organization's domain pack. Different domains weight risks differently: a FinTech domain pack weights financial impact heavily, while an infrastructure domain pack weights blast radius and reversibility.
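The weighted combination might look like the following. The specific weight values are invented for illustration; a real domain pack defines its own. The shape matches the post: eight dimensions in [0, 1], combined by domain-specific weights.

```python
# Hypothetical FinTech-style weights: financial impact dominates.
FINTECH_WEIGHTS = {
    "financial_impact": 0.30, "data_sensitivity": 0.10, "operational_risk": 0.10,
    "compliance_exposure": 0.15, "reversibility": 0.10, "blast_radius": 0.10,
    "velocity": 0.10, "privilege_level": 0.05,
}

def composite_score(dimensions: dict, weights: dict) -> float:
    """Weighted average of the eight dimension scores, each in [0, 1]."""
    total = sum(weights.values())
    return sum(dimensions[d] * w for d, w in weights.items()) / total

scores = {d: 0.0 for d in FINTECH_WEIGHTS}
scores["financial_impact"] = 1.0
# With these weights, a maxed-out financial score alone yields a 0.30 composite.
assert abs(composite_score(scores, FINTECH_WEIGHTS) - 0.30) < 1e-9
```

An infrastructure pack would shift weight toward `blast_radius` and `reversibility` with the same combining function.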

Stage 4: Policy Evaluation (8-12ms)

The policy evaluation stage is where the authority decision is made. The evaluation engine loads the applicable policies from an in-memory policy cache and evaluates them against the classified intent, the risk scores, and the agent context.

Policies are compiled into a decision tree when they are created or updated. The decision tree is a binary structure where each node tests a single condition: "is the risk score above 0.7?" or "is the agent in the 'deploy-operators' group?" or "is the current time within the maintenance window?" Leaf nodes contain decisions: allow, deny, or escalate.

The decision tree is compiled once and evaluated many times. Compilation happens when policies are created or updated, which is infrequent. Evaluation is a walk from root to leaf, testing conditions along the way. The depth of the tree is bounded by the number of conditions in the most complex policy. In practice, most organizations have policy trees with fewer than 20 levels, which means fewer than 20 comparisons per evaluation.
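A compiled tree and its root-to-leaf walk can be sketched directly. The `Node` representation is an assumption; the conditions mirror the post's examples, and the walk shows why evaluation cost is bounded by tree depth.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Node:
    test: Callable[[dict], bool]        # single condition on the evaluation context
    if_true: Union["Node", str]
    if_false: Union["Node", str]

def evaluate(node: Union[Node, str], ctx: dict) -> str:
    # Walk root to leaf; one comparison per level, so depth bounds the cost.
    while isinstance(node, Node):
        node = node.if_true if node.test(ctx) else node.if_false
    return node  # leaf: "allow", "deny", or "escalate"

tree = Node(
    test=lambda c: c["risk_score"] > 0.7,
    if_true=Node(
        test=lambda c: "deploy-operators" in c["agent_groups"],
        if_true="escalate",
        if_false="deny",
    ),
    if_false="allow",
)

assert evaluate(tree, {"risk_score": 0.9, "agent_groups": ["deploy-operators"]}) == "escalate"
assert evaluate(tree, {"risk_score": 0.2, "agent_groups": []}) == "allow"
```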

When multiple policies apply to the same intent, the evaluation engine uses a precedence model. Deny policies take precedence over allow policies. Escalation policies take precedence over allow policies but not over deny policies. Within the same precedence level, the most specific policy wins: a policy targeting "github.createPullRequest on repo:production" is more specific than a policy targeting "all github operations."
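The precedence model reduces to a small resolution function. Representing specificity as an integer rank is an assumption for illustration; the ordering rules (deny over escalate over allow, most specific wins within a level) are from the post.

```python
PRECEDENCE = {"deny": 0, "escalate": 1, "allow": 2}  # lower value wins

def resolve(decisions):
    """decisions: list of (decision, specificity) pairs from matching policies.
    Higher specificity breaks ties within a precedence level."""
    best = min(PRECEDENCE[d] for d, _ in decisions)
    tied = [(d, s) for d, s in decisions if PRECEDENCE[d] == best]
    return max(tied, key=lambda t: t[1])[0]

# A broad allow loses to a narrow deny...
assert resolve([("allow", 1), ("deny", 3)]) == "deny"
# ...and escalate beats allow but not deny.
assert resolve([("allow", 1), ("escalate", 2)]) == "escalate"
assert resolve([("deny", 1), ("escalate", 2)]) == "deny"
```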

The policy cache is updated via a push mechanism. When an administrator changes a policy in the console or API, the compiled decision tree is pushed to all evaluation nodes within 2 seconds. Between pushes, the cached tree is used. This means policy changes take up to 2 seconds to propagate, but evaluation latency is not affected by policy storage I/O.

Stage 5: Token Issuance (3-5ms)

The final stage produces an Authority Decision Token (ADT). The token is a compact binary structure containing the decision, the risk scores, the policy identifiers that were evaluated, any conditions attached to the decision, and a timestamp.

The token is signed using Ed25519. The signing key is held in memory on the evaluation node, loaded from an HSM-backed key service at startup. Ed25519 signing is fast: on the order of tens of microseconds on modern hardware. The signature covers the entire token payload, ensuring that any modification to the token invalidates the signature.
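The sign-the-whole-payload property can be demonstrated in a self-contained sketch. Note the hedges: the real system uses Ed25519 with an HSM-loaded key and a compact binary encoding; here HMAC-SHA256 and JSON stand in so the example runs without dependencies, and the field names are invented. The property shown is the same: flipping any byte of the payload invalidates the signature.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # stand-in; the real key is Ed25519, loaded from an HSM-backed service

def issue_token(decision: str, risk: float, policy_ids: list) -> dict:
    payload = json.dumps({
        "decision": decision, "risk": risk,
        "policies": policy_ids, "ts": 1700000000,
    }, sort_keys=True).encode()
    # HMAC stands in for Ed25519 so the sketch has no dependencies;
    # either way, the signature covers the entire payload.
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}

def verify(token: dict) -> bool:
    expected = hmac.new(SIGNING_KEY, token["payload"], hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token["sig"])

tok = issue_token("allow", 0.42, ["pol-7"])
assert verify(tok)
tok["payload"] = tok["payload"].replace(b'"allow"', b'"deny"')
assert not verify(tok)   # any modification invalidates the signature
```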

The signed token is returned to the caller as part of the decision response. It is also written to the audit chain asynchronously. The audit write does not block the response. A background writer batches audit entries and writes them to the append-only ledger every 100 milliseconds.
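The batched background writer can be sketched with a queue and a timer loop. This is a minimal illustration, not production code: the queue, the in-memory `ledger` list, and the thread shutdown are stand-ins. It shows the key property, that the hot path only enqueues and never blocks on ledger I/O.

```python
import queue
import threading
import time

audit_queue: "queue.Queue[dict]" = queue.Queue()
ledger = []  # stands in for the append-only ledger

def audit_writer(stop: threading.Event, interval: float = 0.1):
    """Drain queued entries in batches every `interval` seconds (sketch)."""
    while not stop.is_set():
        time.sleep(interval)
        batch = []
        while not audit_queue.empty():
            batch.append(audit_queue.get_nowait())
        if batch:
            ledger.append(batch)   # one ledger write per batch, not per entry

stop = threading.Event()
writer = threading.Thread(target=audit_writer, args=(stop,), daemon=True)
writer.start()
for i in range(5):
    audit_queue.put({"decision_id": i})   # non-blocking on the hot path
time.sleep(0.3)
stop.set()
writer.join()

assert sum(len(b) for b in ledger) == 5   # every entry landed, in batches
```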

The Numbers

Adding up the stage latencies: 2-3ms intake, 3-5ms classification, 5-8ms risk scoring, 8-12ms policy evaluation, 3-5ms token issuance. The total is 21-33ms for the typical case. The p50 is around 25ms. The p99 is under 50ms, with the tail driven by occasional policy cache updates and garbage collection pauses.

These numbers assume the evaluation node is warm, meaning the policy cache, key hashes, and velocity counters are populated. Cold start latency is higher because the caches need to be filled. In practice, evaluation nodes are long-lived and cold starts are rare.

Why It Matters

Sub-50ms authority decisions mean that governance is invisible to the agent. A typical tool call takes 200-2000ms depending on the downstream service. Adding 25ms of governance overhead is a 1-12% increase in total latency. Most teams cannot measure the difference.

This matters because adoption is the hard part. Security tools that make systems slower do not get used. Teams route around them. The 50ms budget is not an arbitrary performance target. It is the threshold below which governance stops being an obstacle and starts being infrastructure that everyone forgets is there, which is exactly where it should be.