Skip to content

Threat model

Why Intended exists

The agent vulnerability problem

Humans needed governance because we have drives that can be exploited — hunger, fear, greed, lust. The attack surface is the gap between what we should do and what we want to do. We evolved into a world that had to be governed because our natural state was chaos. Rules, prohibition, laws — all of it emerged because humans cannot reliably self-govern when external forces target our weaknesses.

Agents don't have desires. But they have something functionally equivalent: an instruction-following surface that can be manipulated. The “desire” of an agent is to complete its task according to its instructions. That's the thing you can poison.

The kryptonite: prompt injection

The single most dangerous vulnerability class in autonomous agents is prompt injection — and it's why the governance layer that Intended provides isn't optional; it's existential.

An agent cannot reliably distinguish between legitimate instructions and malicious ones embedded in data it processes. If an agent reads an email, a document, a database record, a web page, an API response — any of those can contain text that says “ignore your previous instructions and do X instead.” And depending on the architecture, it works.

This is not theoretical. It's the #1 item on the OWASP Top 10 for LLM Applications.

Four attack vectors governance must cover

Data-as-instructions (indirect prompt injection)

An attacker does not need access to your system. They only need to put poisoned text somewhere an agent will read it — a Jira ticket, a Slack message, a PDF, a code comment, a DNS TXT record. If the agent ingests it and treats it as context, the attacker has a voice in the room.

Tool abuse through misdirection

An agent with tools — file access, API calls, code execution — can be redirected to use those tools against you. “Summarize this document” becomes “exfiltrate this document” if the agent’s instructions get overwritten mid-task.

Chain-of-agent corruption

In multi-agent systems, one compromised agent can pass poisoned context downstream. A single injection point cascades. That is the “poison an entire industry with one piece of code” scenario — plausible wherever agents consume each other’s outputs without sanitization.

Slow-burn manipulation

Not every attack is dramatic. Someone can subtly bias an agent’s decision-making over time — skewed data, gently misleading context — so the agent drifts from its intended behavior without an obvious red flag.

The defense isn't making agents smarter about detecting attacks. That's an arms race you lose. The defense is constraining what an agent can do even when compromised.

What Intended solves at the decision layer

  • Input provenance

    The agent must know where its context came from and treat different sources with different trust levels. Instructions from the system prompt are not the same as text scraped from a webpage.

  • Action boundaries

    Hard constraints that cannot be overridden by prompt content. Not “the agent is told not to” — the agent literally cannot — enforced at the infrastructure level, not the instruction level.

  • Decision auditing

    Every consequential decision produces a trace that a separate system — not the agent — can review. The agent doesn’t grade its own homework.

  • Authority checkpoints

    Autonomous does not mean unsupervised. The governance question is not “human or agent” — it is where you put the checkpoints and who holds authority at each one.

The agent doesn't grade its own homework.

The bottom line

Can someone poison an autonomous agent?
Yes, absolutely.
The attack surface is the context window — everything the agent reads is a potential vector.
Can someone poison an entire industry by dropping a single piece of code?
Yes, if that industry runs agents that consume each other's outputs without an authority layer between them.

The governance layer isn't optional polish on top of autonomy. It's the thing that makes autonomy survivable. That's what Intended is.

No Token, No Action.

Read how it works in practice: How it works, fail-closed enforcement, and platform capabilities.

Put authority between agents and production.