
Intended Documentation

LLM Gateway Integration

Govern every agent tool call by pointing your runtime's model base URL at the Intended LLM gateway. A single base-URL setting per provider covers the integration. Supported providers: Anthropic, OpenAI, NVIDIA NIM, AWS Bedrock, and Google Vertex. Fail-closed by default.


The Intended LLM gateway is a streaming proxy between your agent runtime and the model provider. It buffers each tool call the model emits, evaluates it against your Intended policies, and then either forwards it unchanged (APPROVED) or rewrites it into a text block that the agent reads as a normal model response (DENIED / ESCALATED). Non-tool tokens are not buffered: they stream straight through, so only tool calls wait on policy evaluation.

The integration is a configuration change, not a code change: there is no SDK to paste into your agents. You point the model API base URL at the gateway and supply your Intended tenant key.

Warning

The gateway is fail-closed. If the authority engine is unreachable or returns a response of unrecognized shape, the tool call is denied (rewritten), not forwarded. The point is that no tool call slips past policy while the system is degraded.

Quickstart

```bash
# 1. Mint a tenant key (mrt_…) from the console.
export INTENDED_KEY=mrt_live_…
export INTENDED_TENANT_ID=<your-tenant-id>

# 2. Point your runtime's model base URL at the gateway, for each provider it uses.
# Anthropic:
export ANTHROPIC_BASE_URL=https://gateway.intended.so/v1/anthropic
# OpenAI / OpenAI-compatible runtimes:
export OPENAI_BASE_URL=https://gateway.intended.so/v1/openai
# NVIDIA NIM (uses OPENAI_BASE_URL too, so set this instead of the OpenAI line):
# export OPENAI_BASE_URL=https://gateway.intended.so/v1/nvidia-nim
```

Restart the runtime. Every tool call the model emits is now evaluated by the gateway. The gateway identifies your tenant from the X-Intended-Key header: set it alongside the base URL through your runtime's header configuration, or use the runtime helper below.
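To see exactly what reaches the gateway, the request can be sketched as below. This is a minimal, hypothetical illustration: `buildGatewayRequest` is not part of any Intended package, and it assumes an Anthropic Messages call whose `/v1/messages` path is appended to the gateway prefix. It only builds the URL and headers, so nothing is sent.

```typescript
// Hypothetical helper: builds (but does not send) a request routed through
// the Intended gateway. Nothing here is an Intended SDK API.
interface GatewayRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildGatewayRequest(opts: {
  baseUrl: string; // e.g. the value of ANTHROPIC_BASE_URL
  path: string; // provider path, forwarded verbatim upstream
  modelKey: string; // your own provider key, forwarded untouched
  intendedKey: string; // tenant key, terminated at the gateway
  tenantId: string;
  payload: unknown;
}): GatewayRequest {
  // Join base URL and path without doubling or dropping the slash.
  const url = opts.baseUrl.replace(/\/+$/, "") + "/" + opts.path.replace(/^\/+/, "");
  return {
    url,
    headers: {
      "content-type": "application/json",
      "x-api-key": opts.modelKey, // Anthropic-style model auth
      "anthropic-version": "2023-06-01",
      "X-Intended-Key": opts.intendedKey,
      "X-Intended-Tenant-Id": opts.tenantId,
    },
    body: JSON.stringify(opts.payload),
  };
}

// Placeholder values for illustration only.
const req = buildGatewayRequest({
  baseUrl: "https://gateway.intended.so/v1/anthropic",
  path: "/v1/messages",
  modelKey: "sk-ant-example",
  intendedKey: "mrt_live_example",
  tenantId: "tenant-123",
  payload: { model: "your-model-id", max_tokens: 256, messages: [{ role: "user", content: "hi" }] },
});
console.log(req.url); // https://gateway.intended.so/v1/anthropic/v1/messages
```

Passing `req.url`, `req.headers`, and `req.body` to `fetch` with `method: "POST"` would send it. The model key travels in `x-api-key` as usual, while the two `X-Intended-*` headers stop at the gateway.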

What gets evaluated

For each tool call the model produces, the interceptor performs three steps:

Buffer the tool block

The streaming tool_use (Anthropic) / tool_calls (OpenAI) frames are buffered per block until the full tool name and arguments are available. Anthropic's inline blocks and OpenAI's array form normalize to the same canonical event sequence.

Evaluate against authority

The gateway derives a baseline risk profile from the tool name (read-only tools score low; physical actuation, destructive, and financial tools score high and default toward escalation) and submits a structured intent to the authority engine.

Apply the directive

  • APPROVED → forward the buffered frames unchanged.
  • DENIED → replace them with a deny text block carrying the rationale.
  • ESCALATED → replace them with a pending-approval marker; a human approves the call from the console queue.
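The buffering and rewrite steps can be sketched for OpenAI-style streams, where each `tool_calls` entry arrives as fragments keyed by `index`. Evaluation happens in the authority engine, so this sketch takes the directive as an input. All names and the deny/pending wording are hypothetical; the gateway's actual rewritten text is not shown in this guide.

```typescript
// Minimal sketch of buffer -> apply for OpenAI-style streamed tool calls.
// Type names and message wording are hypothetical.

// One streamed fragment of a tool_calls entry.
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface BufferedToolCall {
  id: string;
  name: string;
  arguments: string; // complete JSON text once the stream ends
}

type Directive = "APPROVED" | "DENIED" | "ESCALATED";

type Output =
  | { kind: "forward"; call: BufferedToolCall }
  | { kind: "text"; text: string };

// Buffer: concatenate name and argument fragments per tool-call index.
function bufferToolCalls(deltas: ToolCallDelta[]): BufferedToolCall[] {
  const byIndex = new Map<number, BufferedToolCall>();
  for (const d of deltas) {
    const call = byIndex.get(d.index) ?? { id: "", name: "", arguments: "" };
    if (d.id) call.id = d.id;
    call.name += d.function?.name ?? "";
    call.arguments += d.function?.arguments ?? "";
    byIndex.set(d.index, call);
  }
  // Emit calls in index order, regardless of arrival order.
  return Array.from(byIndex.entries())
    .sort(([a], [b]) => a - b)
    .map(([, call]) => call);
}

// Apply: forward approved calls unchanged; replace everything else with text.
function applyDirective(call: BufferedToolCall, directive: Directive, rationale: string): Output {
  switch (directive) {
    case "APPROVED":
      return { kind: "forward", call };
    case "ESCALATED":
      return { kind: "text", text: `${call.name} is pending human approval.` };
    default:
      // DENIED, and anything unexpected, is rendered as a deny (fail closed).
      return { kind: "text", text: `${call.name} was denied: ${rationale}` };
  }
}

const [call] = bufferToolCalls([
  { index: 0, id: "call_1", function: { name: "get_weather", arguments: "" } },
  { index: 0, function: { arguments: '{"city":' } },
  { index: 0, function: { arguments: '"Oslo"}' } },
]);
console.log(JSON.parse(call.arguments).city); // Oslo
console.log(applyDirective(call, "DENIED", "outside business hours").kind); // text
```

The arguments are only parsed after the last fragment arrives, which is why the whole block has to be buffered before the engine can evaluate it.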

Operators can skip the authority-engine round trip with per-tool or per-actor overrides (auto_approve, always_deny, always_escalate), so tools whose outcome never varies don't pay the evaluation cost. Overrides are checked in order and the first match wins; match is a case-insensitive regular expression tested against the tool name (scope "tool") or the actor id (scope "actor").

```json
{
  "gateway": {
    "policyOverrides": [
      { "scope": "tool",  "match": "^get_",          "directive": "auto_approve" },
      { "scope": "tool",  "match": "^wire_transfer", "directive": "always_deny" },
      { "scope": "actor", "match": "^gateway:",      "directive": "always_escalate" }
    ]
  }
}
```
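Read as code, the first-match rule above amounts to an ordered scan. The sketch assumes the `policyOverrides` shape from the JSON example; `resolveOverride` is a hypothetical name, not the gateway's internal API.

```typescript
// Sketch of first-match override resolution (hypothetical names).
type OverrideDirective = "auto_approve" | "always_deny" | "always_escalate";

interface PolicyOverride {
  scope: "tool" | "actor";
  match: string; // regex source, tested case-insensitively
  directive: OverrideDirective;
}

// Returns the directive of the first matching override, or undefined when no
// override applies and the call must go to the authority engine.
function resolveOverride(
  overrides: PolicyOverride[],
  toolName: string,
  actorId: string,
): OverrideDirective | undefined {
  for (const o of overrides) {
    const subject = o.scope === "tool" ? toolName : actorId;
    if (new RegExp(o.match, "i").test(subject)) return o.directive;
  }
  return undefined;
}

const overrides: PolicyOverride[] = [
  { scope: "tool", match: "^get_", directive: "auto_approve" },
  { scope: "tool", match: "^wire_transfer", directive: "always_deny" },
  { scope: "actor", match: "^gateway:", directive: "always_escalate" },
];

console.log(resolveOverride(overrides, "GET_balance", "agent:alpha")); // auto_approve
console.log(resolveOverride(overrides, "wire_transfer_usd", "gateway:x")); // always_deny
console.log(resolveOverride(overrides, "send_email", "agent:alpha")); // undefined
```

Because matching stops at the first hit, order matters: in the second example the actor rule would have escalated, but the earlier tool rule denies first. Patterns are unanchored unless they start with `^`.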

Provider coverage

| Provider | Path prefix | Streaming | Provider id |
|---|---|---|---|
| Anthropic | `/v1/anthropic/*` | SSE | `anthropic` |
| OpenAI | `/v1/openai/*` | SSE | `openai` |
| NVIDIA NIM | `/v1/nvidia-nim/*` | SSE | `nvidia-nim` |
| AWS Bedrock (Claude) | `/v1/bedrock-anthropic/*` | AWS event-stream | `bedrock-anthropic` |
| Google Vertex (Gemini) | `/v1/vertex-gemini/*` | NDJSON | `vertex-gemini` |

The path after the provider prefix is forwarded verbatim upstream; the gateway does not remap any provider's URL space. For example, a request to /v1/anthropic/v1/messages reaches Anthropic as /v1/messages.
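Prefix routing can be sketched with the path prefixes and provider ids from the table, plus the `404 unknown_provider` behavior listed under Failure modes. `routeRequest` is hypothetical, and the sketch does not cover how the gateway picks each upstream host.

```typescript
// Sketch of provider routing by path prefix (hypothetical helper).
const PROVIDERS = ["anthropic", "openai", "nvidia-nim", "bedrock-anthropic", "vertex-gemini"] as const;
type ProviderId = (typeof PROVIDERS)[number];

type Route =
  | { ok: true; provider: ProviderId; upstreamPath: string }
  | { ok: false; status: 404; error: "unknown_provider" };

function routeRequest(pathWithQuery: string): Route {
  // Expect /v1/<provider><rest>; <rest>, including any query string,
  // is forwarded verbatim.
  const m = /^\/v1\/([^/?]+)(.*)$/.exec(pathWithQuery);
  if (m === null || !(PROVIDERS as readonly string[]).includes(m[1])) {
    return { ok: false, status: 404, error: "unknown_provider" };
  }
  return { ok: true, provider: m[1] as ProviderId, upstreamPath: m[2] || "/" };
}

console.log(routeRequest("/v1/anthropic/v1/messages"));
// { ok: true, provider: 'anthropic', upstreamPath: '/v1/messages' }
console.log(routeRequest("/v1/mistral/chat"));
// { ok: false, status: 404, error: 'unknown_provider' }
```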

Observe vs enforce

The gateway runs in one of two modes:

  • observe — every tool call is evaluated and recorded, but the response is forwarded unchanged. Use this to see what would be blocked before turning on enforcement.
  • enforce — rewrite directives are applied. This is the production default.

Set mode per request with the X-Intended-Mode header (for staged rollouts), or per tenant in configuration.
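For a staged rollout driven by the X-Intended-Mode header, one option is to bucket agents deterministically so each agent keeps the same mode across restarts. This is a hypothetical client-side helper, not an Intended API; FNV-1a is used only as a cheap, stable hash, and `enforcePercent` is an illustrative parameter.

```typescript
// Hypothetical staged-rollout helper: each agent lands in a fixed bucket 0..99,
// and only buckets below enforcePercent send "enforce".
type Mode = "observe" | "enforce";

// 32-bit FNV-1a over UTF-16 code units; stable across processes.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function modeFor(agentId: string, enforcePercent: number): Mode {
  return fnv1a(agentId) % 100 < enforcePercent ? "enforce" : "observe";
}

// Extra header to attach to this agent's model requests.
function intendedModeHeader(agentId: string, enforcePercent: number): Record<string, string> {
  return { "X-Intended-Mode": modeFor(agentId, enforcePercent) };
}

console.log(intendedModeHeader("agent-a", 0)); // { 'X-Intended-Mode': 'observe' }
```

Raising `enforcePercent` from 0 to 100 moves agents from observe to enforce without any agent flipping back, because each agent's bucket is fixed.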

Headers

| Header from your runtime | Used for | Forwarded upstream? |
|---|---|---|
| `Authorization: Bearer <model-key>` | Model auth | Yes, untouched |
| `x-api-key: <model-key>` | Model auth (Anthropic) | Yes, untouched |
| `anthropic-version` / `openai-beta` | Provider versioning | Yes, untouched |
| `X-Intended-Key` (or `X-Intended-Tenant-Key`) | Tenant identification | No, terminated at gateway |
| `X-Intended-Tenant-Id` | Tenant scoping | No, terminated at gateway |
| `X-Intended-Mode` | Per-request observe / enforce | No, terminated at gateway |

You keep your model API key: the gateway forwards Authorization / x-api-key upstream untouched and never stores it. If X-Intended-Key is missing, the gateway refuses the request with an error telling you to set it.
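The split in the table can be sketched as a single pass over the request headers. This illustrates the table rather than reproducing the gateway's code: `splitHeaders` and `hasTenantKey` are hypothetical, and the sketch assumes any header outside the four X-Intended-* names passes through unchanged. Header names are compared case-insensitively, as HTTP requires.

```typescript
// Headers terminated at the gateway, lowercased for case-insensitive lookup.
const TERMINATED = new Set([
  "x-intended-key",
  "x-intended-tenant-key",
  "x-intended-tenant-id",
  "x-intended-mode",
]);

// Separates gateway-only headers from those forwarded to the model provider.
function splitHeaders(incoming: Record<string, string>): {
  upstream: Record<string, string>;
  intended: Record<string, string>;
} {
  const upstream: Record<string, string> = {};
  const intended: Record<string, string> = {};
  for (const [name, value] of Object.entries(incoming)) {
    const lower = name.toLowerCase();
    if (TERMINATED.has(lower)) intended[lower] = value;
    else upstream[name] = value; // model auth and versioning pass through as-is
  }
  return { upstream, intended };
}

// The gateway refuses requests without a tenant key; either header name counts.
function hasTenantKey(intended: Record<string, string>): boolean {
  return Boolean(intended["x-intended-key"] || intended["x-intended-tenant-key"]);
}
```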

Failure modes

| Condition | Gateway behavior |
|---|---|
| Missing `X-Intended-Key` | Request refused with an instruction to set the header. |
| Unknown provider in the path | `404 unknown_provider`. |
| Authority engine unreachable or returning errors | Fail closed: the tool call is rewritten (denied), not forwarded. |
| Unrecognized decision shape from the engine | Fail closed: treated as a deny. |
| Upstream provider stream cuts off mid-tool-block | Buffered frames are flushed so the agent's parser sees the partial state. |
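Both fail-closed rows reduce to one rule: anything other than a recognized decision becomes a deny. A minimal sketch follows; the engine's wire format is not documented in this guide, so the `{ decision, rationale }` input shape and the function names are assumptions.

```typescript
// Sketch of fail-closed decision handling (assumed input shape).
type GatewayDecision =
  | { status: "APPROVED" }
  | { status: "DENIED"; rationale: string }
  | { status: "ESCALATED"; rationale: string };

function parseDecision(raw: unknown): GatewayDecision {
  const deny = (why: string): GatewayDecision => ({ status: "DENIED", rationale: why });
  if (typeof raw !== "object" || raw === null) return deny("unrecognized decision shape");
  const { decision, rationale } = raw as { decision?: unknown; rationale?: unknown };
  const why = typeof rationale === "string" ? rationale : "no rationale provided";
  switch (decision) {
    case "APPROVED":
      return { status: "APPROVED" };
    case "DENIED":
      return deny(why);
    case "ESCALATED":
      return { status: "ESCALATED", rationale: why };
    default:
      return deny("unrecognized decision shape"); // unknown shape: fail closed
  }
}

// Any exception thrown while calling the engine (for example a network
// error) also becomes a deny.
async function evaluateFailClosed(callEngine: () => Promise<unknown>): Promise<GatewayDecision> {
  try {
    return parseDecision(await callEngine());
  } catch {
    return { status: "DENIED", rationale: "authority engine unreachable" };
  }
}
```

Note that `fetch` does not throw on non-2xx responses, so a real caller would need to treat those as errors explicitly for them to land in the deny branch.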

Runtime helper (OpenClaw / Node harnesses)

For OpenClaw and Node-based agent harnesses, @intended/openclaw-plugin exposes installIntendedGateway({ tenantId, intendedKey }). Call it once at startup; it sets the per-provider base URLs and returns a ConfiguredGateway with URL accessors and a fetch wrapper.

```ts
import { installIntendedGateway } from "@intended/openclaw-plugin";

const gateway = installIntendedGateway({
  tenantId: process.env.INTENDED_TENANT_ID!,
  intendedKey: process.env.INTENDED_KEY!, // mrt_…
});
```

For physical-AI runtimes (Isaac Sim, ROS 2), the gateway governs only the LLM planning layer. Gate the actuation itself with an on-robot Authority Token verifier; see Verify Decision Tokens.

Observability

The gateway exposes /healthz, /readyz, and /metrics (Prometheus). Key series:

  • intended_gateway_requests_total{provider, tenant, decision}
  • intended_gateway_tool_calls_total{provider, tenant, decision, tool_category}
  • intended_gateway_upstream_latency_ms{provider, tenant}
  • intended_gateway_decision_latency_ms{provider, tenant}
  • intended_gateway_upstream_errors_total{provider, tenant, status}

The gateway echoes the W3C traceparent header so upstream and downstream log lines can be correlated, and each response carries an x-intended-gateway-request-id header.
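To correlate your own logs with the gateway's, a runtime can send its own `traceparent` on each model request. The sketch follows the W3C Trace Context format (version `00`, 16-byte trace id, 8-byte parent id, 1-byte flags); the helper names are hypothetical, and it uses Node's `crypto.randomBytes`.

```typescript
import { randomBytes } from "node:crypto";

// traceparent = version "00" - trace-id (32 hex) - parent-id (16 hex) - flags.
// Flags "01" marks the trace as sampled.
function newTraceparent(): string {
  return `00-${randomBytes(16).toString("hex")}-${randomBytes(8).toString("hex")}-01`;
}

// Returns the trace id, or undefined if the header is malformed. The spec
// treats an all-zero trace id or parent id as invalid.
function traceIdOf(traceparent: string): string | undefined {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}$/.exec(traceparent);
  if (m === null || /^0+$/.test(m[1]) || /^0+$/.test(m[2])) return undefined;
  return m[1];
}

const tp = newTraceparent();
// Send `traceparent: tp` with the model request, then log traceIdOf(tp).
console.log(traceIdOf("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"));
// 4bf92f3577b34da6a3ce929d0e0e4736
```

Logging the trace id next to the `x-intended-gateway-request-id` response header lets you join a runtime log line to the gateway's record of the same call.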

Self-hosting

The gateway is source-available in packages/llm-gateway. Run it as the intended-gateway CLI binary, or embed it with the programmatic buildServer(...) API. Self-hosting lets you read exactly what it does and run it in your own VPC, where your traffic already lives, with the same env vars.
