LLM Gateway Integration

Govern every agent tool call by pointing your runtime's model base URL at the Intended LLM gateway. One env var; Anthropic, OpenAI, NVIDIA NIM, AWS Bedrock, Google Vertex; fail-closed by default.



The Intended LLM gateway is a streaming proxy that sits between your agent runtime and the model provider. It buffers each tool call the model emits, evaluates it against your Intended policies, and either forwards it unchanged (APPROVED) or rewrites it into a text block the agent reads as a normal model response (DENIED / ESCALATED). Non-tool tokens are never buffered, so they stream through without evaluation delay.

The integration is one config line: point the model API base URL at the gateway. No SDK is embedded in your agents, and no agent code changes are required.

Warning

The gateway is fail-closed. If the authority engine is unreachable or returns an unknown shape, the tool call is denied (rewritten), not forwarded. The whole value is that nothing slips past while the system is degraded.

Quickstart

bash

# 1. Mint a tenant key (intended_live_…) from the console.
export INTENDED_KEY=intended_live_…
export INTENDED_TENANT_ID=<your-tenant-id>

# 2. Point your runtime's model base URL at the gateway. Set only the lines for the providers you use.
# Anthropic:
export ANTHROPIC_BASE_URL=https://gateway.intended.so/v1/anthropic
# OpenAI / OpenAI-compatible runtimes:
export OPENAI_BASE_URL=https://gateway.intended.so/v1/openai
# NVIDIA NIM (reuses OPENAI_BASE_URL, so set this instead of the OpenAI line, not alongside it):
export OPENAI_BASE_URL=https://gateway.intended.so/v1/nvidia-nim

Restart the runtime. Every tool call flowing through the model is now evaluated. The gateway identifies your tenant from X-Intended-Key (set it alongside the base URL via your runtime's header config, or use the runtime helper below).
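To make the header configuration concrete, here is a minimal sketch of the request a runtime might send through the gateway. The helper name `buildGatewayRequest` is hypothetical, and the `/v1/messages` path in the usage note is Anthropic's public Messages endpoint, not something this page specifies. Only the base URL, `x-api-key`, `X-Intended-Key`, and `X-Intended-Tenant-Id` come from this page.

```typescript
// Hypothetical helper: shows which headers travel with a gateway request.
interface GatewayRequest {
  url: string;
  headers: Record<string, string>;
}

function buildGatewayRequest(
  baseUrl: string,      // e.g. the value of ANTHROPIC_BASE_URL
  upstreamPath: string, // provider path; the gateway forwards it verbatim
  modelKey: string,     // your provider API key, forwarded upstream
  intendedKey: string,  // tenant key, terminated at the gateway
  tenantId: string,     // tenant id, terminated at the gateway
): GatewayRequest {
  // Join base URL and path without producing a double slash.
  const url =
    baseUrl.replace(/\/+$/, "") + "/" + upstreamPath.replace(/^\/+/, "");
  return {
    url,
    headers: {
      "content-type": "application/json",
      "x-api-key": modelKey,
      "X-Intended-Key": intendedKey,
      "X-Intended-Tenant-Id": tenantId,
    },
  };
}
```

Calling `buildGatewayRequest("https://gateway.intended.so/v1/anthropic", "/v1/messages", modelKey, intendedKey, tenantId)` yields a URL and header set that can be handed to any HTTP client. Most provider SDKs accept the same values through their base URL and default-header options.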

What gets evaluated

For each tool call the model produces, the interceptor runs three steps:

Buffer the tool block

The streaming tool_use (Anthropic) / tool_calls (OpenAI) frames are buffered per block until the full tool name and arguments are available. Anthropic's inline blocks and OpenAI's array form normalize to the same canonical event sequence.

Evaluate against authority

The gateway derives a baseline risk profile from the tool name (read-only tools score low; physical actuation, destructive, and financial tools score high and default toward escalation) and submits a structured intent to the authority engine.

Apply the directive

APPROVED → forward the buffered frames unchanged.
DENIED → replace with a deny text block carrying the rationale.
ESCALATED → replace with a pending-approval marker; a human approves from the console queue.
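The directive step above can be sketched as a small function. This is an illustration under stated assumptions, not the gateway's code: the `Frame` shape, the function name `applyDirective`, and the wording of the deny and pending texts are all hypothetical. Treating an unrecognized directive as a deny mirrors the fail-closed rule described in the warning.

```typescript
// Hypothetical frame shape; real wire frames are provider-specific.
interface Frame {
  kind: "tool" | "text";
  payload: string;
}

function applyDirective(
  directive: string,
  buffered: Frame[],
  rationale: string,
): Frame[] {
  switch (directive) {
    case "APPROVED":
      // Forward the buffered tool frames unchanged.
      return buffered;
    case "ESCALATED":
      // Replace the tool call with a pending-approval marker (text assumed).
      return [{ kind: "text", payload: `Pending human approval: ${rationale}` }];
    case "DENIED":
    default:
      // DENIED, or any unrecognized directive, becomes a deny text block.
      return [{ kind: "text", payload: `Denied: ${rationale}` }];
  }
}
```

Because the replacement is an ordinary text block, the agent's parser needs no special handling for denied or escalated calls.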

Operators can short-circuit the round-trip with per-tool / per-actor overrides (auto_approve, always_deny, always_escalate) so deterministic tools never pay the evaluation cost. First match wins; match is a case-insensitive regex against the tool name or actor id.

json

{
  "gateway": {
    "policyOverrides": [
      { "scope": "tool",  "match": "^get_",          "directive": "auto_approve" },
      { "scope": "tool",  "match": "^wire_transfer", "directive": "always_deny" },
      { "scope": "actor", "match": "^gateway:",      "directive": "always_escalate" }
    ]
  }
}
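The matching rule (first match wins, case-insensitive regex against the tool name or actor id) can be sketched as follows. The function name `resolveOverride` and the types are illustrative assumptions. The override list mirrors the JSON above.

```typescript
type OverrideDirective = "auto_approve" | "always_deny" | "always_escalate";

interface PolicyOverride {
  scope: "tool" | "actor";
  match: string; // regex source, applied case-insensitively
  directive: OverrideDirective;
}

// Returns the first matching override, or null to fall through to a
// full authority evaluation.
function resolveOverride(
  overrides: PolicyOverride[],
  toolName: string,
  actorId: string,
): OverrideDirective | null {
  for (const o of overrides) {
    const subject = o.scope === "tool" ? toolName : actorId;
    if (new RegExp(o.match, "i").test(subject)) {
      return o.directive; // first match wins
    }
  }
  return null;
}

const overrides: PolicyOverride[] = [
  { scope: "tool", match: "^get_", directive: "auto_approve" },
  { scope: "tool", match: "^wire_transfer", directive: "always_deny" },
  { scope: "actor", match: "^gateway:", directive: "always_escalate" },
];
```

Order matters under this rule: with the list above, a `get_` tool called by a `gateway:` actor resolves to `auto_approve`, because the tool rule is listed before the actor rule.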

Provider coverage

Provider	Path prefix	Streaming	Provider id
Anthropic	`/v1/anthropic/*`	SSE	`anthropic`
OpenAI	`/v1/openai/*`	SSE	`openai`
NVIDIA NIM	`/v1/nvidia-nim/*`	SSE	`nvidia-nim`
AWS Bedrock (Claude)	`/v1/bedrock-anthropic/*`	AWS event-stream	`bedrock-anthropic`
Google Vertex (Gemini)	`/v1/vertex-gemini/*`	NDJSON	`vertex-gemini`

The path after the provider prefix is forwarded verbatim to the upstream — the gateway does not re-map each provider's URL space.
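As a rough sketch of that routing rule, the function below splits an incoming path into a provider id and the verbatim upstream path. The name `routeRequest` and the result shape are assumptions; the provider ids come from the table above, and the `unknown_provider` error matches the failure-mode table. Treating a path with no `/v1/<provider>` prefix as `unknown_provider` is also an assumption.

```typescript
const PROVIDERS = [
  "anthropic",
  "openai",
  "nvidia-nim",
  "bedrock-anthropic",
  "vertex-gemini",
] as const;
type ProviderId = (typeof PROVIDERS)[number];

type Route =
  | { ok: true; provider: ProviderId; upstreamPath: string }
  | { ok: false; status: 404; error: "unknown_provider" };

function routeRequest(path: string): Route {
  // Capture the provider segment and everything after it.
  const m = /^\/v1\/([^/]+)(\/.*)?$/.exec(path);
  const provider = PROVIDERS.find((p) => p === m?.[1]);
  if (!m || !provider) {
    return { ok: false, status: 404, error: "unknown_provider" };
  }
  // The remainder is passed through untouched, not re-mapped.
  return { ok: true, provider, upstreamPath: m[2] ?? "/" };
}
```

For example, `/v1/anthropic/v1/messages` routes to provider `anthropic` with upstream path `/v1/messages`.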

Observe vs enforce

The gateway runs in one of two modes:

observe — every tool call is evaluated and recorded, but the response is forwarded unchanged. Use this to see what would be blocked before turning on enforcement.
enforce — rewrite directives are applied. This is the production default.

The effective mode is set per tenant in configuration (the source of truth). When no tenant config is present, the gateway fails closed to enforce.

The X-Intended-Mode header is a one-directional ratchet: it can only make enforcement stricter, never weaker. A tenant on observe can raise a single request to enforce (X-Intended-Mode: enforce) for staged rollouts — but a tenant on enforce can never be downgraded. X-Intended-Mode: observe sent to an enforcing tenant is ignored and the request still enforces.
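The ratchet can be expressed as a small pure function. This is a sketch of the rule as described here, with a hypothetical name `effectiveMode`. Comparing the header value case-insensitively is an assumption.

```typescript
type Mode = "observe" | "enforce";

function effectiveMode(
  tenantMode: Mode | undefined,    // undefined: no tenant config present
  headerValue: string | undefined, // raw X-Intended-Mode value, if sent
): Mode {
  // No tenant config fails closed to enforce; an enforcing tenant can
  // never be downgraded, whatever the header says.
  if (tenantMode === undefined || tenantMode === "enforce") {
    return "enforce";
  }
  // Tenant is on observe: the header may only raise the request to enforce.
  return headerValue?.trim().toLowerCase() === "enforce" ? "enforce" : "observe";
}
```

The only input combination that yields `observe` is an observe tenant with no raising header, which is what makes the ratchet one-directional.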

Headers

Header from your runtime	Used for	Forwarded upstream?
`Authorization: Bearer <model-key>`	model auth	yes, untouched
`x-api-key: <model-key>`	model auth (Anthropic)	yes, untouched
`anthropic-version` / `openai-beta`	provider versioning	yes, untouched
`X-Intended-Key` (or `X-Intended-Tenant-Key`)	tenant identification	terminated at gateway
`X-Intended-Tenant-Id`	tenant scoping	terminated at gateway
`X-Intended-Mode`	may only raise enforcement (ratchet); can never weaken it	terminated at gateway

You keep the model API key. The gateway forwards Authorization / x-api-key upstream untouched and never stores the key. If X-Intended-Key is missing, the gateway refuses the request and tells you to set it.
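The split between forwarded and terminated headers can be sketched like this. The function name `upstreamHeaders` is hypothetical, and forwarding every header not listed as terminated is an assumption; the table only documents the headers shown.

```typescript
// Headers the table marks as "terminated at gateway" (lowercased).
const TERMINATED = new Set([
  "x-intended-key",
  "x-intended-tenant-key",
  "x-intended-tenant-id",
  "x-intended-mode",
]);

// Returns the headers that would continue on to the model provider.
function upstreamHeaders(
  incoming: Record<string, string>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(incoming)) {
    // HTTP header names are case-insensitive, so compare in lowercase.
    if (!TERMINATED.has(name.toLowerCase())) {
      out[name] = value; // forwarded untouched
    }
  }
  return out;
}
```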

Failure modes

Condition	Gateway behavior
Missing `X-Intended-Key`	Request refused with an instruction to set the header.
Unknown provider in the path	`404 unknown_provider`.
Authority engine unreachable / errors	Fail closed — the tool call is rewritten (denied), not forwarded.
Unknown decision shape from the engine	Fail closed — treated as a deny.
Upstream provider stream cuts off mid-tool-block	Buffered frames are flushed so the agent's parser sees the partial state.
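The two fail-closed rows can be sketched together: any error from the authority engine, and any response that does not carry a recognized decision, collapses to a deny. The function names and the `decision` field name are assumptions for illustration, since the engine's response shape is not documented on this page.

```typescript
type Decision = "APPROVED" | "DENIED" | "ESCALATED";

// Maps an arbitrary engine response to a decision, denying on unknown shapes.
function normalizeDecision(raw: unknown): Decision {
  if (typeof raw === "object" && raw !== null) {
    // Assumed field name; the real response shape may differ.
    const d = (raw as { decision?: unknown }).decision;
    if (d === "APPROVED" || d === "DENIED" || d === "ESCALATED") {
      return d;
    }
  }
  return "DENIED"; // unknown decision shape: fail closed
}

// Wraps the engine call so that unreachable or erroring engines also deny.
async function decideFailClosed(
  evaluate: () => Promise<unknown>,
): Promise<Decision> {
  try {
    return normalizeDecision(await evaluate());
  } catch {
    return "DENIED"; // engine unreachable or errored: fail closed
  }
}
```

The important property is that no code path returns `APPROVED` unless the engine answered with exactly that value.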

Runtime helper (OpenClaw / Node harnesses)

For OpenClaw and Node-based agent harnesses, @intended/openclaw-plugin exposes installIntendedGateway({ tenantId, intendedKey }). Call it once at startup; it sets the per-provider base URLs and returns a ConfiguredGateway with URL accessors and a fetch wrapper.

typescript

import { installIntendedGateway } from "@intended/openclaw-plugin";

const gateway = installIntendedGateway({
  tenantId: process.env.INTENDED_TENANT_ID!,
  intendedKey: process.env.INTENDED_KEY!, // intended_live_…
});

For physical-AI runtimes (Isaac Sim, ROS2), the gateway governs the LLM-planning layer. Gate the actuation itself with an on-robot Authority Token verifier — see Verify Decision Tokens.

Observability

The gateway exposes /healthz, /readyz, and /metrics (Prometheus). Key series:

intended_gateway_requests_total{provider, tenant, decision}
intended_gateway_tool_calls_total{provider, tenant, decision, tool_category}
intended_gateway_upstream_latency_ms{provider, tenant}
intended_gateway_decision_latency_ms{provider, tenant}
intended_gateway_upstream_errors_total{provider, tenant, status}

W3C traceparent is echoed so upstream and downstream log lines correlate, and each response carries an x-intended-gateway-request-id.
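For log correlation on the client side, a minimal sketch of extracting the trace id from a `traceparent` value is shown below. It follows the W3C Trace Context version `00` layout (`version-traceid-parentid-flags`). The function name `parseTraceparent` is hypothetical and not part of the gateway.

```typescript
interface TraceContext {
  traceId: string;  // 32 lowercase hex characters
  parentId: string; // 16 lowercase hex characters
  sampled: boolean; // lowest bit of the trace flags
}

function parseTraceparent(header: string): TraceContext | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
    header.trim(),
  );
  if (!m) return null;
  const traceId = m[1] ?? "";
  const parentId = m[2] ?? "";
  const flags = m[3] ?? "00";
  // All-zero ids are invalid under the W3C specification.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}
```

Logging the parsed `traceId` next to the `x-intended-gateway-request-id` response header gives two keys for joining agent-side logs with gateway-side logs.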

Self-hosting

The gateway is source-available in packages/llm-gateway and runs as the intended-gateway CLI binary, or embedded via the programmatic buildServer(...) API. Run it in your VPC with the same env vars, either to audit exactly what it does or to host it where your traffic already lives.

Next steps

Runtime Integrations: direct vs adapter vs gateway.
Verify Decision Tokens: gate physical actuation on a fresh token.
OpenShell & NemoClaw: compile sandbox policy for agent runtimes.