2026-03-04
Governing AI Agent Operations in Kubernetes
Intended Team · Founding Team
Where AI Agents Run
Kubernetes has become the default runtime for AI agent workloads. Agent frameworks deploy as pods. Tool servers run as services. Orchestration layers use the Kubernetes API for scheduling, scaling, and service discovery. When AI agents operate in production, they almost always operate in Kubernetes.
Kubernetes comes with a capable authorization system. RBAC (Role-Based Access Control) defines what service accounts, users, and groups can do within the cluster. ClusterRoles and Roles specify which API verbs (get, list, create, update, delete) are allowed on which resources (pods, deployments, services, configmaps). RBAC is enforced at the API server level, and it works reliably.
But RBAC was designed for human operators and infrastructure automation. It answers the question "can this identity perform this verb on this resource?" It does not answer the questions that matter for AI agent governance: what is the intent behind this operation? What is the risk? Should this action happen right now, given everything else that is happening?
What RBAC Misses
Consider an AI agent that manages deployments in a Kubernetes cluster. The agent has a service account with permissions to create, update, and delete deployments in the production namespace. From an RBAC perspective, the agent is fully authorized.
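The grant described above is a few lines of standard Kubernetes RBAC. A minimal sketch (the Role name and namespace are illustrative, not from the Intended docs):

```yaml
# Illustrative Role for the agent's service account. Note that the verbs
# apply to the resource as a whole -- RBAC sees "update deployments",
# never what the update actually contains.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: agent-deployer
  namespace: production
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["create", "update", "delete"]
```

Bound to the agent's service account via a RoleBinding, this Role authorizes every operation in the list that follows, without distinction.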
Now consider these three operations:
- The agent updates a deployment to change an environment variable
- The agent updates a deployment to scale from 3 replicas to 30
- The agent updates a deployment to change the container image to a version that has not been tested
All three operations are "update deployments in the production namespace." RBAC treats them identically. But they have vastly different risk profiles. The first is routine. The second is a significant scaling event that could affect costs and capacity. The third is a potential production incident waiting to happen.
RBAC cannot distinguish between these operations because it operates at the resource-verb level. It does not understand what the update contains, why the agent is making it, or what the operational context is. That is the gap Intended fills.
The Intended Admission Controller
Intended's Kubernetes admission controller is a validating webhook that the API server invokes before a request is persisted to etcd. It runs as part of the API server's admission chain -- after authentication, RBAC authorization, and any mutating admission webhooks -- evaluating every mutating operation against Intended's authority engine.
When an AI agent submits a Kubernetes API request, the admission controller:
1. Extracts the operation details: which resource, which namespace, what changes
2. Classifies the intent using the MIR taxonomy: is this a deployment, a scaling event, a configuration change, a network update?
3. Submits the classified intent to Intended's authority engine for policy evaluation and risk scoring
4. Returns an admission decision: allow (with authority token), deny (with reason), or hold (pending escalation)
The entire flow adds less than 10ms of latency to the API request. The evaluation timeout is configurable, and if Intended is unreachable or evaluation times out, the webhook falls back to the configured fail mode (deny or allow, described under Configuration below).
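In Kubernetes terms, the webhook's verdict travels back to the API server as a standard `admission.k8s.io/v1` AdmissionReview response. A denial might look like this (the uid and message are illustrative):

```yaml
apiVersion: admission.k8s.io/v1
kind: AdmissionReview
response:
  # Must echo the uid of the incoming AdmissionReview request
  uid: "705ab4f5-6393-11e8-b7cc-42010a800002"
  allowed: false
  status:
    code: 403
    message: "Denied by Intended: image tag is not in the approved release registry"
```

The `status.message` field is what the caller sees, which is why Intended's denials can carry a human-readable reason.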
Installation
The admission controller deploys as a standard Kubernetes deployment with a validating webhook configuration:
# Add the Intended Helm repository
helm repo add meritt https://charts.intended.so
helm repo update
# Install the admission controller
helm install meritt-admission meritt/admission-controller \
--namespace meritt-system \
--create-namespace \
  --set apiKey=$INTENDED_API_KEY \
  --set orgId=$INTENDED_ORG_ID

The Helm chart deploys:
- The admission controller deployment (2 replicas for high availability)
- A validating webhook configuration that intercepts mutating operations
- A service account with minimal RBAC permissions (only what the controller needs)
- A TLS certificate for secure webhook communication
- A ConfigMap with default configuration
After installation, every mutating API request in the cluster passes through the Intended admission controller before being persisted.
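The interception scope is defined by a standard ValidatingWebhookConfiguration that the chart creates. A simplified sketch of what it might contain -- names, paths, and rules here are illustrative, and the chart's actual output may differ:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: meritt-admission
webhooks:
  - name: admission.intended.so
    clientConfig:
      service:
        name: meritt-admission
        namespace: meritt-system
        path: /validate
      caBundle: ""             # injected by the chart's TLS setup
    rules:
      - apiGroups: ["apps", ""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE", "DELETE"]
        resources: ["deployments", "services", "configmaps"]
    failurePolicy: Fail        # corresponds to failMode: "closed"
    timeoutSeconds: 5          # corresponds to evaluationTimeoutMs: 5000
    admissionReviewVersions: ["v1"]
    sideEffects: None
```

`failurePolicy` and `timeoutSeconds` are the Kubernetes-level knobs behind the fail mode and evaluation timeout settings described in the Configuration section.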
Configuration
The admission controller is configurable via the ConfigMap or Helm values:
# values.yaml
config:
# Which namespaces to govern (default: all except kube-system)
namespaces:
include: ["production", "staging", "default"]
exclude: ["kube-system", "kube-public", "meritt-system"]
# Which resource types to evaluate
resources:
- group: "apps"
resources: ["deployments", "statefulsets", "daemonsets"]
- group: ""
resources: ["services", "configmaps", "secrets"]
- group: "networking.k8s.io"
resources: ["ingresses", "networkpolicies"]
# Fail mode: closed (deny on error) or open (allow on error)
failMode: "closed"
# Evaluation timeout in milliseconds
evaluationTimeoutMs: 5000
# Cache approved decisions for repeated operations
cacheSeconds: 30
# Skip evaluation for operations by these service accounts
bypassServiceAccounts:
    - "system:serviceaccount:kube-system:kube-controller-manager"

The namespace and resource filters let you control the scope of governance. Start with production namespaces and critical resource types, then expand as confidence grows.
What It Catches That RBAC Misses
Risky scaling events
An AI agent attempts to scale a production deployment from 5 to 100 replicas. RBAC allows it -- the agent has update permissions on deployments. The Intended admission controller classifies this as a MIR-300 (Infrastructure) scale-up operation, scores the risk based on the magnitude of the change (20x increase), the target namespace (production), and the time of day (business hours). The risk score exceeds the auto-approve threshold, so the operation is escalated to the platform team for review.
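Intended's policy schema is not shown in this post, but as a sketch only, a rule capturing the scale-up scenario above might look something like this -- every field name here is hypothetical:

```yaml
# Hypothetical policy sketch -- NOT the actual Intended policy schema.
policy:
  name: production-scale-guard
  match:
    mirClass: MIR-300          # Infrastructure
    operation: scale-up
    namespace: production
  risk:
    factors:
      - replicaRatio: ">= 10x" # e.g. 5 -> 100 replicas
      - businessHours: true
  action:
    overThreshold: escalate    # route to the platform team for review
```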
Untested image deployments
An AI agent updates a deployment to use a container image that was built from an unmerged branch. RBAC allows it -- the agent has update permissions. The Intended admission controller detects that the image tag does not match any tagged release in the approved registry. The MIR-100 (Software Development) domain pack flags deployments with unapproved images as high-risk. The operation is denied with a clear explanation: "Image tag ai/experiment-branch is not in the approved release registry."
Configuration drift
An AI agent modifies a ConfigMap in the production namespace, changing a database connection string. RBAC allows it -- the agent has update permissions on configmaps. The Intended admission controller classifies this as a MIR-300 (Infrastructure) config-update, notes that the change affects a database connection (high sensitivity), and evaluates it against the policy that requires all production database configuration changes to be reviewed. The operation is escalated.
Bulk operations
An AI agent submits 50 delete requests for pods in rapid succession. Each individual delete is permitted by RBAC. But the Intended admission controller tracks velocity -- it detects that 50 delete operations in 60 seconds is anomalous for this agent. The velocity signal raises the risk score, and after the 10th delete, subsequent operations are escalated until a human confirms the bulk operation is intentional.
Cross-namespace access patterns
An AI agent that normally operates in the staging namespace submits a create operation in the production namespace. RBAC allows it -- the agent's ClusterRole grants access to both namespaces. The Intended admission controller notes the namespace change as an anomaly. The agent's historical pattern shows 99% of operations in staging. A production operation from a staging-focused agent triggers a risk factor, and the operation is evaluated with elevated scrutiny.
Observability
The admission controller exposes Prometheus metrics for monitoring:
meritt_admission_decisions_total{decision="allow|deny|escalate",namespace="...",resource="..."}
meritt_admission_latency_seconds{quantile="0.5|0.9|0.99"}
meritt_admission_errors_total{type="timeout|classification|evaluation"}
meritt_admission_cache_hits_total
meritt_admission_cache_misses_total

These metrics integrate with your existing monitoring stack. Set alerts on denial rates, escalation rates, and latency to detect governance issues early.
The admission controller also logs structured events for every decision, compatible with any log aggregation system (Elasticsearch, Splunk, Datadog, CloudWatch).
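If you run the Prometheus Operator, the alerting suggestions above can be expressed as a PrometheusRule. A sketch with illustrative thresholds:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: meritt-admission-alerts
  namespace: meritt-system
spec:
  groups:
    - name: meritt-admission
      rules:
        - alert: HighAdmissionDenialRate
          # More than 1 denied operation/sec sustained for 10 minutes
          expr: sum(rate(meritt_admission_decisions_total{decision="deny"}[5m])) > 1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Intended is denying an unusually high rate of operations"
        - alert: AdmissionEvaluationErrors
          # Any sustained evaluation errors matter, especially with failMode: closed
          expr: sum(rate(meritt_admission_errors_total[5m])) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Intended admission evaluations are erroring"
```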
Coexistence with Existing Admission Controllers
Intended's admission controller is a validating webhook, not a mutating webhook. It does not modify API requests -- it only allows or denies them. This means it coexists cleanly with existing admission controllers like OPA/Gatekeeper, Kyverno, or custom webhooks.
Kubernetes runs all mutating webhooks before any validating webhook, so Intended always evaluates the request in its final form -- as it will actually be persisted. Ordering among validating webhooks does not affect correctness, since none of them modify the request; Intended's decision is independent of other validators such as Gatekeeper or Kyverno.
Getting Started
Install the admission controller, configure your namespaces and resource types, and start governing AI agent operations in your Kubernetes cluster. Every mutating operation is now classified, risk-scored, policy-evaluated, and cryptographically recorded.
Start with the free tier. The first 5,000 admission decisions per month are free. Install the Helm chart and see what your AI agents are actually doing in your cluster.