Collabnix Team The Collabnix Team is a diverse collective of Docker, Kubernetes, and IoT experts united by a passion for cloud-native technologies. With backgrounds spanning across DevOps, platform engineering, cloud architecture, and container orchestration, our contributors bring together decades of combined experience from various industries and technical domains.

Agents Are the New Pods: Running AI Agents on Kubernetes with the Sandbox CRD

5 min read


I’ve been spending a lot of time lately watching people try to deploy AI agents the same way they deploy microservices. They reach for a Deployment, slap on an HPA, maybe wire up a Service, and then wonder why everything feels slightly off. The agent crashes mid-task and loses its scratchpad. The pod recycles and the conversation forgets what it was doing. Two agents end up on the same hostname and the orchestrator gets confused. The team’s first instinct is to add more YAML.

The honest answer is that agents are not microservices. They look like microservices from a distance (they listen on ports, they call APIs, they get scheduled onto nodes), but the operational shape is different. An agent is a long-lived workspace with state, identity, and a tendency to execute code it just wrote a second ago. That last part should make any platform engineer nervous.

Earlier this year, Kubernetes SIG Apps published a blog post announcing a project called Agent Sandbox, which introduces a new Sandbox CRD specifically for agent workloads. I want to walk through what it actually does, why it matters, and how it fits into the broader shift from “agents as a demo” to “agents as a deployment pattern.”

Why your Deployment is fighting you

Take a normal Kubernetes Deployment. It assumes your workload is stateless, fungible, and roughly always-on. Three replicas behind a Service, traffic distributed via round-robin; a pod dies, a new one takes its place, and the system doesn’t care which one you talked to.

Now think about an agent. A user kicks off a task at 2pm. The agent reads some files, runs some Python, writes intermediate results to disk, calls an LLM, runs more Python. The user steps away for lunch. Comes back at 3pm and asks “what was the variance again?” The agent needs to be the same agent: same memory, same files, same in-progress reasoning. If a Deployment controller decided to recycle that pod, you’ve just lost an hour of context.

StatefulSets get closer, but they’re built for ordered, replicated workloads like databases. They don’t know how to scale an idle agent to zero. They don’t have a clean answer for “this agent might run untrusted code so we need real kernel isolation.” And they don’t address the most expensive problem in agent infrastructure: cold starts.

What the Sandbox CRD actually gives you

The Agent Sandbox project is currently in development under SIG Apps and introduces a declarative API tailored for singleton, stateful workloads. The core resource is the Sandbox: a lightweight, single-container environment built on Kubernetes primitives. Four properties matter:

  • Strong isolation for untrusted code. The Sandbox natively supports gVisor or Kata Containers as the runtime. If your agent is going to exec something the LLM wrote, you really, really want a kernel boundary between that process and the host.
  • Lifecycle that matches reality. Agents are bursty. They sit idle for hours, then do five minutes of intense work, then go idle again. Sandbox supports scaling idle environments to zero while preserving state, so you’re not paying for a fleet of dormant pods.
  • Stable identity. Every Sandbox gets a stable hostname and network identity. When you have a researcher agent, a coder agent, and an analyst agent talking to each other, this is non-negotiable.
  • Warm pools for sub-second resumption. The SandboxWarmPool resource keeps a pool of pre-provisioned Sandbox pods ready to go. Without it, the ~1 second of pod startup overhead breaks the conversational feel of an agent that was just resumed.

That last one is the part most people skip over and then regret. A second of cold start is invisible for a microservice. For an agent that was idle and just got pinged, it’s the difference between “still thinking” and “did this break?”

A working example

Here’s roughly what a Sandbox manifest looks like for an agent that runs Python, has access to a workspace volume, and uses gVisor for isolation. The exact field names will evolve while the project is in alpha (check the upstream repo for the current schema before you copy this into production), but the shape gives you the right mental model.

apiVersion: agents.x-k8s.io/v1alpha1
kind: Sandbox
metadata:
  name: research-agent-ajeet
  namespace: agents
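spec:
  template:
    spec:
      # runtimeClassName is a PodSpec field, so it sits next to containers
      runtimeClassName: gvisor
      containers:
        - name: agent
          image: ghcr.io/collabnix/research-agent:0.4.2
          env:
            - name: AGENT_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: MODEL_ENDPOINT
              value: "http://model-runner.models.svc:8080/v1"
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
          volumeMounts:
            # Mount the workspace volume declared below (path is illustrative)
            - name: workspace
              mountPath: /workspace
      volumes:
        - name: workspace
          # emptyDir is deleted along with the pod; use persistent storage
          # if the files must outlive a pod restart
          emptyDir: {}
  idleTimeout: 30m
  resumePolicy: OnDemand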

A few things worth noticing. The runtimeClassName: gvisor line is doing a lot of work — it’s the difference between “the agent ran some Python” and “the agent ran some Python and the host kernel is fine.” The idleTimeout and resumePolicy together give you the scale-to-zero behavior. And because the Sandbox has a stable hostname, another agent can reach this one at research-agent-ajeet.agents.svc.cluster.local without any service mesh gymnastics.
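For comparison, here is a minimal sketch of how you would get that same DNS name on a cluster using only today's primitives: a headless Service named after the agent. This is not part of the Sandbox API. The label selector and the port are assumptions I made up for illustration, and the agent pod would need to carry the matching label.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: research-agent-ajeet     # resolves as research-agent-ajeet.agents.svc.cluster.local
  namespace: agents
spec:
  clusterIP: None                # headless: DNS returns the pod IP directly
  selector:
    app: research-agent-ajeet    # hypothetical label; the agent pod must carry it
  ports:
    - name: http
      port: 8080                 # assumed agent port
```

With a single matching pod, the name keeps pointing at whichever pod currently backs the agent, which is roughly the stable-identity behavior described above.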

For warm pools, you’d pair this with a template and a claim:

apiVersion: agents.x-k8s.io/v1alpha1
kind: SandboxTemplate
metadata:
  name: python-agent-template
spec:
  template:
    spec:
      runtimeClassName: gvisor
      containers:
        - name: agent
          image: ghcr.io/collabnix/python-agent-base:0.4.2
---
apiVersion: agents.x-k8s.io/v1alpha1
kind: SandboxWarmPool
metadata:
  name: python-pool
spec:
  templateRef:
    name: python-agent-template
  minReady: 5
  maxReady: 20
---
# When an orchestrator needs an agent, it issues a claim:
apiVersion: agents.x-k8s.io/v1alpha1
kind: SandboxClaim
metadata:
  name: claim-task-7842
spec:
  templateRef:
    name: python-agent-template

The orchestrator hands a SandboxClaim to the controller, the controller pulls a pre-warmed pod from the pool, and the agent is ready in milliseconds instead of seconds. This is the same pattern teams like Lovable describe using on GKE Agent Sandbox to scale to hundreds of secure sandboxes per second; the warm pool is what makes that throughput possible.

How this connects to the rest of the stack

If you’re building agentic systems on Kubernetes today, the Sandbox CRD is one piece. The other pieces are converging fast:

  • MCP (Model Context Protocol) handles how an agent connects to tools and data sources. Google adopted MCP across its services in late 2025, with managed remote MCP servers now available for things like Compute Engine and Kubernetes Engine.
  • A2A (Agent2Agent) handles how agents talk to each other across platform boundaries. It’s now governed by the Linux Foundation’s Agentic AI Foundation and is in production at 150+ organizations.
  • Inference infrastructure (llm-d, now a CNCF Sandbox project; Gateway API Inference Extension; and tools like Docker Model Runner) handles serving the actual models.

Sandbox is the missing piece for the execution surface. MCP tells the agent what tools it has. A2A tells it who else it can talk to. The model layer gives it a brain. Sandbox is where it actually lives and runs.

The thing nobody talks about: blast radius

Here’s the part I want to underline because it doesn’t get enough airtime in agent demos.

An agent generates code. The code runs. If it goes wrong, whether through a hallucinated rm -rf, a prompt injection that turned the agent against its own user, or just a genuinely buggy plan, you want that wrongness contained. A regular container shares a kernel with everything else on the node. That’s a paper-thin boundary for code an LLM produced ten seconds ago.

This is why gVisor and Kata Containers matter so much in this conversation. gVisor intercepts syscalls in userspace and gives you a real isolation boundary. Kata gives you a lightweight VM per workload. Either way, when the agent does something unexpected, the damage stops at the sandbox edge instead of spreading to the node.
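To make that concrete, here is a sketch of how the two runtimes are typically exposed to workloads through the standard RuntimeClass resource. The handler names are assumptions: they must match whatever your nodes' container runtime is configured with, and both gVisor and Kata need to be installed on the nodes first.

```yaml
# Assumes gVisor (runsc) and Kata Containers are already installed and
# registered with the container runtime on the target nodes.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc    # assumed handler name; must match the CRI configuration
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata     # assumption; some installs register kata-qemu, kata-clh, etc.
```

A pod then opts in by setting runtimeClassName to one of these names, which is what the gvisor value in the manifests above refers to.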

If you’ve been following the agent infrastructure space (Docker’s sbx, GKE Agent Sandbox, Cloudflare’s sandboxing work), you’ll notice everyone is converging on the same answer: microVM or userspace-kernel isolation as the default boundary for agent execution. The Sandbox CRD makes that boundary a first-class declarative concept in Kubernetes.

Should you adopt this today?

Honest answer: not in production, not yet. The project is early and the API will move. But if you’re building anything agentic on Kubernetes, even just experimenting, I’d strongly recommend reading the SIG Apps design doc and starting to think in these primitives, because this is where the platform is heading.

A few practical things you can do this week:

  1. Audit how your current agents are deployed. If they’re plain Deployments with shared filesystems and no runtime isolation, that’s a real risk worth flagging.
  2. Try gVisor as a RuntimeClass for any workload that executes generated code today, even before Sandbox is GA. You don’t need the CRD to get the isolation benefit.
  3. Start separating concerns: tooling layer (MCP), communication layer (A2A), execution layer (Sandbox or equivalent), model layer. Even if you build it on existing primitives now, the mental model will save you when the proper APIs land.
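For the second item, a minimal sketch of what that can look like with nothing but a plain Pod. The pod name, image, and resource limits are placeholders, and it assumes a RuntimeClass named gvisor already exists in the cluster. The securityContext settings are not required for gVisor; they are ordinary hardening I would pair with it for code-executing workloads.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: codegen-runner              # hypothetical name
  namespace: agents
spec:
  runtimeClassName: gvisor          # requires a RuntimeClass named "gvisor"
  automountServiceAccountToken: false   # generated code gets no API credentials
  containers:
    - name: runner
      image: python:3.12-slim       # placeholder image
      command: ["sleep", "infinity"]
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        runAsUser: 1000
        capabilities:
          drop: ["ALL"]
      resources:
        limits:
          cpu: "1"
          memory: "1Gi"
```

If the pod stays Pending or fails to start, the usual cause is that the named RuntimeClass is missing or its handler is not installed on the node the pod landed on.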

The shift from “agents in a Jupyter notebook” to “agents as production infrastructure” is happening faster than the microservices shift did. Kubernetes is absorbing it the way it absorbed every previous workload pattern — by growing the right primitive at the right time. The Sandbox CRD is that primitive for this generation of workload.

If you’re working on this in your own clusters, I’d genuinely love to hear what’s working and what isn’t. Hit me up in the Collabnix Slack; the #kubernetes and #ai-agents channels have been busy with exactly these conversations.
