Ollama just dropped one of its most significant updates yet – native support for subagents and web search when using Claude Code. The best part? Zero configuration overhead. No MCP servers to set up, no API keys to manage, no Docker Compose files to orchestrate. Just one command, and you’re running parallel AI agents with real-time web access.
If you’ve been following the agentic AI space, you know that getting multiple AI agents to work together typically involves a fair amount of plumbing. Ollama just eliminated most of that friction.
Let’s break down what’s new, how to get started, and why this matters for developers building agentic workflows.
What Changed?
Ollama has been steadily evolving from a simple local model runner into a full-fledged AI development platform. The recent releases (v0.16.x) introduced two game-changing capabilities:
- Subagents in Claude Code — Models can now spawn parallel agents, each running in its own context, to tackle tasks like file search, code exploration, and research simultaneously.
- Built-in Web Search — When a model needs current information, Ollama handles the search and returns results directly. No external search APIs, no Brave Search keys, no MCP server configuration.
Both features work with Ollama’s cloud models out of the box using the ollama launch command.
Getting Started
Getting up and running takes a single command:
ollama launch claude --model minimax-m2.5:cloud
That’s it. This command launches Claude Code with MiniMax M2.5 running on Ollama’s cloud, with subagent and web search capabilities enabled automatically.
You can swap in any of the recommended cloud models:
# MiniMax M2.5 — strong agentic and coding performance
ollama launch claude --model minimax-m2.5:cloud
# GLM-5 — 744B total parameters (40B active), built for complex systems engineering
ollama launch claude --model glm-5:cloud
# Kimi K2.5 — excellent for research and reasoning tasks
ollama launch claude --model kimi-k2.5:cloud
All three models natively trigger subagents when the task calls for it. No special prompting is required, though you can request subagents explicitly if you want to force parallel execution.
Subagents: Parallel AI Workers in Your Terminal
Subagents are the headline feature here. When you give Claude Code a complex task, it can now spin up multiple subagents that work in parallel, each operating in its own isolated context.
Think of it like spawning multiple Docker containers, each handling a different piece of work, except these are AI agents running tasks concurrently inside Claude Code.
How Subagents Work
When you issue a prompt, the main agent evaluates the task and decides whether to spawn subagents. Each subagent gets its own context window and can independently search files, explore code, and perform research. Results are collected and synthesized by the main agent.
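The fan-out/gather flow described above can be sketched as background shell jobs. This is only an analogy: real subagents are isolated LLM contexts inside Claude Code, not shell processes, and the `explore` function here is a hypothetical stand-in for a subagent doing work.

```shell
# Conceptual sketch of the subagent pattern: fan out, work in parallel, gather.
explore() {
  # stand-in for a subagent working in its own isolated context
  echo "subagent($1): findings on $2"
}

# fan-out: three "subagents" run concurrently, each with its own output
explore 1 "auth flow"           > /tmp/sub1.txt &
explore 2 "payment integration" > /tmp/sub2.txt &
explore 3 "notification system" > /tmp/sub3.txt &

wait  # main agent blocks until every subagent finishes

# gather: the main agent collects and synthesizes all results
cat /tmp/sub1.txt /tmp/sub2.txt /tmp/sub3.txt
```

The key property mirrored here is isolation: each worker writes to its own output, and synthesis happens only after all of them complete.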
Some models — specifically MiniMax M2.5, GLM-5, and Kimi K2.5 — will naturally trigger subagents when the task warrants it. For other models, you can explicitly request subagent usage in your prompt.
Example Prompts
Here are some practical prompts that demonstrate the power of parallel subagents:
Codebase Exploration:
spawn subagents to explore the auth flow, payment integration, and notification system
This creates three parallel agents, each diving into a different subsystem of your codebase simultaneously, rather than sequentially walking through each one.
Architecture Mapping:
create subagents to map the database queries, trace the API routes, and catalog error handling patterns
Three agents work concurrently: one maps all database interactions, another traces API routing logic, and the third catalogs error handling across your codebase.
Migration Planning:
research the postgres 18 release notes, audit our queries for deprecated patterns, and create migration tasks
This combines web search with code analysis — one agent researches PostgreSQL 18 changes online, another audits your existing queries, and a third generates migration tasks based on both findings.
Web Search: Real-Time Knowledge Without Configuration
The web search integration is equally impressive in its simplicity. When a model running on Ollama’s cloud needs current information, it can search the web automatically. There’s no need to configure a search MCP server, set up a Brave Search API key, or write any middleware.
This is particularly powerful when combined with subagents. You can spawn multiple research agents that each search for different topics in parallel and return actionable results.
Example: Competitive Analysis
create 3 research agents to research how our top 3 competitors price their API tiers,
compare against our current pricing, and draft recommendations
This single prompt creates three parallel workstreams: each agent researches one competitor’s pricing via web search, the findings are compared against your current pricing structure, and the main agent synthesizes the results into pricing recommendations. The research itself happens concurrently.
How Web Search Works Under the Hood
Ollama introduced a dedicated Web Search API that provides search and page-fetch capabilities. When using ollama launch claude with cloud models, this search capability is wired in automatically. The model decides when to search based on the task context — if it needs information beyond its training data cutoff, it searches.
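You can also call the standalone Web Search API directly. The endpoint and payload below follow Ollama’s published API at the time of writing, but treat the exact fields as assumptions and check the current docs. This snippet builds and prints the request as a dry run rather than sending it, so it runs without an `OLLAMA_API_KEY`:

```shell
# Dry-run sketch of Ollama's Web Search API (verify endpoint/fields against
# the current docs). Prints the curl command instead of executing it.
REQUEST=$(cat <<'EOF'
curl -s https://ollama.com/api/web_search \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{"query": "postgres 18 release notes"}'
EOF
)
echo "$REQUEST"
```

With a real key from ollama.com, running the printed command returns search results the model can consume directly.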
Ollama provides a generous free tier of web searches for individual users, with higher rate limits available through Ollama’s cloud subscription.
Recommended Models
Not all models are created equal when it comes to agentic capabilities. Here’s a quick comparison of the recommended models for subagent and web search workflows:
| Model | Parameters | Best For | Subagent Triggering |
|---|---|---|---|
| minimax-m2.5:cloud | 230B total, 10B active | Coding, agentic workflows | Native |
| glm-5:cloud | 744B total, 40B active | Complex systems engineering, long-horizon tasks | Native |
| kimi-k2.5:cloud | — | Research, reasoning | Native |
All three models support native subagent triggering, meaning they’ll automatically spawn subagents when the task benefits from parallelism. For models that don’t natively trigger subagents, you can explicitly prompt with “use subagents,” “spawn subagents,” or “create subagents.”
How This Compares to Docker MCP Gateway and cagent
If you’re already using Docker’s MCP Gateway, cagent, or Docker Compose for Agents, you might be wondering where Ollama’s approach fits in. Here’s how I think about it:
Ollama’s subagents are great for:
- Quick prototyping and exploration
- Individual developer workflows
- Tasks where zero configuration matters
- Research and competitive intelligence
- Codebase exploration and auditing
Docker’s agent ecosystem is better for:
- Production-grade multi-agent orchestration
- Custom tool integration via MCP servers
- Complex agent-to-agent communication patterns
- Enterprise deployments with security requirements
- Persistent agent workflows with state management
Think of Ollama as the “quick start” — perfect for getting agentic workflows running in seconds. Docker’s stack is the “production path” — designed for when those workflows need to scale, integrate with enterprise systems, and run reliably in production.
The two approaches are complementary, not competing. You might prototype with Ollama’s subagents, then graduate to Docker Compose for Agents when you need production-grade orchestration.
Setting Up for Local Models
While subagents and web search work best with cloud models, you can also use Ollama with local models for Claude Code. The setup requires a few environment variables:
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
claude --model glm-4.7-flash:latest
Note that local models won’t have access to Ollama’s web search functionality, and subagent support may vary depending on the model’s tool-calling capabilities. For the full subagent + web search experience, cloud models are the way to go.
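If you switch between local and cloud setups often, the environment variables above can be bundled into a small wrapper. This is an illustrative convenience function, not an official Ollama or Claude Code feature, and the default model name is just an example — substitute any model you’ve pulled locally:

```shell
# Illustrative wrapper: run Claude Code against a local Ollama server.
# The env vars mirror the setup described above; the default model name
# is an example, not a recommendation.
claude_local() {
  ANTHROPIC_AUTH_TOKEN=ollama \
  ANTHROPIC_API_KEY="" \
  ANTHROPIC_BASE_URL=http://localhost:11434 \
  claude --model "${1:-glm-4.7-flash:latest}"
}
```

Usage: `claude_local` for the default, or `claude_local qwen3-coder:latest` to pick another local model. The variable assignments apply only for the duration of the call, so your shell environment stays clean.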
For local inference, ensure your model has at least 64K tokens of context length. You can set this using a Modelfile:
cat > Modelfile << 'EOF'
FROM qwen3-coder
PARAMETER num_ctx 65536
EOF
ollama create qwen3-coder-64k -f Modelfile
Privacy Controls
Ollama also introduced an OLLAMA_NO_CLOUD setting for users who need to ensure data never leaves their machine:
# Linux / manual ollama serve
export OLLAMA_NO_CLOUD=1
# macOS / Windows — toggle via the Ollama app settings
This is especially relevant for enterprises working with sensitive codebases. You can use local models with full privacy guarantees, then switch to cloud models when you need the extra horsepower and web search capabilities.
Practical Use Cases
Here are some real-world scenarios where Ollama’s subagents and web search combination shines:
1. Onboarding to a New Codebase
spawn subagents to:
1. map the project structure and key entry points
2. identify the tech stack and dependencies
3. find the testing patterns and CI/CD configuration
4. locate the documentation and API specs
2. Security Audit
create subagents to audit this codebase for:
- SQL injection vulnerabilities
- hardcoded secrets and credentials
- outdated dependencies with known CVEs (search the web for latest advisories)
- authentication and authorization patterns
3. Migration Planning
research the latest Next.js 15 migration guide,
audit our current Next.js 13 codebase for breaking changes,
and create a prioritized migration plan with estimated effort
4. Documentation Generation
spawn subagents to explore each module in src/,
document the public APIs, and create a comprehensive README
with architecture diagrams in Mermaid format
What This Means for the Agentic AI Ecosystem
The broader trend here is clear: the barrier to running multi-agent workflows is dropping rapidly. What used to require complex orchestration frameworks, multiple API keys, and careful configuration can now be done with a single CLI command.
This matters because:
- Developers can experiment faster. No setup friction means more experimentation, which means faster discovery of what agentic workflows can actually do for your team.
- The “agents as microservices” pattern is becoming real. Each subagent operates independently with its own context, much like a microservice handles its own bounded context. The parallels between container orchestration and agent orchestration are becoming increasingly concrete.
- Web-connected agents are the new baseline. Static models with knowledge cutoffs are becoming less acceptable. The expectation is shifting toward agents that can access real-time information as needed.
Conclusion
Ollama’s addition of subagents and web search to Claude Code represents a meaningful step forward in making agentic AI accessible to every developer. The zero-configuration approach — no MCP servers, no API keys, just ollama launch — removes the biggest barrier to getting started with multi-agent workflows.
Whether you’re exploring a new codebase, conducting competitive research, or planning a major migration, having parallel AI agents with web access at your fingertips is a powerful capability.
Download Ollama from ollama.com and give it a try:
ollama launch claude --model minimax-m2.5:cloud
If you have questions or want to share your experience with Ollama’s subagents, join us on the Collabnix Slack or Discord community.
Have you tried Ollama’s new subagent capabilities? Share your experience and use cases with the community!