Building Offline AI Agents with Docker Model Runner
The AI industry is shifting from chatbots to agents. But here’s the problem: most function-calling models are either cloud-dependent or too large to run efficiently on edge devices. Google just solved this with FunctionGemma – a 270M parameter model specifically fine-tuned for translating natural language into executable API actions.
At just 301MB, FunctionGemma runs on laptops, mobile phones, and edge devices while delivering production-grade function calling. It uses only 0.75% battery for 25 conversations on a Pixel 9 Pro. This isn’t a general-purpose chatbot – it’s a specialized foundation model designed to be fine-tuned for your specific use case.
Why Docker Model Runner?
Docker Model Runner treats AI models as first-class OCI artifacts – pulled, versioned, and served through the same workflows you use for containers. No Python environment setup. No CUDA driver wrestling. Just docker model pull and you’re running.
Key advantages:
- OpenAI-compatible API – Drop-in replacement for existing code
- Zero configuration – Models load on-demand and auto-unload
- Native performance – Host-side execution, GPU-accelerated
- Docker Compose integration – Define models alongside services
Let’s build a private, offline function-calling agent that runs entirely with Docker.
Quick Start with Docker Model Runner
Step 1: Enable Model Runner
# Enable in Docker Desktop (Settings → AI → Enable Docker Model Runner)
# Or via CLI
docker desktop enable model-runner --tcp 12434
Step 2: Pull FunctionGemma
docker model pull ai/functiongemma
That’s it. The model is now available at http://localhost:12434/engines/llama.cpp/v1 with an OpenAI-compatible API.
Alternatively, you can pull the model directly from Docker Desktop.

Step 3: Test Function Calling
Create test-agent.js:
import OpenAI from 'openai';

// Model Runner exposes an OpenAI-compatible endpoint, so the official
// SDK works unchanged; the API key is ignored but must be non-empty.
const client = new OpenAI({
  baseURL: 'http://localhost:12434/engines/llama.cpp/v1',
  apiKey: 'not-needed'
});

const tools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get weather for a city',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'City name' }
      },
      required: ['city']
    }
  }
}];

async function runAgent() {
  const response = await client.chat.completions.create({
    model: 'ai/functiongemma',
    messages: [{
      role: 'user',
      content: "What's the weather in San Francisco?"
    }],
    tools: tools
  });

  const message = response.choices[0].message;
  const toolCall = message.tool_calls?.[0];
  if (!toolCall) {
    // The model may answer in plain text instead of calling a tool
    console.log('No tool call returned:', message.content);
    return;
  }
  console.log('Function:', toolCall.function.name);
  console.log('Arguments:', toolCall.function.arguments);
}

runAgent();
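The script above only prints the tool call; a real agent would execute it and send the result back to the model as a tool message. Here is a minimal dispatch sketch — the handler registry and the mock tool call are illustrative, but the message shape (`role: 'tool'`, `tool_call_id`, stringified `content`) follows the OpenAI chat format that Model Runner serves:

```javascript
// Map tool names to local implementations; the weather stub is illustrative.
const handlers = {
  get_weather: ({ city }) => ({ city, temperature: 22, condition: 'sunny' }),
};

// Execute one tool call and build the follow-up "tool" message that
// would be appended to the conversation for a second model turn.
function toToolMessage(toolCall) {
  const handler = handlers[toolCall.function.name];
  if (!handler) throw new Error(`No handler for ${toolCall.function.name}`);
  const result = handler(JSON.parse(toolCall.function.arguments));
  return {
    role: 'tool',
    tool_call_id: toolCall.id,
    content: JSON.stringify(result),
  };
}

// Mock tool call shaped like a Model Runner response:
const mock = {
  id: 'call_0',
  function: { name: 'get_weather', arguments: '{"city":"San Francisco"}' },
};
console.log(toToolMessage(mock).content);
// → {"city":"San Francisco","temperature":22,"condition":"sunny"}
```

Appending this message to `messages` and calling the API again lets the model phrase a natural-language answer from the tool result.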
Production Setup with Docker Compose
Create compose.yml:
services:
  function-agent:
    image: node:22-slim
    working_dir: /app
    volumes:
      - ./app:/app
    command: node agent.js
    models:
      weather_model:
        endpoint_var: MODEL_URL
        model_var: MODEL_NAME

models:
  weather_model:
    model: ai/functiongemma
    context_size: 4096
    runtime_flags:
      - "--verbose"
Create your agent (app/agent.js):
// app/agent.js – uses ES modules, so package.json needs "type": "module"
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: process.env.MODEL_URL,
  apiKey: 'not-needed'
});

// Stub implementation – swap in a real weather API call here
function getWeather(city) {
  return { city, temperature: 22, condition: 'sunny' };
}

const tools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get weather for a city',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string' }
      },
      required: ['city']
    }
  }
}];

async function processRequest(prompt) {
  const response = await client.chat.completions.create({
    model: process.env.MODEL_NAME,
    messages: [{ role: 'user', content: prompt }],
    tools: tools
  });

  const message = response.choices[0].message;
  if (message.tool_calls) {
    for (const call of message.tool_calls) {
      if (call.function.name === 'get_weather') {
        const args = JSON.parse(call.function.arguments);
        return getWeather(args.city);
      }
    }
  }
  // No tool call – fall back to the model's plain-text reply
  return message.content;
}

// Run the agent
const result = await processRequest("What's the weather in Tokyo?");
console.log(result);
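One caveat worth handling in production: processRequest calls JSON.parse directly on function.arguments, and a 270M-parameter model will occasionally emit arguments that aren't valid JSON. A defensive parse — a sketch, not part of any API — keeps one bad generation from crashing the agent:

```javascript
// Guard against malformed JSON in function.arguments – JSON.parse
// throws on invalid input, so wrap it and report failure instead.
function safeParseArguments(raw) {
  try {
    return { ok: true, args: JSON.parse(raw) };
  } catch {
    return { ok: false, args: null };
  }
}

// Valid arguments parse normally...
console.log(safeParseArguments('{"city":"Tokyo"}'));
// ...while broken output is flagged instead of throwing.
console.log(safeParseArguments('{city: Tokyo}'));
```

On a failed parse the agent can retry the request or fall back to the model's plain-text reply.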
Run it:
docker compose up
Docker Model Runner automatically:
- Pulls the model if not cached
- Starts the inference server
- Injects the model's endpoint URL and name as environment variables
- Your app connects seamlessly
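Those variables only exist inside the Compose service. If you also want to run the agent directly on the host (node app/agent.js), a fallback sketch like this keeps it working — the defaults assume the quick-start TCP endpoint and model name, so adjust them to your setup:

```javascript
// Prefer the Compose-injected values; fall back to the local
// Model Runner endpoint from the quick start when they're unset.
const baseURL = process.env.MODEL_URL ?? 'http://localhost:12434/engines/llama.cpp/v1';
const modelName = process.env.MODEL_NAME ?? 'ai/functiongemma';

console.log(`Using ${modelName} at ${baseURL}`);
```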
The Real Power: Fine-Tuning
Out-of-the-box accuracy: 58%
After fine-tuning: 85%+
This is where specialized, small models beat general-purpose giants. Fine-tune FunctionGemma for your domain – mobile automation, IoT control, game mechanics, home automation – and watch it become a reliable, offline agent that fits in 301MB.
Use Cases Where FunctionGemma Shines
- IoT device control – Voice commands to system APIs without cloud
- Mobile automation – Natural language to app functions
- Edge computing – Offline agents in disconnected environments
- Custom game mechanics – Voice-controlled gameplay
- Home automation – Private, local smart home agents
- Kiosk systems – Offline customer service terminals
What Makes This Different?
Traditional approach:
- Install Ollama
- Configure separate services
- Manage ports manually
- Write custom integration code
Docker Model Runner approach:
- Run docker model pull ai/functiongemma
- Define the model in compose.yml
- Run docker compose up
Models are OCI artifacts. They version, distribute, and deploy exactly like containers. This is the future of AI infrastructure.
What’s Next?
FunctionGemma represents the future of AI agents – lightweight, specialized, and deployable anywhere. At 270M parameters, it’s not trying to be GPT-4. It’s the perfect foundation for your specific agent.
The era of chatbots is over. The era of action is here.