Collabnix Team
The Collabnix Team is a diverse collective of Docker, Kubernetes, and IoT experts united by a passion for cloud-native technologies. With backgrounds spanning DevOps, platform engineering, cloud architecture, and container orchestration, our contributors bring together decades of combined experience from various industries and technical domains.

FunctionGemma: Building Offline AI Agents with Docker Model Runner


The AI industry is shifting from chatbots to agents. But here’s the problem: most function-calling models are either cloud-dependent or too large to run efficiently on edge devices. Google’s answer is FunctionGemma – a 270M-parameter model fine-tuned specifically for translating natural language into executable API actions.

At just 301MB, FunctionGemma runs on laptops, mobile phones, and edge devices while delivering production-grade function calling. It uses only 0.75% battery for 25 conversations on a Pixel 9 Pro. This isn’t a general-purpose chatbot – it’s a specialized foundation model designed to be fine-tuned for your specific use case.

Why Docker Model Runner?

Docker Model Runner treats AI models as first-class OCI artifacts – pulled, versioned, and served through the same workflows you use for containers. No Python environment setup. No CUDA driver wrestling. Just docker model pull and you’re running.

Key advantages:

  • OpenAI-compatible API – Drop-in replacement for existing code
  • Zero configuration – Models load on-demand and auto-unload
  • Native performance – Host-side execution, GPU-accelerated
  • Docker Compose integration – Define models alongside services

Let’s build a private, offline function-calling agent that runs entirely with Docker.

Quick Start with Docker Model Runner

Step 1: Enable Model Runner

# Enable in Docker Desktop (Settings → AI → Enable Docker Model Runner)
# Or via CLI
docker desktop enable model-runner --tcp 12434

Step 2: Pull FunctionGemma

docker model pull ai/functiongemma

That’s it. The model is now available at http://localhost:12434/engines/llama.cpp/v1 with an OpenAI-compatible API.
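Before writing any agent code, it’s worth a quick sanity check that the endpoint is actually serving. Here’s a small sketch using Node 18+’s built-in fetch against the OpenAI-compatible /models route (the helper names are ours, and the URL assumes the default TCP port from Step 1):

```javascript
// check-model.js – sanity-check that Docker Model Runner is serving

const BASE_URL = 'http://localhost:12434/engines/llama.cpp/v1';

// Build the OpenAI-compatible "list models" URL from a base endpoint
function modelsUrl(base) {
  return `${base.replace(/\/+$/, '')}/models`;
}

// Fetch the model list; requires Node 18+ (built-in fetch)
async function listModels(base = BASE_URL) {
  const res = await fetch(modelsUrl(base));
  if (!res.ok) throw new Error(`Model Runner not reachable: ${res.status}`);
  const body = await res.json();
  return body.data.map((m) => m.id);
}
```

After Step 2, calling listModels() should return a list that includes ai/functiongemma.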

Alternatively, you can pull the model directly from the Docker Desktop UI.

Offline AI agents in action using Docker Model Runner

Step 3: Test Function Calling

Create test-agent.js (it uses the openai npm package and ES-module import syntax, so run npm install openai and set "type": "module" in your package.json first):

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:12434/engines/llama.cpp/v1',
  apiKey: 'not-needed'
});

const tools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get weather for a city',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'City name' }
      },
      required: ['city']
    }
  }
}];

async function runAgent() {
  const response = await client.chat.completions.create({
    model: 'ai/functiongemma',
    messages: [{ 
      role: 'user', 
      content: 'What\'s the weather in San Francisco?' 
    }],
    tools: tools
  });

  const toolCall = response.choices[0].message.tool_calls?.[0];
  if (!toolCall) {
    console.log('No tool call returned:', response.choices[0].message.content);
    return;
  }
  console.log('Function:', toolCall.function.name);
  console.log('Arguments:', toolCall.function.arguments);
}

runAgent();
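One practical note: the model returns function arguments as a JSON string, and small models occasionally emit malformed or incomplete JSON. A defensive parsing helper (our addition, not part of the example above) keeps the agent from crashing on a bad generation:

```javascript
// Safely parse a tool call's argument string and verify required fields.
// Returns { ok: true, args } on success, { ok: false, error } otherwise.
function parseToolArgs(rawArguments, requiredFields = []) {
  let args;
  try {
    args = JSON.parse(rawArguments);
  } catch (err) {
    return { ok: false, error: `invalid JSON: ${err.message}` };
  }
  const missing = requiredFields.filter((f) => !(f in args));
  if (missing.length > 0) {
    return { ok: false, error: `missing fields: ${missing.join(', ')}` };
  }
  return { ok: true, args };
}
```

In the agent, call parseToolArgs(toolCall.function.arguments, ['city']) and, on failure, fall back to re-prompting instead of letting JSON.parse throw.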


Production Setup with Docker Compose

Create compose.yml:

services:
  function-agent:
    image: node:22-slim
    working_dir: /app
    volumes:
      - ./app:/app
    command: node agent.js
    models:
      weather_model:
        endpoint_var: MODEL_URL
        model_var: MODEL_NAME

models:
  weather_model:
    model: ai/functiongemma
    context_size: 4096
    runtime_flags:
      - "--verbose"

Create your agent (app/agent.js). Run npm init -y and npm install openai inside ./app first (the directory is bind-mounted into the container), and set "type": "module" in app/package.json so the import syntax and top-level await work:

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: process.env.MODEL_URL,
  apiKey: 'not-needed'
});

// Stubbed tool implementation – swap in a real weather API for production
function getWeather(city) {
  return { city, temperature: 22, condition: 'sunny' };
}

const tools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get weather for a city',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string' }
      },
      required: ['city']
    }
  }
}];

async function processRequest(prompt) {
  const response = await client.chat.completions.create({
    model: process.env.MODEL_NAME,
    messages: [{ role: 'user', content: prompt }],
    tools: tools
  });

  const message = response.choices[0].message;
  
  if (message.tool_calls) {
    for (const call of message.tool_calls) {
      if (call.function.name === 'get_weather') {
        const args = JSON.parse(call.function.arguments);
        return getWeather(args.city);
      }
    }
  }
  
  return message.content;
}

// Run the agent
const result = await processRequest("What's the weather in Tokyo?");
console.log(result);
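The agent above returns the raw tool result. If you want the model to phrase a final natural-language answer, the usual pattern is a second completions call that includes the assistant’s tool call plus a role: 'tool' message carrying the result. A sketch (buildFollowUpMessages and finalAnswer are our helpers, not SDK functions):

```javascript
// Append the assistant's tool call and its result to the conversation,
// producing the message list for a follow-up completion request.
function buildFollowUpMessages(messages, assistantMessage, toolCall, result) {
  return [
    ...messages,
    assistantMessage, // the message containing tool_calls
    {
      role: 'tool',
      tool_call_id: toolCall.id,
      content: JSON.stringify(result)
    }
  ];
}

// Second round trip: let the model turn the tool result into prose
// (reuses the client from agent.js).
async function finalAnswer(client, model, messages, assistantMessage, toolCall, result) {
  const followUp = await client.chat.completions.create({
    model,
    messages: buildFollowUpMessages(messages, assistantMessage, toolCall, result)
  });
  return followUp.choices[0].message.content;
}
```

With this, processRequest can return a sentence like "It’s 22°C and sunny in Tokyo" instead of a raw object.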

Run it:

docker compose up

Docker Model Runner automatically:

  • Pulls the model if it isn’t already cached
  • Starts the inference server
  • Injects the endpoint and model name into your service as the MODEL_URL and MODEL_NAME env vars

Your app connects seamlessly.
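If you also want to run app/agent.js outside Compose – say, plain node agent.js against the locally enabled Model Runner – those env vars won’t be set. A small fallback helper (our addition) keeps both paths working, using the default endpoint from the Quick Start section:

```javascript
// Resolve the model endpoint and name from the environment, falling back
// to the local Docker Model Runner defaults when run outside Compose.
function resolveModelConfig(env = process.env) {
  return {
    baseURL: env.MODEL_URL ?? 'http://localhost:12434/engines/llama.cpp/v1',
    model: env.MODEL_NAME ?? 'ai/functiongemma'
  };
}
```

In agent.js you would then construct the client with const { baseURL, model } = resolveModelConfig().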

The Real Power: Fine-Tuning

Out-of-the-box accuracy: 58%
After fine-tuning: 85%+

This is where specialized, small models beat general-purpose giants. Fine-tune FunctionGemma for your domain – mobile automation, IoT control, game mechanics, home automation – and watch it become a reliable, offline agent that fits in 301MB.

Use Cases Where FunctionGemma Shines

  • IoT device control – Voice commands to system APIs without cloud
  • Mobile automation – Natural language to app functions
  • Edge computing – Offline agents in disconnected environments
  • Custom game mechanics – Voice-controlled gameplay
  • Home automation – Private, local smart home agents
  • Kiosk systems – Offline customer service terminals

What Makes This Different?

Traditional approach:

  1. Install Ollama
  2. Configure separate services
  3. Manage ports manually
  4. Write custom integration code

Docker Model Runner approach:

  1. docker model pull ai/functiongemma
  2. Define in compose.yml
  3. Run docker compose up

Models are OCI artifacts. They version, distribute, and deploy exactly like containers. This is the future of AI infrastructure.

What’s Next?

FunctionGemma represents the future of AI agents – lightweight, specialized, and deployable anywhere. At 270M parameters, it’s not trying to be GPT-4. It’s the perfect foundation for your specific agent.

The era of chatbots is over. The era of action is here.

Have Queries? Join https://launchpass.com/collabnix
