
Docker Model Runner Cheatsheet 2025


What is Docker Model Runner?

Docker Model Runner is a Docker Desktop feature that lets developers run AI models locally with minimal setup. Built into Docker Desktop 4.40+, it brings LLM (Large Language Model) inference directly into your containerized development workflow.

Key Benefits

  • ✅ No extra infrastructure – Runs natively on your machine
  • ✅ OpenAI-compatible API – Drop-in replacement for OpenAI calls
  • ✅ GPU acceleration – Optimized for Apple Silicon and NVIDIA GPUs
  • ✅ OCI artifacts – Package GGUF models as OCI artifacts and publish them to any container registry
  • ✅ Host-based execution – Maximum performance, no VM overhead

🚀 Quick Setup Guide

Prerequisites

  • Docker Desktop 4.40+ (4.41+ for Windows GPU support)
  • macOS: Apple Silicon (M1/M2/M3) for optimal performance
  • Windows: NVIDIA GPU (for GPU acceleration)
  • Linux: Docker Engine with the Docker Model CLI plugin (docker-model-plugin)

Enable Docker Model Runner

Docker Desktop (GUI)

  1. Open Docker Desktop Settings
  2. Navigate to Features in development → Beta
  3. Enable “Docker Model Runner”
  4. Apply & Restart

Docker Desktop (CLI)

# Enable Model Runner
docker desktop enable model-runner

# Enable with TCP support (for host access)
docker desktop enable model-runner --tcp 12434

# Check status
docker desktop status

Docker Engine (Linux)

sudo apt-get update
sudo apt-get install docker-model-plugin

📋 Essential Commands

Model Management

Pull Models

# Pull latest version
docker model pull ai/smollm2

List Models

# List all local models
docker model ls

Remove Models

# Remove specific model
docker model rm ai/smollm2

Running Models

Interactive Mode

# Quick inference
docker model run ai/smollm2 "Explain Docker in one sentence"

Model Information

# Inspect model details
docker model inspect ai/smollm2

🔗 API Integration

OpenAI-Compatible Endpoints

From Containers

# Base URL for container access
http://model-runner.docker.internal/engines/llama.cpp/v1/

From Host (with TCP enabled)

# Base URL for host access
http://localhost:12434/engines/llama.cpp/v1/
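
Both base URLs expose the same OpenAI-compatible API; only the address differs. To confirm the host endpoint is reachable, you can list the models the engine currently serves. A minimal sketch in Python (it assumes TCP access was enabled on port 12434 as shown earlier):

import json
import urllib.request

# Query the OpenAI-compatible models endpoint over the host TCP port
url = "http://localhost:12434/engines/llama.cpp/v1/models"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))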

Chat Completions API

cURL Example

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful coding assistant."
      },
      {
        "role": "user", 
        "content": "Write a Docker Compose file for a web app"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'

Python Example

import openai

# Configure client for local Model Runner
client = openai.OpenAI(
    base_url="http://model-runner.docker.internal/engines/llama.cpp/v1",
    api_key="not-needed"  # Local inference doesn't need API key
)

# Chat completion
response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain containerization benefits"}
    ],
    temperature=0.7,
    max_tokens=200
)

print(response.choices[0].message.content)
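
Streaming Example (Python)

The same chat completions endpoint can also stream tokens as they are generated, which is useful for chat UIs. A minimal sketch, assuming your Model Runner version supports the standard stream parameter:

import openai

client = openai.OpenAI(
    base_url="http://model-runner.docker.internal/engines/llama.cpp/v1",
    api_key="not-needed"
)

# Request a streamed response and print tokens as they arrive
stream = client.chat.completions.create(
    model="ai/smollm2",
    messages=[{"role": "user", "content": "List three Docker best practices"}],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()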

Node.js Example

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://model-runner.docker.internal/engines/llama.cpp/v1',
  apiKey: 'not-needed'
});

async function chatWithModel() {
  const completion = await openai.chat.completions.create({
    model: 'ai/smollm2',
    messages: [
      { role: 'system', content: 'You are a DevOps expert.' },
      { role: 'user', content: 'Best practices for Docker in production?' }
    ],
    temperature: 0.8,
    max_tokens: 300
  });

  console.log(completion.choices[0].message.content);
}

chatWithModel();

🐳 Docker Compose Integration

services:
  chat:
    image: my-chat-app
    depends_on:
      - ai_runner

  ai_runner:
    provider:
      type: model
      options:
        model: ai/smollm2
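
When Compose starts the ai_runner provider, it pulls the model and wires the dependent chat service up to Model Runner. A minimal sketch of how the application might consume it, assuming Compose injects AI_RUNNER_URL and AI_RUNNER_MODEL environment variables for the provider service (the variable names follow the provider service name; check your Compose version's documentation):

import os
import openai

# Assumed convention: Compose injects <SERVICE_NAME>_URL and <SERVICE_NAME>_MODEL;
# fall back to the internal Model Runner endpoint if the variables are absent.
base_url = os.getenv("AI_RUNNER_URL",
                     "http://model-runner.docker.internal/engines/llama.cpp/v1")
model = os.getenv("AI_RUNNER_MODEL", "ai/smollm2")

client = openai.OpenAI(base_url=base_url, api_key="not-needed")

response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Say hello from Compose"}]
)
print(response.choices[0].message.content)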

🐳 Docker Model Management Endpoints

POST /models/create
GET /models
GET /models/{namespace}/{name}
DELETE /models/{namespace}/{name}
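
These management routes are served on the same base addresses as the inference API (model-runner.docker.internal from containers, localhost:12434 from the host when TCP is enabled). A minimal sketch that lists local models and inspects one of them, assuming host TCP access:

import json
import urllib.request

BASE = "http://localhost:12434"

# GET /models – list all locally available models
with urllib.request.urlopen(f"{BASE}/models") as resp:
    print(json.dumps(json.load(resp), indent=2))

# GET /models/{namespace}/{name} – inspect a single model
with urllib.request.urlopen(f"{BASE}/models/ai/smollm2") as resp:
    print(json.dumps(json.load(resp), indent=2))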

OpenAI Endpoints:

GET /engines/llama.cpp/v1/models
GET /engines/llama.cpp/v1/models/{namespace}/{name}
POST /engines/llama.cpp/v1/chat/completions
POST /engines/llama.cpp/v1/completions
POST /engines/llama.cpp/v1/embeddings
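
The embeddings endpoint follows the same OpenAI-compatible pattern as chat completions. A minimal sketch, assuming an embedding-capable model has already been pulled (ai/mxbai-embed-large is used here only as an example name; substitute whatever embedding model you have locally):

import openai

client = openai.OpenAI(
    base_url="http://localhost:12434/engines/llama.cpp/v1",
    api_key="not-needed"
)

# Generate an embedding vector for a piece of text
result = client.embeddings.create(
    model="ai/mxbai-embed-large",
    input="Docker Model Runner makes local inference easy."
)

print(len(result.data[0].embedding))  # vector dimensionality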