Collabnix Team The Collabnix Team is a diverse collective of Docker, Kubernetes, and IoT experts united by a passion for cloud-native technologies. With backgrounds spanning DevOps, platform engineering, cloud architecture, and container orchestration, our contributors bring together decades of combined experience from various industries and technical domains.

Docker Model Runner Tutorial: Complete Guide to Deploy AI Models on Linux (2025)


Docker Model Runner Tutorial: Step-by-Step Guide

Deploying AI models just got as simple as running Docker containers. Docker Model Runner brings the familiar Docker experience to AI model management, letting you deploy, manage, and scale machine learning models with the same ease you’d expect from containerized applications.

This comprehensive guide will walk you through setting up Docker Model Runner on Linux systems (Debian/Ubuntu and Fedora), deploying your first AI model, and building real applications that use your deployed model.

What You’ll Build

By the end of this tutorial, you’ll have:

  • Docker Model Runner installed and configured
  • A running AI model (SmolLM2) accessible via API
  • Multi-language demo applications (Go, Python, Node.js, Rust) all connected to your model
  • A complete understanding of the Docker Model Runner workflow

Prerequisites

Before we begin, ensure your Linux system meets these requirements:

  • Operating System: Ubuntu/Debian or Fedora
  • Docker Engine: Installed and running
  • Memory: At least 4GB RAM
  • Network: Internet connection for downloading models
  • User Permissions: Ability to run sudo commands
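The prerequisites above can be checked with a short script before you start. This is a hypothetical preflight sketch, not part of the tutorial's repository: it assumes a Linux host, and the 4 GB figure simply mirrors the memory requirement listed.

```python
# Hypothetical preflight check for the prerequisites above.
# Assumes a Linux host; 4 GB mirrors the memory requirement listed.
import os
import shutil


def total_ram_gb(pages: int, page_size: int) -> float:
    """Convert a page count and page size (in bytes) to gigabytes."""
    return pages * page_size / (1024 ** 3)


def preflight(min_ram_gb: float = 4.0) -> dict:
    """Report whether Docker is on PATH and whether RAM meets the minimum."""
    ram = total_ram_gb(os.sysconf("SC_PHYS_PAGES"), os.sysconf("SC_PAGE_SIZE"))
    return {
        "docker_installed": shutil.which("docker") is not None,
        "ram_ok": ram >= min_ram_gb,
        "ram_gb": round(ram, 1),
    }


if __name__ == "__main__":
    print(preflight())
```

If `docker_installed` comes back `False`, install Docker Engine first; the rest of this guide assumes it is running.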

Step 1: Install Docker Model Runner

The installation process is straightforward. Update your system and install the Docker Model Runner plugin:

sudo apt-get update
sudo apt-get install docker-model-plugin

Note: For Fedora users, you’ll use dnf instead:

sudo dnf update
sudo dnf install docker-model-plugin

Step 2: Verify Your Installation

Confirm Docker Model Runner is properly installed by checking the version:

docker model version

You should see version information displayed. The docker model command is now available alongside your regular Docker commands like docker run, docker ps, etc.

Step 3: Deploy Your First AI Model

Now for the exciting part – let’s deploy an AI model with a single command:

docker model run ai/smollm2

This command performs several actions:

  • Downloads the SmolLM2 model (270MB, 360 million parameters)
  • Starts a model server in the background
  • Exposes the model via API on port 12434
  • Launches an interactive chat interface

SmolLM2 is perfect for chat assistants, text extraction, rewriting, and summarization tasks. Once the command completes, you can start chatting with the model immediately. Type /bye when you’re ready to exit the chat.

Step 4: Verify Your Model is Running

Check that your model is active and accessible:

docker model ps

This shows all running models, similar to how docker ps shows running containers. You should see your ai/smollm2 model listed and running.

Step 5: Set Up Demo Applications

Let’s build some real applications that use your deployed model. Clone the official Hello GenAI repository:

git clone https://github.com/docker/hello-genai.git
cd hello-genai

This repository contains sample applications in four different programming languages:

  • Go: High-performance chatbot implementation
  • Python: Easy-to-understand, beginner-friendly version
  • Node.js: Web-optimized implementation
  • Rust: Memory-safe, systems-level implementation

Step 6: Configure the Applications

The applications need to know how to connect to your Docker Model Runner instance. Set up the configuration:

# Create environment configuration
echo "LLM_BASE_URL=http://host.docker.internal:12434/engines/llama.cpp/v1" > .env
echo "LLM_MODEL_NAME=ai/smollm2" >> .env

# Verify the configuration
cat .env

This configuration tells all applications where to find your model API and which model to use.
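To see how those two values are consumed, here is a minimal sketch of how an application might turn `LLM_BASE_URL` into a concrete chat-completions endpoint. The variable names come from the .env file above; the URL-joining helper is an assumption about how the sample apps work, not code from the repository:

```python
# Sketch: mapping the .env values above to a concrete endpoint.
# LLM_BASE_URL and LLM_MODEL_NAME are the variables set in the .env file;
# the joining logic here is illustrative.
import os


def chat_completions_url(base_url: str) -> str:
    """Append the OpenAI-style chat-completions path to the base URL."""
    return base_url.rstrip("/") + "/chat/completions"


base = os.environ.get(
    "LLM_BASE_URL",
    "http://host.docker.internal:12434/engines/llama.cpp/v1",
)
model = os.environ.get("LLM_MODEL_NAME", "ai/smollm2")

if __name__ == "__main__":
    print(chat_completions_url(base), model)
```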

Step 7: Launch All Applications

Start all four demo applications with a single command:

./run.sh

This script will:

  • Build and start all four language implementations
  • Configure them to connect to your Docker Model Runner instance
  • Make them available on different ports

Step 8: Access Your Applications

Once everything is running, each application provides a web-based chat interface where you can interact with the AI model. Despite being written in different programming languages, they all connect to the same SmolLM2 model running on Docker Model Runner.

Step 9: Test the API Integration

You can also interact with the model directly via its API. Test it with curl:

curl -X POST http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ai/smollm2", "messages": [{"role": "user", "content": "Hello! Can you help me write a Python function?"}]}'

The model responds using OpenAI-compatible APIs, making it easy to integrate with existing AI applications, SDKs, and tools.
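The same request can be assembled and its response parsed from code. The sketch below mirrors the curl example's payload and shows how to pull the assistant's text out of an OpenAI-style response; the sample response dict is illustrative, not real model output, so nothing here requires the server to be running:

```python
# Offline sketch of the curl example's payload and of parsing an
# OpenAI-compatible reply. The sample response is illustrative only.
import json


def build_payload(model: str, prompt: str) -> dict:
    """Assemble a chat-completions request body like the curl example."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def extract_reply(response: dict) -> str:
    """Pull the assistant's text out of an OpenAI-style response."""
    return response["choices"][0]["message"]["content"]


payload = build_payload(
    "ai/smollm2", "Hello! Can you help me write a Python function?"
)

# Shape of a typical OpenAI-compatible response (illustrative values):
sample = {"choices": [{"message": {"role": "assistant", "content": "Sure!"}}]}

if __name__ == "__main__":
    print(json.dumps(payload))
    print(extract_reply(sample))
```

Because the response shape matches OpenAI's, existing SDKs that speak that API can usually be pointed at the local endpoint with only a base-URL change.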

What You’ve Accomplished

Congratulations! You’ve successfully:

✅ Installed Docker Model Runner on your Linux system

✅ Deployed your first AI model with a single command

✅ Built multi-language AI applications that share the same model

✅ Learned essential model management commands

✅ Verified everything works through testing

The future of AI deployment is here, and it looks a lot like the Docker experience you already love.

