Docker Model Runner Tutorial: Step-by-Step Guide
Deploying AI models just got as simple as running Docker containers. Docker Model Runner brings the familiar Docker experience to AI model management, letting you deploy, manage, and scale machine learning models with the same ease you’d expect from containerized applications.
This comprehensive guide will walk you through setting up Docker Model Runner on Linux systems (Debian/Ubuntu and Fedora), deploying your first AI model, and building real applications that leverage it.
What You’ll Build
By the end of this tutorial, you’ll have:
- Docker Model Runner installed and configured
- A running AI model (SmolLM2) accessible via API
- Multi-language demo applications (Go, Python, Node.js, Rust) all connected to your model
- A working understanding of the Docker Model Runner workflow
Prerequisites
Before we begin, ensure your Linux system meets these requirements:
- Operating System: Ubuntu/Debian or Fedora
- Docker Engine: Installed and running
- Memory: At least 4GB RAM
- Network: Internet connection for downloading models
- User Permissions: Ability to run sudo commands
Step 1: Install Docker Model Runner
The installation process is straightforward. On Debian/Ubuntu, update your package index and install the Docker Model Runner plugin:
sudo apt-get update
sudo apt-get install docker-model-plugin
Note: Fedora users should use dnf instead:
sudo dnf update
sudo dnf install docker-model-plugin
Step 2: Verify Your Installation
Confirm Docker Model Runner is properly installed by checking the version:
docker model version
You should see version information displayed. The docker model command is now available alongside your regular Docker commands such as docker run and docker ps.
Step 3: Deploy Your First AI Model
Now for the exciting part – let’s deploy an AI model with a single command:
docker model run ai/smollm2
This command performs several actions:
- Downloads the SmolLM2 model (270MB, 360 million parameters)
- Starts a model server in the background
- Exposes the model via API on port 12434
- Launches an interactive chat interface
SmolLM2 is perfect for chat assistants, text extraction, rewriting, and summarization tasks. Once the command completes, you can start chatting with the model immediately. Type /bye when you're ready to exit the chat.
Step 4: Verify Your Model is Running
Check that your model is active and accessible:
docker model ps
This shows all running models, similar to how docker ps shows running containers. You should see your ai/smollm2 model listed and running.
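You can also confirm the API side from code. Below is a minimal Python sketch (using the third-party requests library) that asks the server for its model list; it assumes the server exposes the standard OpenAI-compatible /models route under /engines/llama.cpp/v1 on port 12434, matching the endpoint mentioned above.
import requests

# Query the OpenAI-compatible model listing endpoint (assumed route).
resp = requests.get("http://localhost:12434/engines/llama.cpp/v1/models")
resp.raise_for_status()

# Print the ID of each model the server reports.
for model in resp.json().get("data", []):
    print(model["id"])
If ai/smollm2 appears in the output, the API is healthy too.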
Step 5: Set Up Demo Applications
Let’s build some real applications that use your deployed model. Clone the official Hello GenAI repository:
git clone https://github.com/docker/hello-genai.git
cd hello-genai
This repository contains sample applications in four different programming languages:
- Go: High-performance chatbot implementation
- Python: Easy-to-understand, beginner-friendly version
- Node.js: Web-optimized implementation
- Rust: Memory-safe, systems-level implementation
Step 6: Configure the Applications
The applications need to know how to connect to your Docker Model Runner instance. Set up the configuration:
# Create environment configuration
echo "LLM_BASE_URL=http://host.docker.internal:12434/engines/llama.cpp/v1" > .env
echo "LLM_MODEL_NAME=ai/smollm2" >> .env
# Verify the configuration
cat .env
This configuration tells all applications where to find your model API and which model to use.
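To make the wiring concrete, here is a rough Python sketch of what each demo app does with these two variables: read the endpoint and model name from the environment, then post a chat request. It uses the requests library, and the fallback values below are assumptions for running directly on the host, where localhost replaces host.docker.internal (that hostname only resolves inside containers).
import os
import requests

# Read the same two settings the demo apps use; the fallbacks are
# host-side assumptions (host.docker.internal only resolves in containers).
base_url = os.environ.get("LLM_BASE_URL", "http://localhost:12434/engines/llama.cpp/v1")
model = os.environ.get("LLM_MODEL_NAME", "ai/smollm2")

# Send one chat completion request to the configured endpoint.
resp = requests.post(
    f"{base_url}/chat/completions",
    json={"model": model, "messages": [{"role": "user", "content": "Say hello in one sentence."}]},
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])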
Step 7: Launch All Applications
Start all four demo applications with a single command:
./run.sh
This script will:
- Build and start all four language implementations
- Configure them to connect to your Docker Model Runner instance
- Make them available on different ports
Step 8: Access Your Applications
Once everything is running, you can access each application:
- Go Application: http://localhost:8080
- Python Application: http://localhost:8081
- Node.js Application: http://localhost:8082
- Rust Application: http://localhost:8083
Each application provides a web-based chat interface where you can interact with the AI model. Despite being written in different programming languages, they all connect to the same SmolLM2 model running on Docker Model Runner.
Step 9: Test the API Integration
You can also interact with the model directly via its API. Test it with curl:
curl -X POST http://localhost:12434/engines/llama.cpp/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "ai/smollm2", "messages": [{"role": "user", "content": "Hello! Can you help me write a Python function?"}]}'
The model responds using OpenAI-compatible APIs, making it easy to integrate with existing AI applications, SDKs, and tools.
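For example, because the endpoint follows the OpenAI API shape, you can point the official openai Python SDK at it instead of hand-rolling HTTP calls. The sketch below assumes the same base URL as the curl example; the API key is a placeholder, since local model servers typically ignore it while the client still requires one.
from openai import OpenAI

# Point the OpenAI client at the local Docker Model Runner endpoint.
# The API key is a placeholder; the local server is assumed to ignore it.
client = OpenAI(
    base_url="http://localhost:12434/engines/llama.cpp/v1",
    api_key="docker",
)

response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[{"role": "user", "content": "Hello! Can you help me write a Python function?"}],
)
print(response.choices[0].message.content)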
What You’ve Accomplished
Congratulations! You’ve successfully:
✅ Installed Docker Model Runner on your Linux system
✅ Deployed your first AI model with a single command
✅ Built multi-language AI applications that share the same model
✅ Learned essential model management commands
✅ Verified everything works through testing
The future of AI deployment is here, and it looks a lot like the Docker experience you already love.