Docker Model Runner

Getting Started with Docker Model Runner

  • Install Docker Desktop 4.40 or later
  • Ensure that “Docker Model Runner” is enabled (it is enabled by default in 4.40)

There are two ways to enable Model Runner: using the CLI or using the Docker Dashboard.

Using CLI

$ docker desktop enable model-runner

Using Docker Dashboard

Open Docker Desktop's Settings and enable the “Docker Model Runner” option. The “Enable host-side TCP support” feature additionally allows Docker Model Runner to accept connections on the host OS on the specified TCP port (default: 12434), rather than only through the host Docker socket (/var/run/docker.sock). You can change this to another port if needed, particularly if 12434 is already in use by another application. We will see its usage later in this doc.
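As a quick sanity check once TCP support is enabled, you can query the Model Runner directly from the host. This assumes the default port 12434 and the llama.cpp backend path used elsewhere in this doc:

```shell
# With host-side TCP support enabled, the Model Runner should answer
# on the configured port (12434 by default) with a list of models.
curl http://localhost:12434/engines/llama.cpp/v1/models
```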

Open up the terminal and you should be able to see docker model as the new CLI.

docker model --help
Usage:  docker model COMMAND

Docker Model Runner

Commands:
  inspect     Display detailed information on one model
  list        List the available models that can be run with the Docker Model Runner
  pull        Download a model
  rm          Remove a model downloaded from Docker Hub
  run         Run a model with the Docker Model Runner
  status      Check if the Docker Model Runner is running
  version     Show the Docker Model Runner version

Run 'docker model COMMAND --help' for more information on a command.
  1. Check that the Model Runner is running
$ docker model status

Docker Model Runner is running
  2. List the available models
$ docker model ls

The response shows an empty list.

Let’s go ahead and download the model from the Docker Hub.

  3. Download a model
$ docker model pull ai/llama3.2:1B-Q8_0

All these models are hosted on https://hub.docker.com/u/ai

  • ai/gemma3
  • ai/llama3.2
  • ai/qwq
  • ai/mistral-nemo
  • ai/mistral
  • ai/phi4
  • ai/qwen2.5
  • ai/deepseek-r1-distill-llama (distill means it is not the original RL-trained DeepSeek model; it is a Llama model trained on DeepSeek-R1 inputs/outputs) 
  4. List the model
docker model ls
MODEL                PARAMETERS  QUANTIZATION  ARCHITECTURE  MODEL ID      CREATED       SIZE
ai/llama3.2:1B-Q8_0  1.24 B      Q8_0          llama         a15c3117eeeb  20 hours ago  1.22 GiB

  5. Use docker model run to send a single message
docker model run ai/llama3.2:1B-Q8_0 "Hi"
Hello! How can I help you today?

  6. Run the model in interactive mode
docker model run ai/llama3.2:1B-Q8_0
Interactive chat mode started. Type '/bye' to exit.

> why is water blue?

Water appears blue because ...

  7. Remove the model
docker model rm ai/llama3.2:1B-Q8_0
  8. Search for a model
docker search qwen | grep ai
qwenllm/qwen          The official repo of Qwen chat & pretrained …   18        
ai/qwen2.5            Versatile Qwen update with better language s…   6         
ai/qwen3              Qwen3 is the latest Qwen LLM, built for top-…   18   

Connection Methods

There are three primary ways to interact with the Model Runner:

1. From within containers

Containers can access the Model Runner via the internal DNS name:

http://model-runner.docker.internal/ 
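To try this from inside a container, you can run a throwaway curl container against the internal DNS name. The curlimages/curl image and the llama.cpp models path are assumptions for illustration:

```shell
# List available models from inside a container via the internal DNS name.
docker run --rm curlimages/curl \
  curl -s http://model-runner.docker.internal/engines/llama.cpp/v1/models
```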

2. From the host via the Docker Socket

Access via the Docker socket:

curl --unix-socket /var/run/docker.sock \
    localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"ai/llama3.2:1B-Q8_0",...}'
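For reference, a complete request body for the truncated payload above might look like the following; the messages content is illustrative:

```shell
# Full chat completion request over the Docker socket.
# The user message is an example prompt, not part of the API.
curl --unix-socket /var/run/docker.sock \
    localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "ai/llama3.2:1B-Q8_0",
          "messages": [
            {"role": "user", "content": "Why is water blue?"}
          ]
        }'
```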

3. From the host via TCP

When TCP host support is enabled, you can either:

  • Use the specified port directly (default 12434)
  • Use a helper container as a reverse-proxy:
docker run -d --name model-runner-proxy -p 8080:80 \
  alpine/socat tcp-listen:80,fork,reuseaddr tcp:model-runner.docker.internal:80
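Once the proxy container is up, requests to port 8080 on the host are forwarded to the Model Runner; for example (llama.cpp backend path assumed):

```shell
# Query the Model Runner through the socat reverse-proxy on host port 8080.
curl http://localhost:8080/engines/llama.cpp/v1/models
```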

OpenAI API Compatibility

The Model Runner implements OpenAI-compatible endpoints:

GET  /engines/{backend}/v1/models
GET  /engines/{backend}/v1/models/{namespace}/{name}
POST /engines/{backend}/v1/chat/completions
POST /engines/{backend}/v1/completions
POST /engines/{backend}/v1/embeddings

You can specify which model to use in the request payload, and the Model Runner will automatically load it if available.
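Because the endpoints are OpenAI-compatible, any OpenAI-style client can talk to the Model Runner. A minimal sketch using only the Python standard library is shown below; it assumes host-side TCP support is enabled on the default port 12434, the llama.cpp backend, and that the model has already been pulled:

```python
import json
import urllib.request

# Assumed base URL: host-side TCP support on the default port 12434,
# routed through the llama.cpp backend.
BASE_URL = "http://localhost:12434/engines/llama.cpp/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(model: str, prompt: str) -> str:
    """POST the payload to the Model Runner and return the reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running Model Runner with the model pulled):
# print(chat("ai/llama3.2:1B-Q8_0", "Hi"))
```

The same payload works against any of the connection methods above; only the base URL changes.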
