Docker Model Runner

Introduction to Docker Model Runner


Docker Model Runner is a new experimental feature introduced in Docker Desktop for Mac 4.40+. It provides a Docker-native experience for running Large Language Models (LLMs) locally, integrating seamlessly with existing container tooling and workflows. At present, the feature is optimized to use the GPU on Apple Silicon Macs for efficient model inference; Windows support with NVIDIA GPUs is expected soon (as of early April 2025).

With the Model Runner feature, Docker brings inference capabilities to developers' laptops, and in the future to CI, allowing them to run LLMs locally. This is an important building block for developing GenAI applications. Under the hood, the runner provides GPU-accelerated inference engines that are accessible both through the Docker socket (/var/run/docker.sock) and via a TCP connection at model-runner.docker.internal:80.
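Because the runner exposes a TCP endpoint, applications can talk to it over plain HTTP. The sketch below builds a chat-completion request against that endpoint. It assumes an OpenAI-compatible API served under an `/engines/v1` path and uses a hypothetical model name (`ai/smollm2`); check your Docker Desktop version's documentation for the exact route and available models.

```python
import json
from urllib import request

# Base URL from the article: the runner listens on
# model-runner.docker.internal:80 (reachable from containers).
# The /engines/v1 path below is an assumption based on the runner's
# OpenAI-compatible API; adjust it for your Docker Desktop version.
BASE_URL = "http://model-runner.docker.internal/engines/v1"


def build_chat_request(model: str, prompt: str) -> request.Request:
    """Build a chat-completion request for a locally pulled model."""
    payload = {
        "model": model,  # e.g. a model pulled via `docker model pull`
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("ai/smollm2", "Say hello in one sentence.")
# Sending the request requires Docker Model Runner to be running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)
```

The same request shape works with any OpenAI-compatible client library by pointing its base URL at the runner's endpoint.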

Native Docker Integration through the new `docker model` CLI

Docker Desktop 4.40+ introduces the `docker model` CLI as a first-class citizen. This means AI models are now treated as fundamental, well-supported objects within the Docker CLI, much like containers, images, and volumes.

By using this new CLI, developers can:

  • Pull models from registries (e.g., Docker Hub)
  • Run models locally with GPU acceleration
  • Integrate models into their development workflows
  • Test GenAI applications during development without relying on external APIs
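As a sketch of that workflow, a typical session with the new CLI might look like the following. The model name is illustrative; substitute any model published under Docker Hub's `ai/` namespace. Running these commands requires Docker Desktop 4.40+ with Model Runner enabled.

```shell
# Pull a model from Docker Hub (example model name; substitute your own).
docker model pull ai/smollm2

# List the models available locally.
docker model list

# Run the model with a one-off prompt; omit the prompt for interactive chat.
docker model run ai/smollm2 "Explain containers in one sentence."
```

Because these are ordinary Docker CLI subcommands, they slot directly into existing scripts and Makefiles alongside `docker build` and `docker run`.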

This capability is particularly valuable for developing and testing GenAI applications locally before deployment, allowing for faster iteration cycles and reduced dependency on cloud services during development.
