
How to successfully run Open WebUI with Docker Model Runner

The landscape of local AI development has evolved dramatically in recent years, with developers increasingly seeking privacy-focused, offline-capable solutions for running Large Language Models (LLMs). Two powerful tools have emerged to address this need: OpenWebUI and Docker Model Runner. This comprehensive guide will explore both technologies and demonstrate how to leverage them together for an optimal local AI development experience.

What is OpenWebUI?

OpenWebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports various LLM runners and OpenAI-compatible APIs, and ships with a built-in inference engine for RAG, making it a powerful AI deployment solution.

Key Features of OpenWebUI

OpenWebUI offers an impressive array of features that make it a compelling choice for local AI deployments:

🎯 User Interface Excellence

  • ChatGPT-style responsive chat interface
  • Dark/Light theme support with custom theming capabilities
  • Multilingual support (i18n) for global accessibility
  • Mobile-responsive design for cross-device compatibility

🤖 Multi-Model Support

  • Seamlessly engage with various models simultaneously, harnessing their unique strengths for optimal responses
  • Support for OpenAI-compatible APIs and local model runners
  • Model Builder functionality for creating custom agents and characters

🔒 Privacy and Security

  • Role-Based Access Control (RBAC): Ensure secure access with restricted permissions; only authorized individuals can access your models, and exclusive model creation/pulling rights are reserved for administrators
  • Complete offline operation capability
  • Local data storage with no external dependencies

📚 Advanced AI Capabilities

  • Local RAG Integration: Dive into the future of chat interactions with groundbreaking Retrieval Augmented Generation (RAG) support
  • Image generation integration with AUTOMATIC1111 API, ComfyUI, or DALL-E
  • Voice and video call features for interactive communication
  • Native Python function calling with built-in code editor

What is Docker Model Runner?

Docker Model Runner is an experimental feature introduced in Docker Desktop 4.40+ that provides a Docker-native experience for running Large Language Models (LLMs) locally, integrating seamlessly with existing container tooling and workflows. The feature is currently optimized to use the GPU on Apple Silicon Macs for efficient model inference, with Windows support for NVIDIA GPUs coming soon.

With the Model Runner feature, Docker provides inference capabilities to developers on their laptops, and in the future in CI, allowing them to run LLMs locally. This is an important building block for developing GenAI applications. The runner essentially provides a GPU-accelerated inference engine that is accessible both through the Docker socket (/var/run/docker.sock) and via a TCP connection at model-runner.docker.internal:80.
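
For example, a container running on the same Docker Desktop host can query the runner's API directly. A minimal sketch (assumes Docker Model Runner is enabled and the curlimages/curl image can be pulled):

# List the models served by Docker Model Runner from inside a container
docker run --rm curlimages/curl -s \
  http://model-runner.docker.internal/engines/llama.cpp/v1/models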

Key Features of Docker Model Runner

🚀 Native Docker Integration

  • Docker Desktop 4.40+ introduces docker model CLI as a first-class citizen
  • AI models are now treated as fundamental, well-supported objects within the Docker CLI, similar to containers, images, and volumes
  • Pull models from registries (Docker Hub), run models locally with GPU acceleration, and integrate models into development workflows

⚡ Host-Based Performance

  • Unlike traditional Docker containers, AI models DO NOT run in a container with Docker Model Runner
  • Uses a host-installed inference server (llama.cpp) that runs natively on your Mac for direct hardware access
  • By running directly on the host, the inference server can access Apple’s Metal API for direct GPU acceleration without containerization overhead

📦 OCI Artifact Storage

  • Models are stored as OCI artifacts in Docker Hub, using a standardized format supported by any Docker Registry
  • Working with models as OCI artifacts provides faster deployments and lower disk requirements
  • Layers are not compressed (model weights are largely incompressible), which avoids keeping both compressed and uncompressed copies of the same data on disk

🔗 Flexible Connection Methods

There are three primary ways to interact with Docker Model Runner:

  1. From within containers: http://model-runner.docker.internal/
  2. From the host via Docker Socket: Access via /var/run/docker.sock
  3. From the host via TCP: When TCP host support is enabled (default port 12434)

🌐 OpenAI API Compatibility

Docker Model Runner implements OpenAI-compatible endpoints (a sample request follows the list):

  • GET /engines/{backend}/v1/models
  • GET /engines/{backend}/v1/models/{namespace}/{name}
  • POST /engines/{backend}/v1/chat/completions
  • POST /engines/{backend}/v1/completions
  • POST /engines/{backend}/v1/embeddings
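
A quick way to sanity-check these endpoints from the host is a curl call against the chat completions route. A minimal sketch (assumes TCP host support is enabled on the default port 12434 and that ai/llama3.2:1B-Q8_0 has already been pulled):

# OpenAI-compatible chat completion via the host TCP endpoint
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/llama3.2:1B-Q8_0",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'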

Prerequisites

Before we begin, ensure you have the following installed:

Required Software:

  • Docker Desktop 4.40 or later (for Docker Model Runner support)
  • At least 8GB of RAM (16GB+ recommended for larger models)
  • Sufficient disk space for models (models range from 1GB to 20GB+)

System Requirements:

  • macOS with Apple Silicon (for Docker Model Runner GPU acceleration)
  • Linux with Docker Engine (alternative setup)
  • Windows with Docker Desktop (limited GPU support)

Setting Up Docker Model Runner

Step 1: Enable Docker Model Runner

Docker Model Runner is available in Docker Desktop 4.40+ for macOS with Apple Silicon. Windows support with NVIDIA GPU is coming soon. Enable it using one of these methods:

Option A: Command Line (Recommended)

# Enable Docker Model Runner without TCP (more secure)
docker desktop enable model-runner

# Or enable with TCP host support (default port 12434)
docker desktop enable model-runner --tcp 12434

Option B: Docker Desktop GUI

  1. Open Docker Desktop Settings
  2. Navigate to Features in development
  3. Go to the Beta tab
  4. Check Enable Docker Model Runner
  5. Optionally enable “Enable host-side TCP support” for external connections
  6. Click Apply and restart
  7. Quit and reopen Docker Desktop

Step 2: Verify Installation

Confirm Docker Model Runner is working:

# Check available commands
docker model --help

# Check if Model Runner is running
docker model status
# Output: Docker Model Runner is running

# List available models (should be empty initially)
docker model list

Step 3: Download Models

Pull models from the official AI model registry at https://hub.docker.com/u/ai:

# Popular lightweight models (recommended for most systems)
docker model pull ai/llama3.2:1B-Q8_0     # ~1.22 GiB - Fast and efficient
docker model pull ai/qwen2.5:0.5B-F16     # ~1GB - Ultra-lightweight
docker model pull ai/gemma3-qat:1B-Q4_K_M # ~1GB - Quantized Gemma

# Additional available models (choose based on your hardware):
docker model pull ai/gemma3               # Google's Gemma family
docker model pull ai/qwq                  # QwQ reasoning model
docker model pull ai/mistral-nemo         # Mistral Nemo
docker model pull ai/mistral              # Mistral family
docker model pull ai/phi4                 # Microsoft Phi-4
docker model pull ai/deepseek-r1-distill-llama # DeepSeek R1 distilled

Important: Only download models that fit within your available VRAM to avoid system slowdowns. More models are being added regularly to the AI model registry.

Step 4: Test Model Interaction

# List downloaded models
docker model list

# Test with a single prompt
docker model run ai/llama3.2:1B-Q8_0 "Hello, introduce yourself"

# Interactive chat mode
docker model run ai/llama3.2:1B-Q8_0
# Interactive chat mode started. Type '/bye' to exit.
# > Why is water blue?
# Water appears blue because...
# > /bye

# Remove a model if needed
docker model rm ai/llama3.2:1B-Q8_0

Step 5: Monitor GPU Usage (macOS)

On your Mac, you can monitor GPU usage in real-time:

  1. Press Command + Spacebar to open Spotlight
  2. Type “Activity Monitor” and open it
  3. Select the GPU tab to view GPU history
  4. You’ll see GPU processes triggered when making inference requests
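
If you prefer the terminal, macOS also ships a powermetrics utility that can sample GPU activity (a rough command-line alternative to Activity Monitor; requires sudo):

# Sample GPU utilization once per second while you send inference requests
sudo powermetrics --samplers gpu_power -i 1000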

Setting Up OpenWebUI

Option 1: Quick Start with Docker Compose (Recommended)

The fastest way to get started is using Docker Compose with the provider feature:

# Save this as docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OPENAI_API_BASE_URL=http://model-runner.docker.internal/engines/llama.cpp/v1
      - OPENAI_API_KEY=na
      - WEBUI_NAME=Docker Model Runner Interface
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - llm-runner

  llm-runner:
    provider:
      type: model
      options:
        model: ai/llama3.2:1B-Q8_0

volumes:
  open-webui:

Deploy the stack:

# Save the YAML above as docker-compose.yml then run:
docker-compose up -d

This will:

  • Automatically pull the OpenWebUI image
  • Configure the connection to Docker Model Runner
  • Set up the necessary networking and volumes
  • Start the service on http://localhost:3000
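
A couple of quick checks can confirm the stack came up as expected (assumes you run them from the directory containing the docker-compose.yml above):

# Both services should be listed as running
docker-compose ps

# Follow OpenWebUI logs while it initializes
docker-compose logs -f open-webui

# The UI should answer on port 3000
curl -I http://localhost:3000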

Initial Setup and Access

  1. Access OpenWebUI: Open your browser and navigate to http://localhost:3000
  2. Create Admin Account: The first user you create becomes the admin automatically
  3. Login: Use your credentials to access the interface
  4. Select Model: Choose from the available models in the dropdown menu

Troubleshooting First Access:

  • If no models appear, wait a few moments for the model download to complete
  • Check container logs: docker-compose logs llm-runner
  • Verify models are available: docker model list
  • Check Docker Model Runner status: docker model status (see the connectivity check below)
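
If models still do not show up, it helps to confirm that the OpenWebUI container can actually reach the Model Runner endpoint it was configured with. A minimal sketch (the open-webui service name matches the Compose file above; assumes curl is available inside the image):

# Query the Model Runner models endpoint from inside the OpenWebUI container
docker-compose exec open-webui curl -s \
  http://model-runner.docker.internal/engines/llama.cpp/v1/models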

How to Add Models to OpenWebUI

OpenWebUI supports multiple ways to add and manage models. Here’s a comprehensive guide for each method:

Method 1: Adding Models Through the UI

  1. Access the Admin Panel:
    • Navigate to http://localhost:3000
    • Log in with your admin account
    • Click on your profile → Admin Panel
  2. Add External API Models:
    • Go to Settings → Connections
    • Click + Add Connection
    • Configure your model endpoint (you can verify it responds first; see the sketch after this list):
      • Name: My Custom Model
      • Base URL: http://localhost:12434/v1
      • API Key: (if required)
  3. Import Models from Model Library:
    • Navigate to Workspace → Models
    • Click + Add Model
    • Browse available models or import from URLs
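
Before saving the connection, it is worth checking that the Base URL actually responds. A minimal sketch from the host (assumes TCP host support on port 12434 is enabled; the engine-scoped path below is the one listed in the OpenAI API compatibility section):

# The models endpoint should return the models Docker Model Runner exposes
curl http://localhost:12434/engines/llama.cpp/v1/models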

Conclusion

The combination of OpenWebUI and Docker Model Runner represents the future of local AI development and deployment. Docker Model Runner delivers on this by shipping an inference engine as part of Docker Desktop, built on top of llama.cpp and accessible through the familiar OpenAI API. No extra tools, no extra setup, and no disconnected workflows.

This powerful integration offers several key advantages:

  • Privacy-First: Complete offline operation with local data storage and no external API dependencies
  • Developer-Friendly: Familiar Docker workflows with minimal setup using the modern provider feature
  • Enterprise-Ready: Support for multiple models from the official AI model registry at https://hub.docker.com/u/ai
  • Highly Scalable: From development to production deployments with monitoring and load balancing
  • Cost-Effective: No cloud API costs or usage limits, run models locally on your hardware
  • Modern Architecture: Uses the latest Docker Compose provider feature for automatic model management
  • Performance Optimized: Host-based execution for optimal GPU utilization on supported platforms

The provider-based approach introduced in Docker Desktop 4.40+ revolutionizes model management by automatically handling downloads and lifecycle management through standard Docker Compose workflows. This makes local AI development more accessible and reliable than ever before.

Whether you’re building AI applications, experimenting with different models, or requiring privacy-focused AI solutions, the OpenWebUI and Docker Model Runner combination provides a robust, modern foundation for your local AI infrastructure.

As both technologies continue to evolve, we can expect even tighter integration, expanded platform support, and additional features that will cement Docker Model Runner as the standard for local AI development workflows.
