
How to successfully run Open WebUI with Docker Model Runner

The landscape of local AI development has evolved dramatically in recent years, with developers increasingly seeking privacy-focused, offline-capable solutions for running Large Language Models (LLMs). Two powerful tools have emerged to address this need: OpenWebUI and Docker Model Runner. This comprehensive guide will explore both technologies and demonstrate how to leverage them together for an optimal local AI development experience.

What is OpenWebUI?

OpenWebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports various LLM runners and OpenAI-compatible APIs, and ships with a built-in inference engine for RAG, making it a powerful AI deployment solution.

Key Features of OpenWebUI

OpenWebUI offers an impressive array of features that make it a compelling choice for local AI deployments:

🎯 User Interface Excellence

  • ChatGPT-style responsive chat interface
  • Dark/Light theme support with custom theming capabilities
  • Multilingual support (i18n) for global accessibility
  • Mobile-responsive design for cross-device compatibility

🤖 Multi-Model Support

  • Seamlessly engage with various models simultaneously, harnessing their unique strengths for optimal responses
  • Support for OpenAI-compatible APIs and local model runners
  • Model Builder functionality for creating custom agents and characters

🔒 Privacy and Security

  • Role-Based Access Control (RBAC): Ensure secure access with restricted permissions; only authorized individuals can access your models, and exclusive model creation/pulling rights are reserved for administrators
  • Complete offline operation capability
  • Local data storage with no external dependencies

📚 Advanced AI Capabilities

  • Local RAG Integration: Dive into the future of chat interactions with groundbreaking Retrieval Augmented Generation (RAG) support
  • Image generation integration with AUTOMATIC1111 API, ComfyUI, or DALL-E
  • Voice and video call features for interactive communication
  • Native Python function calling with built-in code editor

What is Docker Model Runner?

Docker Model Runner is an experimental feature introduced in Docker Desktop 4.40+ that provides a Docker-native experience for running Large Language Models (LLMs) locally, integrating seamlessly with existing container tooling and workflows. The feature is currently optimized to use the GPU on Apple Silicon Macs for efficient model inference, with Windows support for NVIDIA GPUs coming soon.

With the Model Runner feature, Docker provides inference capabilities to developers on their laptops, and in the future in CI, allowing them to run LLMs locally. This is an important building block for developing GenAI applications. The runner essentially provides a GPU-accelerated inference engine that is accessible both through the Docker socket (/var/run/docker.sock) and via a TCP connection at model-runner.docker.internal:80.
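
For example, a container running on the same Docker Desktop host can query the runner's API directly. A minimal sketch (assumes Docker Model Runner is enabled and the curlimages/curl image can be pulled):

# List the models served by Docker Model Runner from inside a container
docker run --rm curlimages/curl -s \
  http://model-runner.docker.internal/engines/llama.cpp/v1/models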

Key Features of Docker Model Runner

🚀 Native Docker Integration

  • Docker Desktop 4.40+ introduces docker model CLI as a first-class citizen
  • AI models are now treated as fundamental, well-supported objects within the Docker CLI, similar to containers, images, and volumes
  • Pull models from registries (Docker Hub), run models locally with GPU acceleration, and integrate models into development workflows

⚡ Host-Based Performance

  • Unlike traditional Docker containers, AI models DO NOT run in a container with Docker Model Runner
  • Uses a host-installed inference server (llama.cpp) that runs natively on your Mac for direct hardware access
  • By running directly on the host, the inference server can access Apple’s Metal API for direct GPU acceleration without containerization overhead

📦 OCI Artifact Storage

  • Models are stored as OCI artifacts in Docker Hub, using a standardized format supported by any Docker Registry
  • Working with models as OCI artifacts provides faster deployments and lower disk requirements
  • Layers are not compressed (model weights are largely incompressible), which avoids keeping both compressed and uncompressed copies of the same data on disk

🔗 Flexible Connection Methods

There are three primary ways to interact with Docker Model Runner:

  1. From within containers: http://model-runner.docker.internal/
  2. From the host via Docker Socket: Access via /var/run/docker.sock
  3. From the host via TCP: When TCP host support is enabled (default port 12434)

🌐 OpenAI API Compatibility

Docker Model Runner implements OpenAI-compatible endpoints (a sample request follows the list):

  • GET /engines/{backend}/v1/models
  • GET /engines/{backend}/v1/models/{namespace}/{name}
  • POST /engines/{backend}/v1/chat/completions
  • POST /engines/{backend}/v1/completions
  • POST /engines/{backend}/v1/embeddings
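
A quick way to sanity-check these endpoints from the host is a curl call against the chat completions route. A minimal sketch (assumes TCP host support is enabled on the default port 12434 and that ai/llama3.2:1B-Q8_0 has already been pulled):

# OpenAI-compatible chat completion via the host TCP endpoint
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/llama3.2:1B-Q8_0",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'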

Prerequisites

Before we begin, ensure you have the following installed:

Required Software:

  • Docker Desktop 4.40 or later (for Docker Model Runner support)
  • At least 8GB of RAM (16GB+ recommended for larger models)
  • Sufficient disk space for models (models range from 1GB to 20GB+)

System Requirements:

  • macOS with Apple Silicon (for Docker Model Runner GPU acceleration)
  • Linux with Docker Engine (alternative setup)
  • Windows with Docker Desktop (limited GPU support)

Setting Up Docker Model Runner

Step 1: Enable Docker Model Runner

Docker Model Runner is available in Docker Desktop 4.40+ for macOS with Apple Silicon. Windows support with NVIDIA GPU is coming soon. Enable it using one of these methods:

Option A: Command Line (Recommended)

# Enable Docker Model Runner without TCP (more secure)
docker desktop enable model-runner

# Or enable with TCP host support (default port 12434)
docker desktop enable model-runner --tcp 12434

Option B: Docker Desktop GUI

  1. Open Docker Desktop Settings
  2. Navigate to Features in development
  3. Go to the Beta tab
  4. Check Enable Docker Model Runner
  5. Optionally enable “Enable host-side TCP support” for external connections
  6. Click Apply and restart
  7. Quit and reopen Docker Desktop

Step 2: Verify Installation

Confirm Docker Model Runner is working:

# Check available commands
docker model --help

# Check if Model Runner is running
docker model status
# Output: Docker Model Runner is running

# List available models (should be empty initially)
docker model list

Step 3: Download Models

Pull models from the official AI model registry at https://hub.docker.com/u/ai:

# Popular lightweight models (recommended for most systems)
docker model pull ai/llama3.2:1B-Q8_0     # ~1.22 GiB - Fast and efficient
docker model pull ai/qwen2.5:0.5B-F16     # ~1GB - Ultra-lightweight
docker model pull ai/gemma3-qat:1B-Q4_K_M # ~1GB - Quantized Gemma

# Additional available models (choose based on your hardware):
docker model pull ai/gemma3               # Google's Gemma family
docker model pull ai/qwq                  # QwQ reasoning model
docker model pull ai/mistral-nemo         # Mistral Nemo
docker model pull ai/mistral              # Mistral family
docker model pull ai/phi4                 # Microsoft Phi-4
docker model pull ai/deepseek-r1-distill-llama # DeepSeek R1 distilled

Important: Only download models that fit within your available VRAM to avoid system slowdowns. More models are being added regularly to the AI model registry.

Step 4: Test Model Interaction

# List downloaded models
docker model list

# Test with a single prompt
docker model run ai/llama3.2:1B-Q8_0 "Hello, introduce yourself"

# Interactive chat mode
docker model run ai/llama3.2:1B-Q8_0
# Interactive chat mode started. Type '/bye' to exit.
# > Why is water blue?
# Water appears blue because...
# > /bye

# Remove a model if needed
docker model rm ai/llama3.2:1B-Q8_0

Step 5: Monitor GPU Usage (macOS)

On your Mac, you can monitor GPU usage in real-time:

  1. Press Command + Spacebar to open Spotlight
  2. Type “Activity Monitor” and open it
  3. Select the GPU tab to view GPU history
  4. You’ll see GPU processes triggered when making inference requests
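
If you prefer the terminal, macOS also ships a powermetrics utility that can sample GPU activity (a rough command-line alternative to Activity Monitor; requires sudo):

# Sample GPU utilization once per second while you send inference requests
sudo powermetrics --samplers gpu_power -i 1000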

Setting Up OpenWebUI

Option 1: Quick Start with Docker Compose (Recommended)

The fastest way to get started is using Docker Compose with the provider feature:

# Save this as docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OPENAI_API_BASE_URL=http://model-runner.docker.internal/engines/llama.cpp/v1
      - OPENAI_API_KEY=na
      - WEBUI_NAME=Docker Model Runner Interface
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - llm-runner

  llm-runner:
    provider:
      type: model
      options:
        model: ai/llama3.2:1B-Q8_0

volumes:
  open-webui:

Deploy the stack:

# Save the YAML above as docker-compose.yml then run:
docker-compose up -d

This will:

  • Automatically pull the OpenWebUI image
  • Configure the connection to Docker Model Runner
  • Set up the necessary networking and volumes
  • Start the service on http://localhost:3000
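
A couple of quick checks can confirm the stack came up as expected (assumes you run them from the directory containing the docker-compose.yml above):

# Both services should be listed as running
docker-compose ps

# Follow OpenWebUI logs while it initializes
docker-compose logs -f open-webui

# The UI should answer on port 3000
curl -I http://localhost:3000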

Initial Setup and Access

  1. Access OpenWebUI: Open your browser and navigate to http://localhost:3000
  2. Create Admin Account: The first user you create becomes the admin automatically
  3. Login: Use your credentials to access the interface
  4. Select Model: Choose from the available models in the dropdown menu

Troubleshooting First Access:

  • If no models appear, wait a few moments for the model download to complete
  • Check container logs: docker-compose logs llm-runner
  • Verify models are available: docker model list
  • Check Docker Model Runner status: docker model status (see the connectivity check below)
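
If models still do not show up, it helps to confirm that the OpenWebUI container can actually reach the Model Runner endpoint it was configured with. A minimal sketch (the open-webui service name matches the Compose file above; assumes curl is available inside the image):

# Query the Model Runner models endpoint from inside the OpenWebUI container
docker-compose exec open-webui curl -s \
  http://model-runner.docker.internal/engines/llama.cpp/v1/models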

How to Add Models to OpenWebUI

OpenWebUI supports multiple ways to add and manage models. Here’s a comprehensive guide for each method:

Method 1: Adding Models Through the UI

  1. Access the Admin Panel:
    • Navigate to http://localhost:3000
    • Log in with your admin account
    • Click on your profile → Admin Panel
  2. Add External API Models:
    • Go to Settings → Connections
    • Click + Add Connection
    • Configure your model endpoint (you can verify it responds first; see the sketch after this list):
      • Name: My Custom Model
      • Base URL: http://localhost:12434/v1
      • API Key: (if required)
  3. Import Models from Model Library:
    • Navigate to Workspace → Models
    • Click + Add Model
    • Browse available models or import from URLs
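
Before saving the connection, it is worth checking that the Base URL actually responds. A minimal sketch from the host (assumes TCP host support on port 12434 is enabled; the engine-scoped path below is the one listed in the OpenAI API compatibility section):

# The models endpoint should return the models Docker Model Runner exposes
curl http://localhost:12434/engines/llama.cpp/v1/models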

Conclusion

The combination of OpenWebUI and Docker Model Runner represents the future of local AI development and deployment. Docker Model Runner delivers on this by shipping an inference engine as part of Docker Desktop, built on top of llama.cpp and accessible through the familiar OpenAI API. No extra tools, no extra setup, and no disconnected workflows.

This powerful integration offers several key advantages:

  • Privacy-First: Complete offline operation with local data storage and no external API dependencies
  • Developer-Friendly: Familiar Docker workflows with minimal setup using the modern provider feature
  • Enterprise-Ready: Support for multiple models from the official AI model registry at https://hub.docker.com/u/ai
  • Highly Scalable: From development to production deployments with monitoring and load balancing
  • Cost-Effective: No cloud API costs or usage limits, run models locally on your hardware
  • Modern Architecture: Uses the latest Docker Compose provider feature for automatic model management
  • Performance Optimized: Host-based execution for optimal GPU utilization on supported platforms

The provider-based approach introduced in Docker Desktop 4.40+ revolutionizes model management by automatically handling downloads and lifecycle management through standard Docker Compose workflows. This makes local AI development more accessible and reliable than ever before.

Whether you’re building AI applications, experimenting with different models, or requiring privacy-focused AI solutions, the OpenWebUI and Docker Model Runner combination provides a robust, modern foundation for your local AI infrastructure.

As both technologies continue to evolve, we can expect even tighter integration, expanded platform support, and additional features that will cement Docker Model Runner as the standard for local AI development workflows.
