Running large language models locally has become much more accessible thanks to projects like Ollama. In this guide, I’ll walk you through how to set up Ollama and run your favorite models using Docker Compose, making deployment and management much simpler.
Why Docker Compose?
While you can run Ollama with a single Docker command, Docker Compose offers several advantages:
- Configuration as code: Your entire setup is documented in a YAML file
- Easy resource management: Configure memory limits, GPU access, and networking in one place
- Service orchestration: Run multiple containers that work together (like adding a web UI)
- Simplified commands: Start, stop, and rebuild your setup with simple commands
Prerequisites
Before we begin, make sure you have:
- Docker installed on your system
- Docker Compose installed (comes bundled with Docker Desktop on Windows/Mac)
- A GPU with enough VRAM for your chosen model (optional, but recommended)
- NVIDIA Container Toolkit installed (if using a GPU)
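If you want to double-check the GPU plumbing before going further, a quick sanity check (assuming an NVIDIA card and the Container Toolkit already configured) looks something like this:

```bash
# Confirm Docker and Compose are available
docker --version
docker compose version   # or: docker-compose --version

# Confirm the NVIDIA Container Toolkit can expose the GPU to containers
# (this should print the same table as running nvidia-smi on the host)
docker run --rm --gpus all ubuntu nvidia-smi
```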
Basic Docker Compose Setup for Ollama
Let’s start with a basic `docker-compose.yml` file for running Ollama:
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

volumes:
  ollama_data:
    name: ollama_data
```
This configuration:
- Uses the official Ollama image
- Maps port 11434 to allow API access
- Creates a persistent volume for storing models
- Gives the container access to one GPU (omit the `deploy` block entirely if you’re running CPU-only)
Save this file as `docker-compose.yml` in a directory of your choice.
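Before starting anything, you can ask Compose to validate the file; `docker-compose config` parses the YAML and prints the resolved configuration, which catches indentation mistakes early:

```bash
# Validate the compose file and print the resolved configuration
docker-compose config
```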
Starting Ollama
To start the service:
```bash
docker-compose up -d
```
The `-d` flag runs the container in detached mode (in the background). You should see output confirming that the Ollama service has started.
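To confirm the container actually came up, the usual Compose commands work as expected:

```bash
# List the services in this compose project and their status
docker-compose ps

# Follow the Ollama logs (Ctrl+C to stop following)
docker-compose logs -f ollama
```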
Pulling and Running Models
Now that Ollama is running, you can pull and run models. Let’s first check that everything is working properly:
```bash
curl http://localhost:11434/api/tags
```
This should return a JSON object with an empty model list if you haven’t pulled any models yet.
Pulling a Model
There are two ways to pull models:
1. Using the Ollama CLI through Docker

```bash
docker exec -it ollama ollama pull mistral
```

This command:
- Uses `docker exec` to run a command inside the running container
- Runs `ollama pull mistral` to download the Mistral model
2. Using the REST API
```bash
curl -X POST http://localhost:11434/api/pull -d '{"name": "mistral"}'
```
Both methods achieve the same result, but the first is more straightforward for simple commands.
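If you want to grab several models in one go, a small shell loop over the CLI approach works fine; the model names below are just examples from the Ollama library, so swap in whatever you actually plan to use:

```bash
# Pull a handful of models in sequence (names are examples; adjust to taste)
for model in mistral llama2 codellama; do
  docker exec ollama ollama pull "$model"
done
```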
Testing Your Model
Let’s make sure the model is working:
```bash
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Explain Docker Compose in one paragraph"
}'
```
You should get a response from the model with a brief explanation of Docker Compose, streamed back as a series of JSON objects.
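If you’d rather get one complete JSON response (easier to read in a terminal or to parse in a script), the generate endpoint accepts a `stream` flag you can set to `false`:

```bash
# Request a single, non-streamed JSON response
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Explain Docker Compose in one paragraph",
  "stream": false
}'
```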
Advanced Configuration
Environment Variables for Parallelism
To enable better performance, especially with multiple models or parallel requests, add environment variables to your Docker Compose file:
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    # ... other settings ...
    environment:
      - OLLAMA_NUM_PARALLEL=4
      - OLLAMA_MAX_LOADED_MODELS=3
```
These settings:
- `OLLAMA_NUM_PARALLEL`: controls how many parallel requests each model can handle
- `OLLAMA_MAX_LOADED_MODELS`: limits how many models can be loaded simultaneously
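To see how these settings play out at runtime, recent Ollama versions include an `ollama ps` subcommand that lists which models are currently loaded and how much memory they occupy (older images may not have it yet):

```bash
# Show models currently loaded into memory
docker exec -it ollama ollama ps
```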
Adding a Web UI
Ollama works great with various UIs. One popular option is ollama-webui. Here’s how to add it to your Docker Compose setup:
```yaml
services:
  ollama:
    # ... ollama settings ...

  webui:
    image: ghcr.io/ollama-webui/ollama-webui:main
    container_name: ollama-webui
    ports:
      - "3000:8080"
    depends_on:
      - ollama
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434/api
    restart: unless-stopped
```
This adds a web UI accessible at http://localhost:3000.
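A quick way to check that the UI container is up and reachable before opening it in a browser:

```bash
# The web UI should answer on port 3000 once it has started
curl -I http://localhost:3000

# Watch the UI container's logs if the page doesn't load
docker-compose logs -f webui
```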
Creating a Custom Modelfile
One of Ollama’s powerful features is the ability to customize models using Modelfiles. Here’s how to create a custom model with specific parameters and a system prompt:
1. Create a directory for your Modelfile:

```bash
mkdir -p modelfiles/my-custom-mistral
```
2. Create a Modelfile in that directory:
```
# modelfiles/my-custom-mistral/Modelfile
FROM mistral

# Set parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40

# Set a custom system prompt
SYSTEM """You are a helpful AI assistant specialized in explaining technical concepts clearly and concisely.
Always provide practical examples when explaining something.
"""
```
3. Build the model using the Modelfile:

```bash
docker exec -it ollama ollama create my-mistral -f /modelfiles/my-custom-mistral/Modelfile
```
Note that this path refers to the filesystem inside the container, not on your host. You’ll need to make the modelfiles directory available there by adding it to the volumes in your Docker Compose file:
```yaml
services:
  ollama:
    # ... other settings ...
    volumes:
      - ollama_data:/root/.ollama
      - ./modelfiles:/modelfiles
```
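With that mount in place, the workflow looks roughly like this (the model name `my-mistral` is just the example used above):

```bash
# Recreate the container so it picks up the new ./modelfiles mount
docker-compose up -d

# Build the custom model from the mounted Modelfile
docker exec -it ollama ollama create my-mistral -f /modelfiles/my-custom-mistral/Modelfile

# Confirm it shows up and give it a spin
docker exec -it ollama ollama list
docker exec -it ollama ollama run my-mistral "Explain Docker volumes briefly"
```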
Full Docker Compose Example
Here’s a complete example incorporating all the features we’ve discussed:
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
      - ./modelfiles:/modelfiles
    environment:
      - OLLAMA_NUM_PARALLEL=4
      - OLLAMA_MAX_LOADED_MODELS=3
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

  webui:
    image: ghcr.io/ollama-webui/ollama-webui:main
    container_name: ollama-webui
    ports:
      - "3000:8080"
    depends_on:
      - ollama
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434/api
    restart: unless-stopped

volumes:
  ollama_data:
    name: ollama_data
```
Managing Your Ollama Setup
Here are some useful commands for managing your Ollama Docker Compose setup:
```bash
# Start services
docker-compose up -d

# Stop services
docker-compose down

# View logs
docker-compose logs -f

# Rebuild and restart services
docker-compose up -d --build

# Remove volumes (this will delete all downloaded models!)
docker-compose down -v
```
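Since `down -v` wipes the model volume, it’s worth taking a backup first if re-downloading everything would hurt. A generic Docker volume backup (nothing Ollama-specific, just archiving the volume contents) looks something like this:

```bash
# Archive the contents of the ollama_data volume into the current directory
docker run --rm \
  -v ollama_data:/data:ro \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/ollama_data_backup.tar.gz -C /data .
```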
Troubleshooting
GPU Not Detected
If your GPU isn’t being detected:
- Ensure the NVIDIA Container Toolkit is properly installed
- Check that your GPU drivers are up to date
- Verify that `nvidia-smi` works correctly on your host system
- Try adding these environment variables to the Ollama service:
```yaml
environment:
  - NVIDIA_VISIBLE_DEVICES=all
  - NVIDIA_DRIVER_CAPABILITIES=compute,utility
```
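You can also check whether the GPU is visible from inside the running container; if the passthrough is working, this should print the same table you see on the host:

```bash
# Run nvidia-smi inside the Ollama container
docker exec -it ollama nvidia-smi
```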
Memory Issues
If you’re experiencing out-of-memory errors:
- Try using a smaller model
- Limit the number of loaded models with `OLLAMA_MAX_LOADED_MODELS=1`
- Add memory limits to your container:
```yaml
deploy:
  resources:
    limits:
      memory: 16G
```
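To see how much memory the container actually uses while a model is loaded, `docker stats` gives a live view, which helps when picking a sensible limit:

```bash
# Live CPU/memory usage for the Ollama container
docker stats ollama
```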
Conclusion
Docker Compose provides a flexible, maintainable way to run Ollama and manage your models. This approach makes it easy to:
- Keep your models and configuration persistent
- Add complementary services like web UIs
- Configure resource allocation and parallelism
- Create and use custom model configurations
By following this guide, you should now have a fully functional Ollama setup running in Docker, ready to serve AI models for your applications.
Whether you’re using Ollama for development, testing, or production, this containerized approach provides isolation, portability, and ease of management for your AI infrastructure.
Happy modeling!