
Setting Up Ollama Models with Docker Compose: A Step-by-Step Guide


Running large language models locally has become much more accessible thanks to projects like Ollama. In this guide, I’ll walk you through how to set up Ollama and run your favorite models using Docker Compose, making deployment and management much simpler.

Why Docker Compose?

While you can run Ollama with a single Docker command, Docker Compose offers several advantages:

  1. Configuration as code: Your entire setup is documented in a YAML file
  2. Easy resource management: Configure memory limits, GPU access, and networking in one place
  3. Service orchestration: Run multiple containers that work together (like adding a web UI)
  4. Simplified commands: Start, stop, and rebuild your setup with simple commands

Prerequisites

Before we begin, make sure you have:

  • Docker installed on your system
  • Docker Compose installed (comes bundled with Docker Desktop on Windows/Mac)
  • A GPU with enough VRAM for your chosen model (optional, but recommended)
  • NVIDIA Container Toolkit installed if using a GPU (see the quick check below)
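If you plan to use a GPU, it's worth confirming that Docker can actually see it before going further. A minimal check, assuming the NVIDIA Container Toolkit is installed (the CUDA image tag here is only an example; any recent nvidia/cuda base tag will do):

docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi

If this prints the same GPU table that nvidia-smi shows on the host, GPU passthrough is working.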

Basic Docker Compose Setup for Ollama

Let’s start with a basic docker-compose.yml file for running Ollama:


services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

volumes:
  ollama_data:
    name: ollama_data

This configuration:

  • Uses the official Ollama image
  • Maps port 11434 to allow API access
  • Creates a persistent volume for storing models
  • Gives the container access to one GPU (if you don't have an NVIDIA GPU, omit the deploy block to run on CPU)

Save this file as docker-compose.yml in a directory of your choice.

Starting Ollama

To start the service:

docker-compose up -d

The -d flag runs the container in detached mode (background). You should see output confirming that the Ollama service has started.
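Before moving on, you can confirm the container is healthy by checking its status and tailing its logs:

docker-compose ps
docker-compose logs -f ollama

The status should show the ollama container as Up, and the logs should include a line indicating that the server is listening on port 11434.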

Pulling and Running Models

Now that Ollama is running, you can pull and run models. Let’s first check that everything is working properly:

curl http://localhost:11434/api/tags

This should return an empty list if you haven’t pulled any models yet.
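With no models pulled yet, the response is simply an empty model list, roughly:

{"models":[]}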

Pulling a Model

There are two ways to pull models:

1. Using the Ollama CLI through Docker

docker exec -it ollama ollama pull mistral

This command:

  • Uses docker exec to run a command inside the running container
  • Runs ollama pull mistral to download the Mistral model

2. Using the REST API

curl -X POST http://localhost:11434/api/pull -d '{"name": "mistral"}'

Both methods achieve the same result, but the first is more straightforward for simple commands.
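Whichever method you use, you can confirm the download finished by listing the models the server knows about, either with the CLI or the API:

docker exec -it ollama ollama list
curl http://localhost:11434/api/tags

Both outputs should now include mistral.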

Testing Your Model

Let’s make sure the model is working:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Explain Docker Compose in one paragraph"
}'

You should get a response from the model with a brief explanation of Docker Compose.
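By default the generate endpoint streams one JSON object per token. If you prefer a single response, the API accepts a stream flag; a small example (piping through jq is optional and assumes jq is installed on your machine):

curl -s http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Explain Docker Compose in one paragraph",
  "stream": false
}' | jq -r '.response'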

Advanced Configuration

Environment Variables for Parallelism

To enable better performance, especially with multiple models or parallel requests, add environment variables to your Docker Compose file:

services:
  ollama:
    image: ollama/ollama:latest
    # ... other settings ...
    environment:
      - OLLAMA_NUM_PARALLEL=4
      - OLLAMA_MAX_LOADED_MODELS=3

These settings:

  • OLLAMA_NUM_PARALLEL: Controls how many parallel requests each model can handle
  • OLLAMA_MAX_LOADED_MODELS: Limits how many models can be loaded simultaneously
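After restarting with docker-compose up -d, a quick way to confirm the variables actually reached the container is to inspect its environment (the output may also include variables the Ollama image sets on its own):

docker exec ollama env | grep OLLAMA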

Adding a Web UI

Ollama works great with various UIs. One popular option is ollama-webui. Here’s how to add it to your Docker Compose setup:

services:
  ollama:
    # ... ollama settings ...

  webui:
    image: ghcr.io/ollama-webui/ollama-webui:main
    container_name: ollama-webui
    ports:
      - "3000:8080"
    depends_on:
      - ollama
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434/api
    restart: unless-stopped

This adds a web UI accessible at http://localhost:3000.

Creating a Custom Modelfile

One of Ollama’s powerful features is the ability to customize models using Modelfiles. Here’s how to create a custom model with specific parameters and a system prompt:

  1. Create a directory for your Modelfile:

mkdir -p modelfiles/my-custom-mistral

  2. Create a Modelfile in that directory:
# modelfiles/my-custom-mistral/Modelfile
FROM mistral

# Set parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40

# Set a custom system prompt
SYSTEM """You are a helpful AI assistant specialized in explaining technical concepts clearly and concisely.
Always provide practical examples when explaining something.
"""
  3. Build the model using the Modelfile:

docker exec -it ollama ollama create my-mistral -f /modelfiles/my-custom-mistral/Modelfile

The path passed to -f is a path inside the container, so you first need to make the modelfiles directory available to it by adding a bind mount to the volumes in your Docker Compose file:

services:
  ollama:
    # ... other settings ...
    volumes:
      - ollama_data:/root/.ollama
      - ./modelfiles:/modelfiles
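Once the bind mount is in place and the container has been recreated with docker-compose up -d, run the create command from step 3 and then try the new model; my-mistral is simply the name chosen above:

docker exec -it ollama ollama run my-mistral "Explain Docker volumes in two sentences."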

Full Docker Compose Example

Here’s a complete example incorporating all the features we’ve discussed:



services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
      - ./modelfiles:/modelfiles
    environment:
      - OLLAMA_NUM_PARALLEL=4
      - OLLAMA_MAX_LOADED_MODELS=3
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

  webui:
    image: ghcr.io/ollama-webui/ollama-webui:main
    container_name: ollama-webui
    ports:
      - "3000:8080"
    depends_on:
      - ollama
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434/api
    restart: unless-stopped

volumes:
  ollama_data:
    name: ollama_data

Managing Your Ollama Setup

Here are some useful commands for managing your Ollama Docker Compose setup:

Start services

docker-compose up -d

Stop services

docker-compose down

View logs

docker-compose logs -f

Rebuild and restart services

docker-compose up -d --build

Remove volumes (will delete all models!)

docker-compose down -v
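A couple of extra commands that fit the same workflow; note that ollama ps requires a reasonably recent Ollama release:

Update to the latest images

docker-compose pull && docker-compose up -d

Show models currently loaded in memory

docker exec -it ollama ollama ps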

Troubleshooting

GPU Not Detected

If your GPU isn’t being detected:

  1. Ensure the NVIDIA Container Toolkit is properly installed
  2. Check that your GPU drivers are up to date
  3. Verify that nvidia-smi works correctly on your host system
  4. Try adding these environment variables to the Ollama service:
environment:
  - NVIDIA_VISIBLE_DEVICES=all
  - NVIDIA_DRIVER_CAPABILITIES=compute,utility
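Once the container is running, you can also check whether the GPU is visible from inside it. This assumes the toolkit mounts nvidia-smi into the container, which is its default behaviour:

docker exec -it ollama nvidia-smi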

Memory Issues

If you’re experiencing out-of-memory errors:

  1. Try using a smaller model
  2. Limit the number of loaded models with OLLAMA_MAX_LOADED_MODELS=1
  3. Add memory limits to your container:
deploy:
  resources:
    limits:
      memory: 16G
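To confirm the limit is actually being respected, you can watch live resource usage for the container:

docker stats ollama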

Conclusion

Docker Compose provides a flexible, maintainable way to run Ollama and manage your models. This approach makes it easy to:

  • Keep your models and configuration persistent
  • Add complementary services like web UIs
  • Configure resource allocation and parallelism
  • Create and use custom model configurations

By following this guide, you should now have a fully functional Ollama setup running in Docker, ready to serve AI models for your applications.

Whether you’re using Ollama for development, testing, or production, this containerized approach provides isolation, portability, and ease of management for your AI infrastructure.

Happy modeling!

Have Queries? Join https://launchpass.com/collabnix

Collabnix Team The Collabnix Team is a diverse collective of Docker, Kubernetes, and IoT experts united by a passion for cloud-native technologies. With backgrounds spanning across DevOps, platform engineering, cloud architecture, and container orchestration, our contributors bring together decades of combined experience from various industries and technical domains.