Running large language models locally has become much more accessible thanks to projects like Ollama. In this guide, I’ll walk you through how to set up Ollama and run your favorite models using Docker Compose, making deployment and management much simpler.
Why Docker Compose?
While you can run Ollama with a single Docker command, Docker Compose offers several advantages:
- Configuration as code: Your entire setup is documented in a YAML file
- Easy resource management: Configure memory limits, GPU access, and networking in one place
- Service orchestration: Run multiple containers that work together (like adding a web UI)
- Simplified commands: Start, stop, and rebuild your setup with simple commands
Prerequisites
Before we begin, make sure you have:
- Docker installed on your system
- Docker Compose installed (comes bundled with Docker Desktop on Windows/Mac)
- A GPU with enough VRAM for your chosen model (optional, but recommended)
- NVIDIA Container Toolkit installed (if using a GPU)
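If you want to double-check the GPU plumbing before going further, a quick sanity check (assuming an NVIDIA card and the Container Toolkit already configured) looks something like this:

```bash
# Confirm Docker and Compose are available
docker --version
docker compose version   # or: docker-compose --version

# Confirm the NVIDIA Container Toolkit can expose the GPU to containers
# (this should print the same table as running nvidia-smi on the host)
docker run --rm --gpus all ubuntu nvidia-smi
```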
Basic Docker Compose Setup for Ollama
Let’s start with a basic `docker-compose.yml` file for running Ollama:
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

volumes:
  ollama_data:
    name: ollama_data
```
This configuration:
- Uses the official Ollama image
- Maps port 11434 to allow API access
- Creates a persistent volume for storing models
- Gives the container access to one GPU (omit the `deploy` block entirely if you’re running CPU-only)
Save this file as `docker-compose.yml` in a directory of your choice.
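Before starting anything, you can ask Compose to validate the file; `docker-compose config` parses the YAML and prints the resolved configuration, which catches indentation mistakes early:

```bash
# Validate the compose file and print the resolved configuration
docker-compose config
```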
Starting Ollama
To start the service:
```bash
docker-compose up -d
```
The `-d` flag runs the container in detached mode (in the background). You should see output confirming that the Ollama service has started.
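To confirm the container actually came up, the usual Compose commands work as expected:

```bash
# List the services in this compose project and their status
docker-compose ps

# Follow the Ollama logs (Ctrl+C to stop following)
docker-compose logs -f ollama
```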
Pulling and Running Models
Now that Ollama is running, you can pull and run models. Let’s first check that everything is working properly:
```bash
curl http://localhost:11434/api/tags
```
This should return a JSON object with an empty model list if you haven’t pulled any models yet.
Pulling a Model
There are two ways to pull models:
1. Using the Ollama CLI through Docker

```bash
docker exec -it ollama ollama pull mistral
```

This command:
- Uses `docker exec` to run a command inside the running container
- Runs `ollama pull mistral` to download the Mistral model
2. Using the REST API
```bash
curl -X POST http://localhost:11434/api/pull -d '{"name": "mistral"}'
```
Both methods achieve the same result, but the first is more straightforward for simple commands.
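If you want to grab several models in one go, a small shell loop over the CLI approach works fine; the model names below are just examples from the Ollama library, so swap in whatever you actually plan to use:

```bash
# Pull a handful of models in sequence (names are examples; adjust to taste)
for model in mistral llama2 codellama; do
  docker exec ollama ollama pull "$model"
done
```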
Testing Your Model
Let’s make sure the model is working:
```bash
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Explain Docker Compose in one paragraph"
}'
```
You should get a response from the model with a brief explanation of Docker Compose, streamed back as a series of JSON objects.
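If you’d rather get one complete JSON response (easier to read in a terminal or to parse in a script), the generate endpoint accepts a `stream` flag you can set to `false`:

```bash
# Request a single, non-streamed JSON response
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Explain Docker Compose in one paragraph",
  "stream": false
}'
```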
Advanced Configuration
Environment Variables for Parallelism
To enable better performance, especially with multiple models or parallel requests, add environment variables to your Docker Compose file:
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    # ... other settings ...
    environment:
      - OLLAMA_NUM_PARALLEL=4
      - OLLAMA_MAX_LOADED_MODELS=3
```
These settings:
- `OLLAMA_NUM_PARALLEL`: controls how many parallel requests each model can handle
- `OLLAMA_MAX_LOADED_MODELS`: limits how many models can be loaded simultaneously
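To see how these settings play out at runtime, recent Ollama versions include an `ollama ps` subcommand that lists which models are currently loaded and how much memory they occupy (older images may not have it yet):

```bash
# Show models currently loaded into memory
docker exec -it ollama ollama ps
```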
Adding a Web UI
Ollama works great with various UIs. One popular option is ollama-webui. Here’s how to add it to your Docker Compose setup:
```yaml
services:
  ollama:
    # ... ollama settings ...

  webui:
    image: ghcr.io/ollama-webui/ollama-webui:main
    container_name: ollama-webui
    ports:
      - "3000:8080"
    depends_on:
      - ollama
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434/api
    restart: unless-stopped
```
This adds a web UI accessible at http://localhost:3000.
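A quick way to check that the UI container is up and reachable before opening it in a browser:

```bash
# The web UI should answer on port 3000 once it has started
curl -I http://localhost:3000

# Watch the UI container's logs if the page doesn't load
docker-compose logs -f webui
```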
Creating a Custom Modelfile
One of Ollama’s powerful features is the ability to customize models using Modelfiles. Here’s how to create a custom model with specific parameters and a system prompt:
1. Create a directory for your Modelfile:

```bash
mkdir -p modelfiles/my-custom-mistral
```
2. Create a Modelfile in that directory:
```
# modelfiles/my-custom-mistral/Modelfile
FROM mistral

# Set parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40

# Set a custom system prompt
SYSTEM """You are a helpful AI assistant specialized in explaining technical concepts clearly and concisely.
Always provide practical examples when explaining something.
"""
```
3. Build the model using the Modelfile:

```bash
docker exec -it ollama ollama create my-mistral -f /modelfiles/my-custom-mistral/Modelfile
```
Note that this path refers to the filesystem inside the container, not on your host. You’ll need to make the modelfiles directory available there by adding it to the volumes in your Docker Compose file:
```yaml
services:
  ollama:
    # ... other settings ...
    volumes:
      - ollama_data:/root/.ollama
      - ./modelfiles:/modelfiles
```
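With that mount in place, the workflow looks roughly like this (the model name `my-mistral` is just the example used above):

```bash
# Recreate the container so it picks up the new ./modelfiles mount
docker-compose up -d

# Build the custom model from the mounted Modelfile
docker exec -it ollama ollama create my-mistral -f /modelfiles/my-custom-mistral/Modelfile

# Confirm it shows up and give it a spin
docker exec -it ollama ollama list
docker exec -it ollama ollama run my-mistral "Explain Docker volumes briefly"
```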
Full Docker Compose Example
Here’s a complete example incorporating all the features we’ve discussed:
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
      - ./modelfiles:/modelfiles
    environment:
      - OLLAMA_NUM_PARALLEL=4
      - OLLAMA_MAX_LOADED_MODELS=3
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

  webui:
    image: ghcr.io/ollama-webui/ollama-webui:main
    container_name: ollama-webui
    ports:
      - "3000:8080"
    depends_on:
      - ollama
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434/api
    restart: unless-stopped

volumes:
  ollama_data:
    name: ollama_data
```
Managing Your Ollama Setup
Here are some useful commands for managing your Ollama Docker Compose setup:
```bash
# Start services
docker-compose up -d

# Stop services
docker-compose down

# View logs
docker-compose logs -f

# Rebuild and restart services
docker-compose up -d --build

# Remove volumes (this will delete all downloaded models!)
docker-compose down -v
```
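Since `down -v` wipes the model volume, it’s worth taking a backup first if re-downloading everything would hurt. A generic Docker volume backup (nothing Ollama-specific, just archiving the volume contents) looks something like this:

```bash
# Archive the contents of the ollama_data volume into the current directory
docker run --rm \
  -v ollama_data:/data:ro \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/ollama_data_backup.tar.gz -C /data .
```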
Troubleshooting
GPU Not Detected
If your GPU isn’t being detected:
- Ensure the NVIDIA Container Toolkit is properly installed
- Check that your GPU drivers are up to date
- Verify that `nvidia-smi` works correctly on your host system
- Try adding these environment variables to the Ollama service:
```yaml
environment:
  - NVIDIA_VISIBLE_DEVICES=all
  - NVIDIA_DRIVER_CAPABILITIES=compute,utility
```
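You can also check whether the GPU is visible from inside the running container; if the passthrough is working, this should print the same table you see on the host:

```bash
# Run nvidia-smi inside the Ollama container
docker exec -it ollama nvidia-smi
```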
Memory Issues
If you’re experiencing out-of-memory errors:
- Try using a smaller model
- Limit the number of loaded models with `OLLAMA_MAX_LOADED_MODELS=1`
- Add memory limits to your container:
```yaml
deploy:
  resources:
    limits:
      memory: 16G
```
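To see how much memory the container actually uses while a model is loaded, `docker stats` gives a live view, which helps when picking a sensible limit:

```bash
# Live CPU/memory usage for the Ollama container
docker stats ollama
```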
Conclusion
Docker Compose provides a flexible, maintainable way to run Ollama and manage your models. This approach makes it easy to:
- Keep your models and configuration persistent
- Add complementary services like web UIs
- Configure resource allocation and parallelism
- Create and use custom model configurations
By following this guide, you should now have a fully functional Ollama setup running in Docker, ready to serve AI models for your applications.
Whether you’re using Ollama for development, testing, or production, this containerized approach provides isolation, portability, and ease of management for your AI infrastructure.
Happy modeling!