
Running Ollama with Docker for Python Applications


As AI and large language models become increasingly popular, many developers are looking to integrate these powerful tools into their Python applications. Ollama, a framework for running large language models locally, has gained traction for its simplicity and flexibility. However, when it comes to containerizing applications that use Ollama, developers often encounter challenges. In this blog post, I’ll explore practical approaches to running Ollama in Docker containers alongside your Python applications.

The Challenge with Containerizing Ollama

If you’ve tried to containerize a Python application that uses Ollama, you might have encountered these common issues:

  1. The “waiting for server” problem: ollama serve runs in the foreground, so invoking it in a Dockerfile (or as the container’s main command) blocks every command that should come after it (see the sketch below).
  2. Service startup timing: The Ollama service needs to be fully initialized before your application can interact with it.
  3. Model availability: Models need to be pulled or created before they can be used by your application.

These challenges often lead developers to question whether containerization is even viable for Ollama-based applications or if they should resort to traditional VMs.
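
For a concrete picture of the first issue, consider a naive Dockerfile sketch like this one: ollama serve never returns, so the build stalls at that instruction and the model pull is never reached:

FROM ollama/ollama:latest

# Anti-pattern: ollama serve runs in the foreground, so this step never completes
RUN ollama serve

# ...and the build never gets here
RUN ollama pull mistral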

Solution Approaches

Let’s explore several practical solutions to running Ollama with Docker for Python applications that use frameworks like LangChain.

1. Using a Startup Script

One of the most effective approaches is to use a startup script that launches Ollama in the background before starting your Python application.

Create a start_services.sh file:

#!/bin/sh

# Start Ollama in the background
ollama serve &

# Wait for Ollama to start
sleep 5

# Pull the required model(s)
ollama pull mistral

# Start your Python application
python3 app.py

Then in your Dockerfile:

FROM ollama/ollama:latest

WORKDIR /app

# Copy your Python application files
COPY . .

# Install Python and dependencies
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip install -r requirements.txt

# Make the startup script executable
COPY start_services.sh .
RUN chmod +x start_services.sh

# Expose the Ollama API port
EXPOSE 11434

# The base image's ENTRYPOINT is the ollama binary, so override it
# and run the startup script instead
ENTRYPOINT ["./start_services.sh"]

This approach ensures that Ollama is running in the background before your Python application starts.
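
The fixed sleep 5 is a guess; if the server takes longer to come up, the ollama pull fails. A slightly more defensive variant of start_services.sh (same idea, but polling the API instead of sleeping, and assuming curl is added to the apt-get install line) looks like this:

#!/bin/sh

# Start Ollama in the background
ollama serve &

# Poll the API until it answers instead of sleeping a fixed time
until curl -sf http://localhost:11434/api/tags > /dev/null; do
  echo "Waiting for Ollama..."
  sleep 1
done

# Pull the required model(s) and start the app
ollama pull mistral
python3 app.py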

2. Using a One-Liner in Dockerfile

If you prefer a more compact solution without a separate script, you can use a command sequence in your Dockerfile’s CMD instruction:

FROM ollama/ollama:latest

WORKDIR /app

# Copy your Python application files
COPY . .

# Install Python and dependencies
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip install -r requirements.txt

# Expose the Ollama API port
EXPOSE 11434

# Reset the base image's ollama ENTRYPOINT so the shell command below runs directly
ENTRYPOINT []

# Start Ollama, wait for it to initialize, pull the model, then run your app
CMD ollama serve & sleep 5 && ollama pull mistral && python3 app.py

This one-liner resets the base image’s entrypoint, runs Ollama in the background with the & operator, waits for it to initialize, pulls the necessary model, and then starts your Python application.
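
Either single-container variant is built and run the same way; the image tag here is just a placeholder:

# Build the image and run it, publishing the Ollama API port
docker build -t ollama-python-app .
docker run --rm -p 11434:11434 ollama-python-app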

3. Using Docker Compose for a Multi-Container Setup

A more robust approach is to use Docker Compose to run Ollama and your Python application as separate containers:


services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      # curl may not be present in the stock ollama image; if the check fails,
      # ["CMD", "ollama", "list"] is a dependency-free alternative test
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 5

  app:
    build: .
    depends_on:
      ollama:
        condition: service_healthy
    environment:
      - OLLAMA_HOST=http://ollama:11434

volumes:
  ollama_data:

Your Dockerfile for the app service would be simpler, as it only needs to focus on your Python application:

FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

CMD ["python", "app.py"]

And in your Python application, you’d configure LangChain to connect to the Ollama service:

import os

from langchain.llms import Ollama

def get_ollama_client():
    # Inside Compose, OLLAMA_HOST points at the ollama service;
    # running locally it falls back to localhost
    ollama_host = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
    return Ollama(base_url=ollama_host, model="mistral")

llm = get_ollama_client()
result = llm.predict("Explain Docker in one sentence.")
print(result)
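
With the Compose file, the app Dockerfile, and this snippet in place, the whole stack comes up with a single command, and Compose holds the app container back until the ollama healthcheck passes:

# Build the app image and start both services
docker compose up --build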

Pre-pulling Models for Production Use

For production environments, it’s often beneficial to pre-pull models in your Dockerfile to avoid runtime delays:

# For the Ollama container in a multi-container setup
FROM ollama/ollama:latest

# Pre-pull models during image build
RUN ollama serve & \
sleep 5 && \
ollama pull mistral && \
ollama pull llama2 && \
pkill ollama

EXPOSE 11434

CMD ["ollama", "serve"]

This approach ensures the models are already available when your container starts.
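
If you use a pre-pulled image together with the Compose setup above, point the ollama service at your custom image (or at a build: context) instead of ollama/ollama:latest. The tag and file name below are only examples:

# Build the custom Ollama image from the Dockerfile above
docker build -t ollama-prepulled -f Dockerfile.ollama .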

Architectural Considerations

While containerization offers many benefits, it’s worth considering whether it’s the best approach for your specific use case:

When to Use Containers with Ollama

  • Development and testing environments
  • CI/CD pipelines
  • Consistent deployment across multiple environments
  • Microservices architecture where the Ollama service needs to be scaled independently

When to Consider VMs or Bare Metal

  • Production environments with stable, long-running Ollama instances
  • When you need to leverage specific GPU optimizations
  • High-performance requirements where containerization overhead is a concern

As one developer noted in the Ollama community:

“From an architectural perspective, I suggest installing and configuring Ollama as a standalone service on a VM or bare-metal server. This setup can be managed through systemctl status ollama on Linux systems.”

This approach offers:

  • Simplicity in managing Ollama as a service
  • Direct access to hardware acceleration
  • Potentially better performance for long-running services
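
For reference, the official Linux install script registers Ollama as a systemd unit named ollama, so day-to-day management on such a host looks like this:

sudo systemctl status ollama      # check whether the service is running
sudo systemctl restart ollama     # restart after changing configuration
journalctl -u ollama -f           # follow the service logs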

Optimizing for Serverless Platforms

If you’re targeting serverless platforms like Google Cloud Run, you’ll need to adapt your approach:

  1. Cold Start Optimization: Pre-pull smaller models to reduce cold start times
  2. Memory and CPU Allocation: Ensure you allocate sufficient resources for both Ollama and your application (see the deploy sketch after this list)
  3. Timeout Considerations: Configure longer timeouts for operations that involve model loading
  4. Connection Health Checks: Implement health checks to ensure Ollama is running before accepting requests
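
As a rough illustration of points 2 and 3, a Cloud Run deployment might request generous resources and a long request timeout; the service name and values below are purely illustrative and should be sized to the model you plan to load:

gcloud run deploy ollama-app \
  --source . \
  --memory 8Gi \
  --cpu 4 \
  --timeout 600 \
  --port 8080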

Here’s a sample Dockerfile optimized for serverless:

FROM ollama/ollama:latest

WORKDIR /app

# Install Python and dependencies
RUN apt-get update && apt-get install -y python3 python3-pip curl
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy application code
COPY . .

# Set up healthcheck endpoint
RUN pip install flask
COPY healthcheck.py .

# Prepare startup script
COPY start_cloud_run.sh .
RUN chmod +x start_cloud_run.sh

# Expose port for the application
EXPOSE 8080

# Override the base image's ollama ENTRYPOINT and start everything via the script
ENTRYPOINT ["./start_cloud_run.sh"]

And a corresponding start_cloud_run.sh:

#!/bin/sh

# Start Ollama in the background
ollama serve &

# Wait for Ollama to start
echo "Waiting for Ollama to start..."
timeout 60 bash -c 'until curl -s http://localhost:11434/api/tags > /dev/null 2>&1; do sleep 1; done'

# Pull a lightweight model (reduce cold start time)
echo "Pulling model..."
ollama pull tinyllama

# Start the Flask application with health checks
echo "Starting application..."
python3 app.py
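
The serverless Dockerfile copies a healthcheck.py that isn’t shown in this post; a minimal sketch, assuming app.py imports it and exposes the result on a Flask route, could simply probe Ollama’s /api/tags endpoint with the standard library:

# healthcheck.py: hypothetical helper imported by app.py
import urllib.request

def ollama_ready(base_url="http://localhost:11434"):
    """Return True if the Ollama server answers on /api/tags."""
    try:
        urllib.request.urlopen(f"{base_url}/api/tags", timeout=2)
        return True
    except Exception:
        return False

app.py can then return 200 or 503 from a /healthz route depending on this check, which is what a platform-level health probe would look for.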

Conclusion

Containerizing Python applications that use Ollama presents unique challenges, but with the right approach, it’s entirely feasible. Whether you choose a single-container solution with a startup script, a multi-container architecture with Docker Compose, or opt for VMs in production, understanding these patterns will help you deploy your LLM-powered applications with confidence.

The key factors to consider are:

  • Ensuring proper service initialization
  • Managing model availability
  • Configuring appropriate resources
  • Choosing the right deployment architecture for your specific needs

By addressing these considerations, you can successfully deploy Ollama-based applications in containerized environments and take advantage of the flexibility and portability that Docker offers.

Have you successfully deployed Ollama with Docker in your projects? What approaches worked best for you? Share your experiences in the comments below!
