Adesoji Alu brings a proven ability to apply machine learning (ML) and data science techniques to solve real-world problems. He has experience with a variety of cloud platforms, including AWS, Azure, and Google Cloud Platform, and strong skills in software engineering, data science, and machine learning. He is passionate about using technology to make a positive impact on the world.

How I Reduced a Docker Image Size by 90%: A Step-by-Step Journey

2 min read


Let me take you through my journey of optimizing a Python-based Machine Learning application’s Docker image, reducing it from a hefty 3.09GB to just 280MB. Here’s how I did it, step by step.

The Initial Problem

I started with a typical Dockerfile for a machine learning application that uses TensorFlow:


FROM python:3.9
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "app.py"]

With these requirements:


tensorflow
pandas
numpy
scikit-learn
pillow
flask
gunicorn

Initial image size: 3.09GB 😱


Step 1: Analyzing the Base Image

First, I used docker history to understand what was taking up space:


docker history image_name
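
The default output truncates the layer commands; a couple of standard `docker history` flags make the big layers easier to spot:


# Show full commands alongside human-readable layer sizes
docker history --no-trunc image_name

# Or keep only the columns that matter when hunting for size
docker history --format 'table {{.Size}}\t{{.CreatedBy}}' image_name
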

Key findings:

  • Base python:3.9 image: 934MB
  • TensorFlow and its dependencies: 1.7GB
  • Our application code: ~200MB

Step 2: Switching to a Slim Base Image

Changed from python:3.9 to python:3.9-slim:


FROM python:3.9-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "app.py"]

New size: 1.9GB (Reduction: ~39%)

Step 3: Optimizing Dependencies

I noticed we didn’t need the full TensorFlow package, since the application only runs inference. I switched to the much lighter TensorFlow Lite runtime (note: tflite-runtime supports inference only and requires models converted to the .tflite format):


# requirements.txt
tflite-runtime
pandas
numpy
scikit-learn
pillow
flask
gunicorn

New size: 1.2GB (Reduction: ~61%)

Step 4: Multi-stage Build

Implemented a multi-stage build to separate build dependencies from runtime:


# Build stage
FROM python:3.9-slim AS builder

WORKDIR /app
COPY requirements.txt .

RUN pip install --user -r requirements.txt

# Runtime stage
FROM python:3.9-slim

WORKDIR /app

# Copy only the necessary files from builder
COPY --from=builder /root/.local/lib/python3.9/site-packages /root/.local/lib/python3.9/site-packages
COPY app.py .
COPY models /app/models

ENV PATH=/root/.local/bin:$PATH

CMD ["python", "app.py"]

New size: 850MB (Reduction: ~72%)
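
To measure the effect of each change, I rebuild and read back the reported size (the tag name here is illustrative):


docker build -t ml-app .
docker images ml-app --format '{{.Size}}'
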

Step 5: Cleaning Up Package Manager

Added cleanup commands and combined RUN statements:


# Stage 1: Builder
FROM python:3.9-slim AS builder

WORKDIR /app
COPY requirements.txt .

# Install dependencies and remove unnecessary files to reduce image size
RUN pip install --user --no-cache-dir -r requirements.txt && \
    find /root/.local \( -type d -name test -o -type d -name tests \) -prune -exec rm -rf '{}' + && \
    find /root/.local -type f \( -name '*.pyc' -o -name '*.pyo' \) -delete

# Stage 2: Final Image
FROM python:3.9-slim

WORKDIR /app

# Copy only necessary runtime dependencies
COPY --from=builder /root/.local /root/.local
COPY app.py .
COPY models /app/models

# Install required system dependencies and clean up
RUN apt-get update && \
    apt-get install --no-install-recommends -y libgomp1 && \
    rm -rf /var/lib/apt/lists/*

# Ensure local Python packages are available
ENV PATH="/root/.local/bin:$PATH"

CMD ["python", "app.py"]

New size: 280MB (Total reduction: 90%)

Final Results

Let’s look at the progression:

  • Initial image: 3.09GB
  • Slim base image: 1.9GB
  • Optimized dependencies: 1.2GB
  • Multi-stage build: 850MB
  • Final optimized image: 280MB
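
As a sanity check on the headline number, the percentages can be recomputed from the sizes above (treating 1 GB as 1000 MB; the exact final figure lands just above 90%):

```python
# Recompute each step's reduction relative to the 3.09 GB starting image
initial_mb = 3.09 * 1000
steps = {
    "slim base image": 1.9 * 1000,
    "optimized dependencies": 1.2 * 1000,
    "multi-stage build": 850,
    "final optimized image": 280,
}
reductions = {name: (1 - size / initial_mb) * 100 for name, size in steps.items()}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% smaller than the original")
```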

Key Takeaways

  • Analyze First: Always use docker history to understand what’s consuming space.
  • Choose the Right Base: Slim variants can significantly reduce size without sacrificing functionality.
  • Optimize Dependencies:
    • Use lighter alternatives when possible
    • Only install what you need
    • Consider using wheels for Python packages
  • Multi-stage Builds: Separate build-time dependencies from runtime needs.
  • Clean Up: Remove unnecessary files and cache after installations.
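
The wheels tip above fits naturally into the multi-stage pattern: build wheels once in the builder stage, then install them offline in the runtime stage. A sketch, not the Dockerfile from this post:


# Builder: compile/download every dependency as a wheel
FROM python:3.9-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip wheel --no-cache-dir -r requirements.txt -w /wheels

# Runtime: install from the local wheel directory only — no network, no build tools
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir --no-index --find-links=/wheels -r requirements.txt \
    && rm -rf /wheels
COPY app.py .
CMD ["python", "app.py"]
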

Bonus Tips

  • Layer Caching: Keep frequently changing files in later layers.
  • .dockerignore: Exclude unnecessary files from the build context.
  • Use BuildKit: Enable Docker BuildKit for more efficient builds:

export DOCKER_BUILDKIT=1
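
A minimal .dockerignore for a project like this one might look like the following (entries are typical examples — adjust to your repo):


# .dockerignore
.git
__pycache__/
*.pyc
tests/
.venv/
*.md
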

Measuring Impact

Beyond just size reduction, this optimization brought several benefits:

  • 75% faster deployment times
  • Reduced bandwidth costs
  • Improved security with smaller attack surface
  • Faster container startup times

Remember, the goal isn’t just to make images smaller – it’s to find the right balance between size and functionality for your specific use case.

Have Queries? Join https://launchpass.com/collabnix
