Abraham Dahunsi Web Developer 🌐 | Technical Writer ✍️| DevOps Enthusiast👨‍💻 | Python🐍 |

Docker Build Cache Explained


One of the key features of Docker is the ability to build images from a set of instructions in a Dockerfile. However, building images can be time-consuming and resource-intensive, especially if you have to repeat the same steps over and over again. That’s where the Docker build cache comes in handy.

The Docker build cache is a mechanism that allows Docker to reuse existing layers from previous builds when building the same image multiple times. This can significantly reduce the build time and save disk space and bandwidth. The cache works by comparing the instructions in the Dockerfile and the files copied or added to the image with the ones from previous builds. If there is a match, Docker will use the cached layer instead of executing the instruction.

In this article, we will explain how the Docker build cache works in detail and show you some examples of how to use it effectively. We will also cover some scenarios where the cache may not work as expected or cause problems, and how to avoid or fix them. By the end of this article, you will have a better understanding of the Docker build cache and how to optimize your image builds.

How Docker Build Cache Works



To understand how the Docker build cache works, we need to first understand how Docker images are built. A Docker image is a collection of layers, each representing a change in the file system of the container. Each layer is identified by a unique hash and can be shared by multiple images.
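As a rough mental model (this is an illustration, not Docker's actual algorithm), you can picture a layer's cache key as a hash of the parent layer's key plus the instruction text, so identical builds reproduce identical keys. A minimal shell sketch of the idea:

```shell
#!/bin/sh
# Simplified model of layer cache keys: each key depends on the parent
# key and the instruction text, so rebuilding with unchanged
# instructions reproduces the same keys, while changing one instruction
# changes every key after it. Not Docker's real implementation.

layer_key() {
    parent="$1"
    instruction="$2"
    printf '%s\n%s\n' "$parent" "$instruction" | sha256sum | cut -d' ' -f1
}

k1=$(layer_key "scratch" "FROM ubuntu")
k2=$(layer_key "$k1" "WORKDIR /app")
k3=$(layer_key "$k2" "RUN apt update && apt install -y apache2")

echo "layer 1: $k1"
echo "layer 2: $k2"
echo "layer 3: $k3"
```

Because each key folds in its parent's key, a cache hit is only possible when the whole chain up to that point is unchanged, which is exactly why Docker can stop re-executing instructions as long as it keeps matching.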

Each layer in a Docker image corresponds to an instruction in the Dockerfile, a text file that contains the commands to build the image. To see this in practice, build a simple Apache application.

Create a new directory, change into it, and add a simple index.html page for Apache to serve:

mkdir myapp
cd myapp
echo "<h1>Hello from Docker</h1>" > index.html

Next, create a Dockerfile that defines the base image, sets the working directory, installs Apache, copies the index page into Apache's document root, exposes port 80, and starts the Apache server:

# Use the official Ubuntu image as the base layer
FROM ubuntu

# Set the working directory
WORKDIR /app

# Update the package lists and install Apache
RUN apt update && apt install -y apache2

# Copy the custom index page into Apache's document root
COPY index.html /var/www/html/

# Expose port 80
EXPOSE 80

# Start the Apache server
CMD ["apache2ctl", "-D", "FOREGROUND"]

Here is a quick breakdown of the workflow of this Dockerfile:

  • The first line FROM ubuntu specifies the base image from which you are building. In this case, you are using the official Ubuntu image from Docker Hub.
  • The second line WORKDIR /app sets the working directory for the subsequent instructions. This means that any commands or files will be executed or copied relative to this directory.
  • The third line RUN apt update && apt install -y apache2 executes a shell command to update the package lists and install Apache in the image. Each RUN instruction creates a new layer.
  • The fourth line COPY index.html /var/www/html/ copies the index page from the build context into Apache's document root. Docker checksums copied files, so editing index.html invalidates the cache for this layer.
  • The fifth line EXPOSE 80 informs Docker that the container listens on port 80. This does not actually publish the port, but it serves as documentation and for inter-container communication. To publish the port, you need to use the -p flag when running the container.
  • The sixth line CMD ["apache2ctl", "-D", "FOREGROUND"] specifies the default command to run when the container starts. The CMD instruction can be overridden by providing a different command when running the container. In this case, you are using the apache2ctl command to start the Apache server in the foreground.

Using the docker build command, build the Docker image from the Dockerfile:

docker build -t myapp:1.0 .


Then run the image as a container, mapping port 80 on the host to port 80 in the container:

docker run -d -p 80:80 myapp:1.0

Using a browser such as Chrome, visit http://localhost to confirm that Apache serves the page.
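One related detail worth knowing before looking at caching: everything in the build directory is sent to the Docker daemon as the build context, and context files influence COPY and ADD cache checks. A .dockerignore file keeps irrelevant files out of the context (the entries below are common examples; adjust them to your project):

```
# .dockerignore: files that should not be sent with the build context
.git
*.log
node_modules
```

Keeping the context small both speeds up the build and prevents incidental files from invalidating COPY or ADD layers.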



When we run the docker build command to build this image, Docker will execute each instruction in the Dockerfile and create a new layer for each one. For example, the first instruction FROM ubuntu will pull the Ubuntu image from the Docker Hub and create a layer with the hash sha256:7e0aa2d69a15. The second instruction WORKDIR /app will create a new layer with the hash sha256:8b9e3a4f529e, and so on.

Now, suppose we want to build the same image again, but with some minor changes in the index.html file. If we run the docker build command again, Docker will not start from scratch but instead use the cache to reuse the existing layers that have not changed. Docker will compare the instructions in the Dockerfile and the files copied or added to the image with the ones from the previous build. If there is a match, Docker will use the cached layer instead of executing the instruction.

For example, in our case, the first three instructions (FROM, WORKDIR, and RUN apt update && apt install -y apache2) will not change, so Docker will use the cache for those layers. However, the fourth instruction (COPY index.html /var/www/html/) will change because we have modified the index.html file. Therefore, Docker will execute that instruction and create a new layer with a different hash. The last two instructions (EXPOSE and CMD) will not change, but Docker will not use the cache for those layers because they depend on the previous layer that has changed. Therefore, Docker will execute those instructions and create new layers for them as well.
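The cascade described above can be simulated with a toy hashing model: mix a checksum of the copied file into the COPY layer's key, change the file, and observe that the first three keys stay stable while every key from the COPY layer onward changes. This is an illustration of the idea, not Docker's real cache logic:

```shell
#!/bin/sh
# Toy model of cache keys for the six instructions in the example
# Dockerfile. The COPY key also covers the content of index.html, so
# editing that file invalidates the COPY layer and everything after it.

key() { printf '%s' "$*" | sha256sum | cut -d' ' -f1; }

build_keys() {
    html_content="$1"
    k=$(key "scratch" "FROM ubuntu");                         echo "$k"
    k=$(key "$k" "WORKDIR /app");                             echo "$k"
    k=$(key "$k" "RUN apt update && apt install -y apache2"); echo "$k"
    k=$(key "$k" "COPY index.html" "$html_content");          echo "$k"
    k=$(key "$k" "EXPOSE 80");                                echo "$k"
    k=$(key "$k" "CMD apache2ctl -D FOREGROUND");             echo "$k"
}

before=$(build_keys "<h1>Hello</h1>")
after=$(build_keys "<h1>Hello, world</h1>")

# The first three keys match (cache hits); keys 4-6 differ (rebuilt).
echo "$before"
echo "$after"
```

Comparing the two key lists line by line mirrors what the builder does: it reuses layers while the keys keep matching and rebuilds everything after the first mismatch.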

By using the cache, Docker can speed up the image build process and save disk space and bandwidth. However, the cache is not always reliable or desirable, as we will see in the next section.

When Not to Use the Cache

While the Docker build cache can be very useful, there are some situations where it may not work as expected or even cause problems. In this section, we will discuss some of these scenarios and how to deal with them.

One common scenario where the cache may prevent the image from being updated is when the instruction involves installing packages or software from external sources. For example, suppose we have the following instructions in our Dockerfile:

RUN apt-get update && apt-get install -y curl

This instruction will install the latest version of curl from the Ubuntu repositories. However, if we run the docker build command again, Docker will use the cache for this layer, even if there is a newer version of curl available. This may cause security or compatibility issues if we rely on the latest version of curl.

To avoid this problem, we can use the --no-cache flag when running the docker build command. This will force Docker to ignore the cache and execute all the instructions in the Dockerfile. For example:

docker build --no-cache -t myapp .

Alternatively, we can invalidate the cache for a specific instruction by introducing a build argument whose value changes between builds. (Note that editing a comment alone does not work: comments are not part of the instruction, so Docker ignores them when checking the cache.) For example:

ARG CACHEBUST=1
RUN echo "$CACHEBUST" && apt-get update && apt-get install -y curl

Passing a new value at build time, such as docker build --build-arg CACHEBUST=$(date +%s) -t myapp ., changes what the RUN instruction sees, so Docker executes it again instead of using the cache.

Another scenario where the cache may cause unexpected results is when the instruction involves cloning a repository or copying files from a URL. For example, suppose we have the following instruction in our Dockerfile:

RUN git clone https://github.com/jilson/kube-jet.git

This instruction will clone the latest version of kube-jet from GitHub. However, if we run the docker build command again, Docker will use the cache for this layer, even if the repository has been updated. This can cause inconsistencies or errors if we depend on the latest version of kube-jet.

To avoid this problem, we can use the same methods as before: the --no-cache flag, or forcing the instruction to change between builds. Alternatively, we can use the ADD instruction to fetch files from a URL. When ADD downloads a remote file, Docker re-checks the file and invalidates the cache when its content has changed. For example:

ADD https://github.com/jilson/kube-jet/archive/master.zip /app

This will download the master branch of kube-jet as a zip archive into the /app directory. Note that ADD does not auto-extract remote archives, so a separate RUN step (for example, with unzip) is needed to unpack it. If the remote file changes, Docker will execute this instruction again and update the layer.

As you can see, the Docker build cache is a powerful feature that can help us speed up and optimize our image builds. However, it is not always reliable or desirable, and we need to be aware of its limitations and pitfalls. By using the methods we discussed in this section, we can control the cache and ensure that our images are always updated and consistent.

Conclusion

In this article, we have learned about the Docker build cache, a feature that allows Docker to reuse existing layers from previous builds when building the same image multiple times. We have seen how the cache works, how it can speed up the image build process and save disk space and bandwidth, and how it can also cause problems or prevent the image from being updated. We have also learned how to control the cache and avoid or fix its pitfalls.

The Docker build cache is a powerful tool that can help us optimize our image builds, but it also requires some care and attention. Here are some tips and best practices for using the cache effectively:

  • Order the layers in the Dockerfile from the least to the most frequently changed ones. This will increase the chances of reusing the cache and reduce the number of layers that need to be rebuilt.
  • Use multi-stage builds to separate the build and runtime stages of the image. This will reduce the size of the final image and the number of layers that need to be cached.
  • Use BuildKit, the default builder in current Docker releases. BuildKit has many features that enhance caching, such as parallel builds, smarter layer reuse, and cache mounts.
  • Monitor the cache size and usage, and prune the cache regularly. This will free up disk space and remove unused or outdated layers.
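As a sketch of how the first three tips fit together (the image names, file names, and build steps here are illustrative, not from the original example), a Dockerfile might order rarely-changing steps first, use a BuildKit cache mount for the package cache, and split build and runtime stages:

```dockerfile
# syntax=docker/dockerfile:1

# --- build stage: tools and sources the runtime image does not need
FROM ubuntu AS build
WORKDIR /src
# BuildKit cache mount: apt's download cache persists across builds
# without being baked into an image layer.
RUN --mount=type=cache,target=/var/cache/apt \
    apt update && apt install -y build-essential
# Copy the build recipe before the full source tree, so editing source
# files does not invalidate the toolchain layers above.
COPY Makefile ./
COPY src/ ./src/
RUN make

# --- runtime stage: only the built artifact is copied in
FROM ubuntu
WORKDIR /app
COPY --from=build /src/app /app/app
CMD ["./app"]
```

For the last tip, docker builder prune removes dangling build cache entries (add --all to clear the entire cache), and docker system df shows how much space the cache is using.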

Resources

Have Queries? Join https://launchpass.com/collabnix
