Understanding Docker multi-stage builds
This is the third article in our series on Docker best practices, and it will help you understand how to make use of the multi-stage builds feature. If this is the first article in the series you are reading, be sure to check out the others. You can find the table of contents below:
- Proper use of cache to speed up and optimize builds.
- Selecting the appropriate base image.
- Understanding Docker multi-stage builds.
- Understanding the context of the build.
- Using administrator privileges.
Many best practices for preparing Docker images focus on image size and the number of layers. Both of these parameters matter: their consequences range from rapidly exhausted disk space to long download times for base images. It is very easy to "pollute" the resulting Docker images this way.
Not so long ago, in version 17.05, Docker introduced a mechanism that keeps images "clean" in a quite convenient way: multi-stage builds. Work through the exercise below to get the idea of multi-staging and the benefits it brings.
Let's start by preparing a sample application that we want to place in a Docker image. This will be a web application created using the React framework and its create-react-app tool. It will generate a code template and configuration, allowing us to focus on the image creation aspects.
Instead of installing Node.js locally, let's take advantage of the benefits of Docker and spawn a temporary container to generate the skeleton of the React application:
```shell
docker run --rm -v "$(pwd)":/opt -w /opt --entrypoint sh node:18.7.0-alpine \
  -c "npm install create-react-app && npx create-react-app example"
```
Tip: If you are using Windows, we recommend using WSL 2.
The above line will start a container with Node.js 18 based on the lightweight Alpine distribution. The current directory will be mounted under the
/opt path. After launch, the container will install the
create-react-app package and run it via npx, generating the application in the example subdirectory.
Ok, we have the application. Now let’s proceed with the "classic" Dockerfile:
```dockerfile
FROM nginx:1.23.1-alpine
WORKDIR /opt/example
RUN apk add --no-cache --virtual .build-deps \
    nodejs \
    npm
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build \
    && cp -r build/* /usr/share/nginx/html \
    && apk del .build-deps
```
We use the NGINX base image, which is one of the most popular web servers. We change the current directory to
/opt/example, and then install Node.js and NPM, which are necessary to build our application. We copy the package information and install it with the
npm ci command. The next step is to copy all the code of the application and build it with
npm run build. Finally, we copy the build results to the directory where the web server expects them and uninstall Node.js along with NPM.
Has anything caught your attention? You are right, layers! By design, Docker images are made up of layers. As a result, since Node.js and NPM were installed in one
RUN instruction and deleted in a separate one, the deletion will unfortunately not make the image any smaller: the files still exist in the earlier layer.
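To make the layer problem concrete, here is a minimal sketch (not part of the example project) showing why deleting files in a later RUN does not help:

```dockerfile
# Layer 1: installs Node.js and NPM, adding their files to this layer
RUN apk add --no-cache nodejs npm

# Layer 2: removes the packages, but layer 1 is immutable,
# so the total image size does not decrease
RUN apk del nodejs npm
```

This is exactly why the "classic" Dockerfile above squeezes install, build, and cleanup into a single RUN instruction.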
Before multi-stage builds were introduced, one solution to this problem was to split the Dockerfile into two parts, or to copy files prepared beforehand on the host system into the image. However, both solutions go against the idea of moving the application preparation process into an independent environment.
Let's take a look at a Dockerfile which uses multi-stage builds:
```dockerfile
FROM node:18.7.0-alpine AS builder
WORKDIR /opt/example
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM nginx:1.23.1-alpine
COPY --from=builder /opt/example/build/ /usr/share/nginx/html/
```
Isn't this form more readable? What you should pay attention to is the use of more than one
FROM instruction. Each one starts a new build stage from scratch, but later stages can still use the results of earlier ones thanks to the
COPY --from=... instruction. The
--from parameter can take the zero-based index of a stage, the name given to a stage with
AS (in our case
builder), or the name of a completely separate image.
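The three forms can be sketched as follows (the last line is only an illustration of pulling a file from an unrelated image, not something our example needs):

```dockerfile
COPY --from=0 /opt/example/build/ /usr/share/nginx/html/        # stage index
COPY --from=builder /opt/example/build/ /usr/share/nginx/html/  # stage name from AS
COPY --from=nginx:1.23.1-alpine /etc/nginx/nginx.conf /tmp/     # separate image
```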
Analyzing the Dockerfile step by step you can notice that the image building process starts with selecting the Node.js 18.7.0 image based on the Alpine distribution. We labeled this stage as
builder. In the next step we set the current directory and copied the information about the required packages into it. We installed the packages using
npm ci, copied the application code, and ran
npm run build. The next instruction starts a new stage based on the NGINX server image, this time without an additional name. Using COPY --from=builder, we copied the files of the prepared application from the previous stage to the place required by the server configuration.
Thanks to the multi-stage build, the image we built contains exactly what we need: the web server and the generated application files. This approach also brings some simplification, because we can perform the whole process by running a single
docker build command.
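Assuming the multi-stage Dockerfile above sits in the project directory, the whole process and a quick local check might look like this (the image tag example and the host port 8080 are our choices):

```shell
docker build -t example .
docker run --rm -p 8080:80 example
# the application should now be served at http://localhost:8080
```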
To sum up: Docker multi-stage builds allow creating images using multiple build stages. The build process is split into steps, each of which can use a different base image and produces its own intermediate image. This lets you combine the benefits of different base images, such as a full toolchain for building and a minimal runtime for the final stage, while keeping the resulting image as small and efficient as possible. Multi-stage builds also make the build process more flexible and easier to customize.
If you are wondering why there may be a lot of data transferred during the build process, please check the next article in the series: Understanding the context of the build.