Understanding the Docker build context

Nov 30, 2022 · 4 min read

In the fourth installment of our Docker best practices series, we take a deep dive into the build context to better understand how images are built. Be sure to check out the other articles in the series:

Proper use of cache to speed up and optimize builds.

Selecting the appropriate base image.

Understanding Docker multi-stage builds.

Understanding the context of the build.

Using administrator privileges.

The context of the build - or where the megabytes in the console are coming from.

Once the sight of a Dockerfile no longer makes you shudder and you have several working images in your account, you start paying attention to the small details, and some questions arise along the way. Why, despite following all of the best practices, does the build cache not always work? Why does Docker send hundreds of megabytes before it finally starts building my image? The answers to both questions come down to understanding what the build context is and how it works.

The Docker commands you execute in the console are not carried out by the docker executable itself. It serves only as a user interface that communicates with the Docker daemon running in the background on your computer or server. Communication usually takes place over a Unix socket available as the /var/run/docker.sock file, or over TCP on port 2375 (2376 with TLS encryption). The console output below shows how the docker version command behaves when the Docker daemon is unavailable on the system: the client prints its own details but cannot reach the server.

tips@u11d:~$ sudo docker version
Client:
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 22:59:14 2022
 OS/Arch:           linux/arm64
 Context:           default
 Experimental:      true
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
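
The way the client picks an endpoint can be sketched in a few lines of shell. This is a simplified illustration, not the CLI's actual logic: it assumes no -H flag and the default docker context, and relies only on the DOCKER_HOST environment variable and the default socket path mentioned above. The function name docker_endpoint is made up for this example.

```shell
# Hypothetical helper: print the endpoint the docker client would most likely use.
# Assumption: no -H flag and no non-default docker context are in play.
docker_endpoint() {
  # DOCKER_HOST overrides the default; otherwise fall back to the Unix socket.
  printf '%s\n' "${DOCKER_HOST:-unix:///var/run/docker.sock}"
}

docker_endpoint

# With a running daemon, the same HTTP API the client uses can be queried directly:
#   curl --unix-socket /var/run/docker.sock http://localhost/version
```

If the printed endpoint is a Unix socket that does not exist, or a TCP address nobody listens on, the client fails with an error like the one shown above.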

The image build command is no exception. By running docker build . we tell Docker to use the current directory (the dot at the end) as the data source. As a result, the contents of that directory are packaged and sent to the Docker daemon, where the instructions from the Dockerfile are executed. Sounds reasonable, right? The data sent to the daemon is called the build context, and we can see its size in the first line of the logs. In the following example, it is 104.9MB.

tips@u11d:~$ dd if=/dev/zero of=big_file.bin bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0607578 s, 1.7 GB/s

tips@u11d:~$ echo "
FROM alpine:3.16
COPY . .
" > Dockerfile

tips@u11d:~$ sudo docker build .
Sending build context to Docker daemon  104.9MB
Step 1/2 : FROM alpine:3.16
 ---> 9b18e9b68314
Step 2/2 : COPY . .
 ---> 23acc061163e
Successfully built 23acc061163e
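
Before running a build, it can help to estimate how much data a directory would contribute to the context. The sketch below is only an approximation: it measures a tar stream of the directory and does not apply any .dockerignore rules, so the real context may be smaller. The function names context_bytes and largest_entries are invented for this example.

```shell
# Hypothetical helper: approximate the build context size in bytes by
# streaming the directory through tar (ignores .dockerignore entirely).
context_bytes() {
  tar -cf - -C "${1:-.}" . | wc -c | tr -d ' '
}

# Hypothetical helper: show the five largest entries (sizes in KiB,
# directory totals included) to see where the megabytes come from.
largest_entries() {
  du -ak "${1:-.}" | sort -rn | head -n 5
}

context_bytes .
largest_entries .
```

Run in the directory from the example above, the first number should be slightly above the size of big_file.bin, and that file should appear near the top of the list.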

Docker copies all the existing files and directories into the context, including hidden ones like .git. If we use, for example, COPY . . to conveniently copy the entire context into the image, they will be included as well. Hidden files are one of the most common reasons why the build cache seems to work only some of the time: any change to them (the contents of .git change with every commit, for instance) invalidates the cache for a COPY . . step, and the cause is hard to track down.

Another potential problem is copying compilation results or installed packages left on the host system after testing. This usually leads to transferring large amounts of data to the Docker daemon (e.g. a huge node_modules directory) and, in the worst case, even to development versions of files ending up in the image.

How do we deal with this problem? First, copy data into the image selectively, and second, prepare a .dockerignore file in addition to the Dockerfile. This file should be placed in the root directory of the build context and contain rules describing what should not be included in the context. Its syntax resembles that of .gitignore, its well-known counterpart from Git.

Dockerfile
.dockerignore
.git
.idea
.vscode

/build
/dist
/gradle
/node_modules
/reports

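Should .dockerignore also exclude itself and the Dockerfile? If you don't need these files in the image, then by all means yes. Docker leaves that decision up to you. Both files are still sent to the daemon, because it needs them to run the build, but COPY and ADD will not copy them into the image.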
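
An alternative to listing everything you want to exclude is an allowlist: ignore everything with * and then re-include selected paths with ! exceptions, which .dockerignore supports. The sketch below writes such a file into a temporary directory for demonstration. The src and package.json entries are hypothetical placeholders for whatever your Dockerfile actually copies, and the function name write_allowlist_dockerignore is invented for this example.

```shell
# Hypothetical helper: write an allowlist-style .dockerignore into a directory.
# NOTE: src and package.json are placeholders; list what your build really needs.
write_allowlist_dockerignore() {
  cat > "${1:-.}/.dockerignore" <<'EOF'
# Exclude everything by default...
*
# ...then re-include only what the Dockerfile copies.
!src
!package.json
EOF
}

# Demonstrate in a throwaway directory so no existing file is overwritten.
demo_dir=$(mktemp -d)
write_allowlist_dockerignore "$demo_dir"
cat "$demo_dir/.dockerignore"
```

The trade-off is that every new path the image needs must be added explicitly, but nothing unexpected (such as .git or local build output) can slip into the context.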

Finally, let's see what the logs from the earlier example look like when we ignore all files with the .bin extension.

tips@u11d:~$ dd if=/dev/zero of=big_file.bin bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0817181 s, 1.3 GB/s

tips@u11d:~$ echo "
FROM alpine:3.16
COPY . .
" > Dockerfile

tips@u11d:~$ echo "
*.bin
" > .dockerignore

tips@u11d:~$ sudo docker build .
Sending build context to Docker daemon  10.75kB
Step 1/2 : FROM alpine:3.16
 ---> 9b18e9b68314
Step 2/2 : COPY . .
 ---> 0ef632c6dacf
Successfully built 0ef632c6dacf

As you can see, the build context shrank from 104.9MB to 10.75kB. Far less data is transferred between the client and the Docker daemon, so the build starts sooner, and big_file.bin no longer ends up in the image.

Summary

As we found out, the build context is the set of files and directories sent to the Docker daemon as input to the build. It can include source code, configuration files, and anything else needed to create the final image, and it determines which local files are available to copy into that image. As a next step, I suggest revisiting the images you have previously created and looking at their build process from a build context perspective. There is a good chance you will find some valuable optimizations.

Check out the next article, Using administrator privileges, which turns to security and applies the principle of least privilege to control the ownership of artifacts.