Understanding the Docker build context
In the fourth installment of our Docker best practices series, we take a deep dive into the build context to better understand how images are built. Be sure to check out the other articles in the series:
- Proper use of cache to speed up and optimize builds.
- Selecting the appropriate base image.
- Understanding Docker multi-stage builds.
- Understanding the context of the build.
- Using administrator privileges.
The context of the build - or where the megabytes in the console are coming from.
When the sight of a Dockerfile no longer makes you shudder and you have several working images in your account, you start paying attention to the small details. Along the way, some questions arise. Why, despite following all of the best practices, does the build cache not always work? Why does Docker send hundreds of megabytes before it finally starts building my image? The answers to both questions come down to understanding what the build context is and how it works.
The Docker commands you execute in the console are not run directly by the docker executable file. It only serves as a user interface that communicates with the Docker daemon running in the background of your computer or server. Communication is usually done over a Unix socket, available as the /var/run/docker.sock file, or over TCP (port 2376 with encryption). The console output below shows how the docker version command behaves when the Docker daemon is unavailable on the system.
    tips@u11d:~$ sudo docker version
    Client:
     Version:           20.10.17
     API version:       1.41
     Go version:        go1.17.11
     Git commit:        100c701
     Built:             Mon Jun 6 22:59:14 2022
     OS/Arch:           linux/arm64
     Context:           default
     Experimental:      true
    Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
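To make the client and daemon split more tangible, here is a minimal sketch that talks to the daemon without the docker executable at all, using curl against the Unix socket. The DOCKER_SOCK variable and the daemon_version function are names invented for this example, and the sketch assumes a curl build that supports the --unix-socket option.

```shell
#!/bin/sh
# Sketch: query the Docker daemon directly over its Unix socket.
# DOCKER_SOCK is a variable made up for this example; it defaults to the
# socket path mentioned above.
DOCKER_SOCK="${DOCKER_SOCK:-/var/run/docker.sock}"

daemon_version() {
    if [ -S "$DOCKER_SOCK" ]; then
        # GET /version is the Engine API endpoint that reports the daemon version.
        curl --silent --unix-socket "$DOCKER_SOCK" http://localhost/version
    else
        # No socket file: the daemon is not running or listens elsewhere.
        echo "no daemon socket at $DOCKER_SOCK"
        return 1
    fi
}
```

On a host with a running daemon, calling daemon_version should print a JSON document describing the server. Reading the socket usually requires root or membership in the docker group, which is why the examples in this article use sudo.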
The image build command is no exception here. By running docker build . we tell Docker to use the current directory (the dot at the end) as the data source. As a result, the contents of that folder are packaged and uploaded to the Docker daemon, where the instructions from the Dockerfile are executed. Sounds reasonable, right? The data sent to the server is called the build context, and we can see its size in the first lines of the logs. In the following example, it is 104.9MB.
    tips@u11d:~$ dd if=/dev/zero of=big_file.bin bs=1M count=100
    100+0 records in
    100+0 records out
    104857600 bytes (105 MB, 100 MiB) copied, 0.0607578 s, 1.7 GB/s
    tips@u11d:~$ echo "
    FROM alpine:3.16
    COPY . .
    " > Dockerfile
    tips@u11d:~$ sudo docker build .
    Sending build context to Docker daemon  104.9MB
    Step 1/2 : FROM alpine:3.16
     ---> 9b18e9b68314
    Step 2/2 : COPY . .
     ---> 23acc061163e
    Successfully built 23acc061163e
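If you would like a rough idea of the context size before starting a build, one option is to archive the directory to a pipe and count the bytes. The sketch below does exactly that. The context_size_bytes function is a hypothetical helper, not a Docker command, and it does not apply any ignore rules, so its result is best read as an upper bound rather than the exact figure Docker will report.

```shell
#!/bin/sh
# Sketch: estimate how many bytes a directory would contribute to the
# build context by archiving it to a pipe and counting the bytes.
# Ignore rules are not applied, so treat the result as an upper bound.
context_size_bytes() {
    dir="${1:-.}"
    # tr strips the padding that some wc implementations add.
    tar -C "$dir" -cf - . | wc -c | tr -d ' '
}
```

Running context_size_bytes . in the directory from the example above should report a little over 104857600 bytes, since the archive adds some overhead on top of the file contents.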
Docker copies all the existing files and directories into the context, including hidden ones like .git. If we use, for example, COPY . . to conveniently copy the entire context into the image, they will be included as well. Hidden files are one of the most common reasons why the build cache is invalidated seemingly at random and the cause is hard to locate: any change inside them (every new commit modifies .git, for instance) invalidates the cache of the COPY . . layer.
Another potential problem is copying compilation results or installed packages left on the host system after testing. This usually leads to transferring large amounts of data to the Docker server (e.g. a huge node_modules directory) and, in the worst case, even to development versions of files ending up in the image.
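To spot such oversized entries before they reach the daemon, it can help to list the largest direct children of the context directory, hidden ones included. The following is a minimal sketch. The largest_entries function is a name made up for this example, and it assumes a find implementation that supports -mindepth and -maxdepth (GNU and BSD find both do).

```shell
#!/bin/sh
# Sketch: list the largest direct children of a directory, hidden ones
# included, so oversized entries such as node_modules or .git stand out.
# Usage: largest_entries [dir] [count]
largest_entries() {
    dir="${1:-.}"
    count="${2:-10}"
    # du -sk prints one total per entry in KiB; sort -rn puts the largest first.
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + | sort -rn | head -n "$count"
}
```

Entries that show up at the top of this list and are not needed inside the image are good candidates for exclusion from the build context.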
How do we deal with this problem? Certainly, you should be careful about what you copy into the image and, in addition to the Dockerfile, prepare a .dockerignore file. This file should be placed directly in the root directory of the build context and contain rules describing what should not be included in the build context. Its syntax resembles that of the well-known .gitignore file from the Git version control system.
    Dockerfile
    .dockerignore
    .git
    .idea
    .vscode
    /build
    /dist
    /gradle
    /node_modules
    /reports
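Should .dockerignore also exclude itself and the Dockerfile? If you don't need these files in the image then by all means yes. Docker leaves that decision up to you. Both files are still sent to the daemon, because it needs them to perform the build, but COPY and ADD will not copy them into the image.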
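Besides listing what to exclude, .dockerignore also supports comment lines starting with # and exceptions starting with !, which makes an allowlist approach possible: ignore everything, then re-include only what the build needs. The sketch below writes such a file. The write_allowlist_dockerignore function and the src and package.json entries are placeholders chosen for illustration, so adjust them to your project.

```shell
#!/bin/sh
# Sketch: write an allowlist-style .dockerignore into a directory.
# "src" and "package.json" are placeholder names for this example.
write_allowlist_dockerignore() {
    dir="${1:-.}"
    printf '%s\n' \
        '# Ignore everything by default...' \
        '*' \
        '# ...then re-include only what the build needs.' \
        '!src' \
        '!package.json' \
        > "$dir/.dockerignore"
}
```

The trade-off of this style is that every new file the image needs has to be added to the list explicitly, but in return nothing unexpected can slip into the context.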
Finally, let's see what the logs from the earlier example look like when we ignore all files with the .bin extension.
    tips@u11d:~$ dd if=/dev/zero of=big_file.bin bs=1M count=100
    100+0 records in
    100+0 records out
    104857600 bytes (105 MB, 100 MiB) copied, 0.0817181 s, 1.3 GB/s
    tips@u11d:~$ echo "
    FROM alpine:3.16
    COPY . .
    " > Dockerfile
    tips@u11d:~$ echo "
    > *.bin
    > " > .dockerignore
    tips@u11d:~$ sudo docker build .
    Sending build context to Docker daemon  10.75kB
    Step 1/2 : FROM alpine:3.16
     ---> 9b18e9b68314
    Step 2/2 : COPY . .
     ---> 0ef632c6dacf
    Successfully built 0ef632c6dacf
As you can see, the build context shrank from 104.9MB to 10.75kB. This way we achieve a faster build and far less data transferred between the host and the Docker server.
As we found out, the build context is the set of files and directories sent to the Docker daemon as input to the build. It can include source code, configuration files, and any other files needed to create the final Docker image, and only what is in the context can be copied into that image. As a next step, I suggest revisiting the images you have previously created and looking at the image building process from a build context perspective. There is a chance that you may be able to introduce some valuable optimizations.
Check out the next article, Using administrator privileges, which touches on cybersecurity and applies the principle of least privilege to control the ownership of artifacts.