Daniel Kraszewski
Head of Engineering

Selecting the appropriate Docker base image

Nov 16, 2022 | 4 min read

This is the second article in a series on Docker best practices. In it, I explain how to choose a base image. The full list of topics in the series is below:

  1. Proper use of cache to speed up and optimize builds
  2. Selecting the appropriate base image
  3. Understanding Docker multi-stage builds
  4. Understanding the context of the build
  5. Using administrator privileges

Choose base images wisely

When learning Docker, we quickly come across descriptions of how images are built: a set of layers that, stacked one on top of another, form the final file system of a running container. That seems clear, but what does it give us in practice? First of all, we can (although we don't have to) use another image as the base of our own, e.g. one available in a public registry such as Docker Hub. It can be Ubuntu, CentOS, a Python interpreter or Bash. What matters is which libraries and tools we need to port our project to the Docker environment.

We set the base image (the image that our own image builds on) with one of the most commonly used Dockerfile instructions: FROM. Assuming that this time we are Dockerizing an application written in Python, an example of its use is shown below:

    FROM python:3.10.6
    COPY helloworld.py .
    CMD [ "python", "./helloworld.py" ]

The first line tells Docker that we want to use the existing Python image with the 3.10.6 tag as the base image. As described in the image's repository, the 3.10.6 tag means Python version 3.10.6. In the second line of the Dockerfile we copy our sample application into the image, and in the third line we specify that it will be executed when the container starts.

At this point, we already know that we don't need to install Python manually. Someone has prepared the Python image for us, just as the community prepares all kinds of packages used in software development. The question is: what should we look for when selecting a base image?

Who is the author?

Anyone can publish images on Docker Hub, so a base image may bring malicious code into your application. To minimize this risk, check who the author is in the By section of the image's page. For convenience, trusted images are labeled Docker Official Image or Verified Publisher.
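The publisher is also visible in the image name itself. As a small illustration, Docker Official Images live in the library namespace on Docker Hub, so the short name used earlier can be written out in full. The someuser name in the comment is hypothetical and only shows what a third-party namespace looks like:

```dockerfile
# Fully qualified form of "FROM python:3.10.6":
# registry (docker.io) / namespace (library) / repository (python) : tag
FROM docker.io/library/python:3.10.6

# An image from any other publisher carries that publisher's namespace.
# Hypothetical name, for illustration only:
# FROM someuser/python:3.10.6
```

Writing the full name changes nothing about the build, but it makes it obvious in code review where the base image comes from.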

Architecture

Images are built to run on specific processor architectures. Code compiled for the x86-64 processor of a typical desktop computer will not run directly on an ARM-based machine such as a MacBook with an M1/M2 chip or a Raspberry Pi. Although this is a more advanced topic, it is worth checking that the architecture you need is available in the list of tags.
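If you need a specific architecture, the FROM instruction accepts a --platform flag. A minimal sketch, assuming the python:3.10.6 tag publishes a linux/arm64 variant (check the tag list to confirm) and that your Docker installation can run or emulate that platform:

```dockerfile
# Request the 64-bit ARM variant of the base image explicitly.
# Assumption: the tag publishes a linux/arm64 variant.
FROM --platform=linux/arm64 python:3.10.6
COPY helloworld.py .
CMD [ "python", "./helloworld.py" ]
```

Without the flag, Docker selects the variant that matches the platform the build targets, which is usually what you want. The flag is mainly useful when building on one architecture for another.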

Version

When deciding on a version, it's a good idea to choose a tag that narrows down the possible image content as much as possible. Even a patch-version upgrade may occasionally cause backward compatibility problems that we do not expect. Therefore, given a choice of python:3, python:3.10 or python:3.10.6, the last one is the reasonable choice.
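The difference between the three tags can be sketched in a Dockerfile as follows. The commented-out lines are the less specific alternatives, shown only for comparison:

```dockerfile
# Floating tags: the image behind them changes as new releases are published.
# FROM python:3
# FROM python:3.10

# Pinned to a patch version: the Python version stays the same across rebuilds.
FROM python:3.10.6
COPY helloworld.py .
CMD [ "python", "./helloworld.py" ]
```

Tags are mutable, so even a patch-level tag can be republished with changes to the underlying system. For stricter reproducibility, an image can also be referenced by digest in the form name@sha256:<digest>, where the digest value is left as a placeholder here.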

Size

The size of the image translates directly into the amount of data sent over the network and the disk space used. If we only need to run a Bash script, it is not worth pulling in all of Ubuntu. A good practice is to select images tailored to specific needs, e.g. ones based on smaller, leaner Linux distributions like Alpine. Such images often have alpine in their tags.
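Applied to the earlier Python example, switching to a smaller variant could look like the sketch below. It assumes an Alpine-based variant of the tag is published; check the tag list of the image you use before relying on it:

```dockerfile
# Assumption: the repository publishes an "-alpine" variant of this tag.
FROM python:3.10.6-alpine
COPY helloworld.py .
CMD [ "python", "./helloworld.py" ]
```

After pulling both variants, running docker image ls python lets you compare their sizes side by side.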

Security

Typical Linux distributions contain many libraries and tools that could introduce vulnerabilities. A quick way to reduce the risk is to use a smaller distribution, like the aforementioned Alpine. Fewer dependencies are easier to monitor and upgrade.

C standard library

    tips@u11d:~$ sudo docker run --rm -it alpine:3.16
    $ wget -O docker-compose https://github.com/docker/compose/releases/download/v2.7.0/docker-compose-linux-x86_64
    Connecting to github.com (140.82.121.3:443)
    Connecting to objects.githubusercontent.com (185.199.111.133:443)
    saving to 'docker-compose'
    docker-compose       100% |*****************************************| 11.6M  0:00:00 ETA
    'docker-compose' saved
    $ chmod +x docker-compose
    $ ./docker-compose
    /bin/sh: ./docker-compose: not found

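On Unix-like systems, the C standard library is treated as part of the operating system. This means that an application dynamically linked against glibc will not run in an image based on a different implementation, such as musl. The session above shows an example: the docker-compose binary downloaded directly from GitHub fails to start in an Alpine-based image. The "not found" message is misleading, because the file itself exists; what is most likely missing is the glibc dynamic loader that the binary asks for.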
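One way around the problem is to pick a base image that ships glibc. The sketch below is an untested illustration that reuses the download URL from the session above and assumes debian:bullseye-slim as the glibc-based image:

```dockerfile
# Debian ships glibc, which the downloaded binary expects.
FROM debian:bullseye-slim

# ADD can fetch a remote URL at build time; this is the URL from the session above.
ADD https://github.com/docker/compose/releases/download/v2.7.0/docker-compose-linux-x86_64 /usr/local/bin/docker-compose

# Files fetched by ADD are not executable by default.
RUN chmod +x /usr/local/bin/docker-compose

CMD [ "docker-compose", "version" ]
```

The trade-off is size: a Debian-based image is larger than an Alpine-based one, so the choice depends on whether the software you need is available in a musl-compatible build.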

Is there anything else you can do better? Yes: you can build your image completely from scratch using FROM scratch and put only the application and its direct dependencies in it. It is also worth looking at the Distroless initiative, which strives for the same goal.
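A minimal sketch of the FROM scratch approach is shown below. It assumes a hypothetical single-file C program, hello.c, and compiles it as a static binary in a separate build stage, because an empty image contains no C library for the program to link against at runtime:

```dockerfile
# Build stage: compile a statically linked binary so it needs no libc at runtime.
# hello.c is a hypothetical single-file C program used only for illustration.
FROM alpine:3.16 AS build
RUN apk add --no-cache gcc musl-dev
COPY hello.c .
RUN gcc -static -o /hello hello.c

# Final stage: an empty image that contains nothing but the binary.
FROM scratch
COPY --from=build /hello /hello
CMD [ "/hello" ]
```

The resulting image has no shell and no package manager, which keeps it small but also means you cannot open a terminal inside the container to debug it.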

Conclusion

Selecting a good Docker base image is important for several reasons. First, the base image forms the foundation of the Docker image: it provides the underlying OS and runtime environment that the application will run in. Choosing a base image that is well suited to your needs therefore helps ensure that the application runs efficiently.

Second, the base image affects the size of the final Docker image. For example, a base image that includes a lot of unnecessary libraries or applications leads to a bloated image. This can affect the performance of applications and make the image more difficult to distribute and deploy.

Finally, the base image also affects the security of the final Docker image. If the base image has known vulnerabilities, the image built on it may be more susceptible to attack. Therefore, it's important to choose a base image that is well maintained and regularly updated with security patches.

If you want to build performant Docker images, check out the next article in this series: Understanding Docker multi-stage builds
