What are base images?
Every application running inside a container is built upon a foundation. This foundation is called the base image and supports everything above it. A virtual machine requires an operating system that is used by the application running inside it, and in a similar way a containerized application requires a base image.
The choice of base image is very important and the decision needs to be made with due care and attention. A good choice of base image gives your application room to grow and change as your requirements grow and change. A bad choice of base image will restrict your application and can result in costly rewriting and refactoring.
A base image is often a minimized installation of a regular server operating system, like Debian, Ubuntu or CentOS. Unused or unnecessary components are removed or not installed. This leads to an image that has minimal support requirements, a small attack surface, and is easy to test and validate. It can then be used to create intermediate images to support particular software ecosystems, for example Java.
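As a rough illustration of how an intermediate image is layered on a base image, the sketch below starts from a minimal Debian image and adds a language runtime. The image tag and the package name are assumptions chosen for the example, not taken from this article. Python stands in for the ecosystem here; a Java intermediate image would install a Java runtime package at the same step.

```dockerfile
# Hypothetical intermediate image: minimal Debian base plus one language runtime.
# The tag (bookworm-slim) and the package (python3) are illustrative assumptions.
FROM debian:bookworm-slim

# Install only the runtime, skip recommended extras, and remove the
# downloaded package index afterwards so the layer stays small.
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 \
 && rm -rf /var/lib/apt/lists/*
```

Application images can then name this intermediate image in their own FROM line instead of repeating the runtime installation.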
Why are base images important?
The base image is important as it sets the direction in which a containerized application can grow. The base image also affects the quality of what is built on top of it. Once the choice of base image has been made, it can be difficult and costly to change. This is similar to changing the operating system of a virtual machine – technically possible, but hard to do without significant development and testing effort.
When your base image is a cut-down version of a general-purpose server operating system, you have access to the complete packaging ecosystem of that operating system. The base image from Debian allows access to a pool of over sixty-five thousand software packages and support from a large community. Other operating systems may have different levels of package availability and support, and a different size of developer ecosystem at both a community and commercial level.
Security is, of course, a paramount concern. What is the time delay between a security vulnerability being announced and a fix becoming available? The maintainers of the base image are in a prime position to keep up to date with security releases. This allows users of the base image to concentrate on domain-specific problems while a centralised team handles security updates for software infrastructure that is shared among multiple development teams.
What happens if the base image is found to be lacking in support or security? This is often discovered only after significant development activity has already taken place. Changing a base image is similar to changing operating system. It’s a task that is theoretically simple, but can end up being a lot of work with few visible results. This work is often pure refactoring; that is, no extra features are added to the system. Large refactoring stories can be difficult to justify up the management chain, which can lead to the work being put off, exacerbating the original problem. The choice of base image is vital in giving your application room to grow in both expected and unexpected directions.
What base images are available?
There are several choices available for a base image.
There are pluses and minuses to all base images, but purely from a popularity standpoint, Ubuntu and CentOS are out of the running. Of the remaining choices it’s interesting to note that Alpine Linux and busybox are usually chosen for functional reasons.
Both Alpine and busybox are popular in the community and are used as base images for most of the official Docker library images. Busybox is used by those who prioritise small image size over all other considerations. Alpine is used by developers who prioritise image size over features but also want access to a packaging ecosystem. Choosing between Alpine and busybox is a tradeoff that applies to applications that don’t require very much userspace support and are relatively separate from the Linux environment.
Both Alpine Linux and Debian have established packaging ecosystems which allow users to install additional software on top of the base image. Often this is software from the same upstream source, but there is a difference in the scale and value added by packaging, as well as in the larger processes around release, testing and updates. In general, Debian has existed for a much longer time than Alpine Linux and is more mature in a number of ways. Alpine Linux is a fork of an embedded Linux distribution and is designed to be small and have low overhead, while still allowing larger packages to be installed.
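To make the packaging comparison concrete, here is a minimal sketch that installs the same upstream tool on each base image using that distribution's own package manager. The image tags, the stage names (on-alpine, on-debian) and the choice of curl are assumptions made for illustration only.

```dockerfile
# Two independent stages in one file, one per base image.
# Tags, stage names and the curl package are illustrative assumptions.

# Alpine: packages are installed with apk; --no-cache avoids
# leaving a package index behind in the image.
FROM alpine:3.20 AS on-alpine
RUN apk add --no-cache curl

# Debian: packages are installed with apt-get; the package index is
# fetched first and removed afterwards to keep the layer small.
FROM debian:bookworm-slim AS on-debian
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*
```

Either stage can be built on its own, for example with `docker build --target on-alpine .`, which makes it easy to compare the resulting image sizes and installed dependencies side by side.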
Which base image should you use?
I recommend using either the Alpine Linux or Debian images as base images. For applications that require a small footprint and minimal support from userspace, the Alpine Linux base image is best. For other applications, especially those that require a more complete userspace and a path to grow, the Debian base image is recommended.
I find that the Debian base image is suited for the general case where there are dependencies on other common services and libraries, and for applications that require a path for growth by accessing Debian’s existing, large and mature packaging ecosystem.
Alpine Linux is suited to more stand-alone applications where a small footprint is required and there is little reliance on the other userspace packages and services that are part of a traditional general-purpose Linux distribution.