Optimizing Dockerfiles for Smaller Sizes

After spending lots of time blogging about my Docker images, I took a hard look at my Dockerfiles and noticed a recurring pattern.

I have a lot of GitHub projects for various scripts and tools, and my general strategy for making Docker images out of those tools / scripts / services is to build a Dockerfile that clones the git repo into the container at build time. This is fantastic for guaranteeing I always have the most up-to-date scripts in my containers – but it means I have to install git in every container.

Git has a lot of dependencies and brings a lot along for the ride, including a full Perl installation. That’s a lot of extra beef to throw into a container, and it doesn’t really make any sense to do so if the primary process of the container doesn’t rely on Git.
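
If you're curious just how much git drags along, a quick sanity check (a sketch only, assuming an Ubuntu 14.04 base) is to simulate the install in a throwaway container and look at the list of extra packages apt would pull in:

# Simulate the install (-s) without changing anything; the output lists every dependency, including perl
docker run --rm ubuntu:14.04 bash -c "apt-get update -qq && apt-get install -s git"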

Rather than installing Git, we can use ADD directives in the Dockerfile to add a file from a remote URL. Since these projects are hosted on GitHub, we can download a tarball or zip file of the repo and install from there – avoiding git clone completely.
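
The general shape of that pattern looks something like this (just a sketch – the user, repo, and install path here are placeholders, and we'll see real versions of it below). The --strip-components=1 drops the wrapper directory GitHub puts around the tarball contents:

ADD https://github.com/<user>/<repo>/tarball/master /usr/local/<repo>/master.tar.gz
RUN tar -zxf /usr/local/<repo>/master.tar.gz --strip-components=1 -C /usr/local/<repo> && rm /usr/local/<repo>/master.tar.gz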

After learning about this, I went back and examined some previous Docker containers and found several candidates for optimization.

In all of the following examples, these changes have now been merged into the master branches and the Docker images have been updated.

JSSImport

Docker-JSSImport contains a Dockerfile that installs git to clone a Github repository.

Take a look at the Dockerfile as it was at the time. Here's a cut-down version of the important parts:

FROM ubuntu:14.04
<snip>
RUN apt-get update
RUN apt-get install -y git
RUN apt-get install -y python-setuptools
RUN apt-get install -y python-psycopg2
RUN apt-get clean
RUN rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

RUN easy_install pip

RUN pip install python-jss

RUN git clone https://github.com/nmcspadden/JSSImport $APP_DIR

As you can see here, we install git from apt and then use git to clone the JSSImport script into a specific location. There's no expectation that git will ever be used during the lifetime of this container, since the container's specific purpose is to run a Python script based on JSSImport and then stop.

So git adds nothing here but unnecessary bulk, with no benefit to the running container.

We can convert this to use an ADD directive instead. Here's the updated Dockerfile:

FROM debian
<snip>
RUN apt-get update && apt-get install -y python-setuptools python-psycopg2 && apt-get clean
RUN rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

ADD https://github.com/sheagcraig/python-jss/tarball/master /usr/local/python-jss/master.tar.gz
RUN tar -zxvf /usr/local/python-jss/master.tar.gz --strip-components=1 -C /usr/local/python-jss && rm /usr/local/python-jss/master.tar.gz
WORKDIR /usr/local/python-jss
RUN python /usr/local/python-jss/setup.py install

ADD https://github.com/nmcspadden/JSSImport/tarball/master $APP_DIR/master.tar.gz
RUN tar -zxvf /home/jssi/master.tar.gz --strip-components=1 -C /home/jssi/ && rm /home/jssi/master.tar.gz

Rather than installing git at all, we use the ADD directive to pull a tarball (a .tar.gz archive) of the entire repo directly into the image at $APP_DIR/master.tar.gz. Then we use tar to decompress and extract the contents of that archive, and remove the tarball afterward.

This accomplishes the same thing as git clone, in that we always get an up-to-date copy of the repo whenever the image is built, but we don't have to install git and all of its dependencies.

In addition, we've removed pip from the install as well; instead, we install python-jss using the setuptools setup.py install method. Since pip brings a lot of friends along with it, that's a helpful savings too.

By far the most significant change is rebasing the image on Debian instead of Ubuntu. The base Debian image is almost 100 MB smaller than the base Ubuntu image, and there's nothing in the Ubuntu image that this project needs that Debian doesn't have. Just by switching to Debian, we eliminate even more storage, with no loss in functionality.
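
If you want to see the difference on your own host, pulling both base images and comparing is quick (a sketch; the exact sizes will vary by tag and over time):

docker pull ubuntu:14.04
docker pull debian
docker images | grep -E 'ubuntu|debian'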

So how does this help us out? Take a look at the Docker image sizes before and after. “macadmins/jssimport” was the before and “nmcspadden/jssimport” is the after:

[root@docker docker-jssimport]# docker images
REPOSITORY               TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
nmcspadden/jssimport     latest              1aa70b93ee78        34 minutes ago      141.5 MB
macadmins/jssimport      latest              5fb1da38fa7a        4 days ago          288.4 MB

We shaved off nearly 147 MB by removing git and pip and switching to Debian.

Munki-Puppet

The Munki-Puppet Docker image is another example. Rather than installing git, I installed wget to download a package to install. While wget is slimmer than git in terms of the space it takes up, it's still unnecessary, since wget isn't used at any point in the container's script execution.

Here’s a snippet of what it looked like before:

RUN apt-get update
RUN apt-get install -y wget
RUN apt-get install -y ca-certificates
RUN wget https://apt.puppetlabs.com/puppetlabs-release-wheezy.deb
RUN dpkg -i puppetlabs-release-wheezy.deb

If we remove wget completely in favor of ADD, we get the updated Dockerfile:

RUN apt-get update
RUN apt-get install -y ca-certificates
ADD https://apt.puppetlabs.com/puppetlabs-release-wheezy.deb /puppetlabs-release-wheezy.deb
RUN dpkg -i /puppetlabs-release-wheezy.deb

What are the size savings?

REPOSITORY                       TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
nmcspadden/munki-puppet   latest              0fd8d5026f84        6 minutes ago       186.5 MB
macadmins/munki-puppet    latest              270bece67420        3 days ago          191.4 MB

Only about 5 MB of difference this time. Not huge, but every byte counts, right?

Puppetmaster – WHDCLI

The Puppetmaster-WHDCLI project is another candidate for optimization, this one also using git in the Dockerfile.

Here’s the before:

RUN yum install -y git
RUN yum install -y python-setuptools
RUN yum clean all
RUN git clone git://github.com/kennethreitz/requests.git /home/requests
WORKDIR /home/requests
RUN python /home/requests/setup.py install
RUN git clone https://github.com/nmcspadden/WHD-CLI.git /home/whdcli
WORKDIR /home/whdcli
RUN python /home/whdcli/setup.py install

Here’s the updated Dockerfile using ADD. Note that the default centos6 Docker image doesn’t actually come with tar, so I had to install tar manually instead of git. Tar is much smaller than git, so I still gain by doing this:

RUN yum install -y tar python-setuptools && yum clean all
ADD https://github.com/kennethreitz/requests/tarball/master /home/requests/master.tar.gz
RUN tar -zxvf /home/requests/master.tar.gz --strip-components=1 -C /home/requests && rm -f /home/requests/master.tar.gz
WORKDIR /home/requests
RUN python /home/requests/setup.py install
ADD https://github.com/nmcspadden/WHD-CLI/tarball/master /home/whdcli/master.tar.gz
RUN tar -zxvf /home/whdcli/master.tar.gz --strip-components=1 -C /home/whdcli && rm /home/whdcli/master.tar.gz
WORKDIR /home/whdcli
RUN python /home/whdcli/setup.py install

Size savings:

REPOSITORY                       TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
nmcspadden/puppetmaster-whdcli   latest              96eba2e80759        35 minutes ago      334.2 MB
macadmins/puppetmaster-whdcli    latest              4c9f1d4a3791        24 hours ago        554 MB

That’s a significant difference! We saved 220 MB from doing this.

WebHelpDesk

WebHelpDesk has a bit of an unfortunate Dockerfile, because SolarWinds doesn't offer the RHEL RPM for download uncompressed. If the RPM were available on the internet as a plain .rpm file, we could install it directly from the URL and save time (and space!). Since it's only available as an .rpm.gz, we do in fact have to download and decompress it before installing. That's a regrettable use of space in a Docker image, but unless they change that, we don't really have any other choice.

Previously, I used curl to download the rpm.gz file and then decompressed and installed it. Here’s the before version:

RUN curl -o webhelpdesk-12.2.0-1.x86_64.rpm.gz http://downloads.solarwinds.com/solarwinds/Release/WebHelpDesk/12.2.0/webhelpdesk-12.2.0-1.x86_64.rpm.gz
RUN gunzip webhelpdesk-12.2.0-1.x86_64.rpm.gz
RUN yum --enablerepo=base clean metadata
RUN yum install -y nano
RUN yum install -y webhelpdesk-12.2.0-1.x86_64.rpm
RUN rm webhelpdesk-12.2.0-1.x86_64.rpm

Here’s the updated Dockerfile that doesn’t rely on curl:

ADD http://downloads.solarwinds.com/solarwinds/Release/WebHelpDesk/12.2.0/webhelpdesk-12.2.0-1.x86_64.rpm.gz /webhelpdesk.rpm.gz 
RUN gunzip -dv /webhelpdesk.rpm.gz
RUN yum install -y /webhelpdesk.rpm && rm /webhelpdesk.rpm && yum clean all

Using ADD is much cleaner and saves a few steps, although I did end up combining the yum install and rm commands onto one line – no need to have separate layers for those instructions.
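
The reason combining helps is that every RUN instruction commits its own layer, and deleting a file in a later layer doesn't reclaim the space it already occupies in an earlier one. Here's a sketch of the difference, using the yum cleanup from the line above:

# Two layers: the yum cache written by the install still takes up space
# in its own layer, even though the next layer deletes it
RUN yum install -y /webhelpdesk.rpm
RUN yum clean all

# One layer: the cache is written and cleaned before the layer is committed,
# so it never ends up in the final image
RUN yum install -y /webhelpdesk.rpm && rm /webhelpdesk.rpm && yum clean all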

Size savings:

REPOSITORY                      TAG                 IMAGE ID            CREATED              VIRTUAL SIZE
nmcspadden/whd                  latest              23109b9ef528        3 minutes ago       993.2 MB
macadmins/whd                   latest              849a80f1b702        4 days ago           1.038 GB

Not great, but roughly 45 MB is still something.

Conclusions

The general theme here is to avoid installing packages or tools in the Dockerfile just for the build when the running container is never going to use them. If we need to obtain files remotely, the ADD directive does that for us. Installing git, curl, wget, or some other download tool is a waste of space and time in a Docker build.

Update:
Calem Hunter provided a fantastic link about optimizing Dockerfiles even further by chaining together commands as much as possible:
http://www.centurylinklabs.com/optimizing-docker-images/
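
As a preview of that approach, the apt-get lines from the JSSImport Dockerfile above could be chained into a single layer, something like this (a sketch; --no-install-recommends is an extra flag that wasn't in the original):

RUN apt-get update && \
    apt-get install -y --no-install-recommends python-setuptools python-psycopg2 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*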

Another round of optimizations coming soon…

3 thoughts on “Optimizing Dockerfiles for Smaller Sizes”

  1. I’m curious as to why you didn’t use wget in munki-puppet and then install the package and remove the download all on the same RUN command? I’m not sure how big that package is, but it could very easily be larger than 6 MB. You could also install wget, grab the package, install the package, remove the download, and uninstall wget if you really wanted to be aggressive. You may want to try those options to see what your savings are.


    • I tried it out, just to test this hypothesis.
      New dockerfile for munki-puppet:

      RUN apt-get update && apt-get install -y wget ca-certificates && wget https://apt.puppetlabs.com/puppetlabs-release-wheezy.deb && dpkg -i /puppetlabs-release-wheezy.deb && apt-get remove -y wget && apt-get update
      RUN apt-get install -y puppet=$PUPPET_VERSION-1puppetlabs1

      After building that, the docker image size is 188.9 MB, instead of the current 186.5 MB. No savings, and it took longer to build.

      Since wget isn’t used at any point in the container’s lifetime outside of a build, there isn’t really any advantage to installing it when ADD will accomplish the same task (download file from internet).

