After spending lots of time blogging about my Docker images, I took some hard looks at my Dockerfiles and noticed a recurring pattern.
I have a lot of GitHub projects for various scripts and tools, and my general strategy for making Docker images out of those tools / scripts / services is to build a Dockerfile that clones the git repo into the container on build. This is fantastic for guaranteeing I always have the most up-to-date scripts in my containers – but it means I have to install git in every container.
Git has a lot of dependencies and brings a lot along for the ride, including a full Perl installation. That’s a lot of extra beef to throw into a container, and it doesn’t really make any sense to do so if the primary process of the container doesn’t rely on Git.
Rather than installing Git, we can use an ADD directive in the Dockerfile to add a file from a remote URL. Since these projects are hosted on GitHub, we can download a tarball or zipfile of the repo and install from there, thus avoiding the use of git clone completely.
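As a sketch of the general pattern (the repo name and install path here are made up for illustration), a Dockerfile can fetch and unpack a GitHub tarball without ever touching git:

```dockerfile
# Hypothetical example: fetch a tarball of the repo at build time
# instead of cloning it. No git needed in the image.
ADD https://github.com/someuser/sometool/tarball/master /usr/local/sometool/master.tar.gz
RUN tar -zxvf /usr/local/sometool/master.tar.gz --strip-components=1 -C /usr/local/sometool \
    && rm /usr/local/sometool/master.tar.gz
```

Like git clone, this pulls a fresh copy of the repo every time the image is built.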
After learning about this, I went back and examined some previous Docker containers and found several candidates for optimization.
In all of the following examples, these changes have now been merged into the master branches and the Docker images have been updated.
Docker-JSSImport contains a Dockerfile that installs git to clone a GitHub repository.
Take a look at the Dockerfile as it existed at the time. Here’s a cut-down version of the important parts:
FROM ubuntu:14.04
<snip>
RUN apt-get update
RUN apt-get install -y git
RUN apt-get install -y python-setuptools
RUN apt-get install -y python-psycopg2
RUN apt-get clean
RUN rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN easy_install pip
RUN pip install python-jss
RUN git clone https://github.com/nmcspadden/JSSImport $APP_DIR
As you can see here, we install git from apt and then use git to clone the JSSImport script into a specific location. There’s no expectation that git will ever be used in the lifetime of this container, since this container’s specific purpose is to run a Python script based on JSSImport, and then stop.
So git contributes nothing but unnecessary space usage.
We can convert this to use an ADD directive instead. Take a look at the updated Dockerfile:
FROM debian
<snip>
RUN apt-get update && apt-get install -y python-setuptools python-psycopg2 && apt-get clean
RUN rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
ADD https://github.com/sheagcraig/python-jss/tarball/master /usr/local/python-jss/master.tar.gz
RUN tar -zxvf /usr/local/python-jss/master.tar.gz --strip-components=1 -C /usr/local/python-jss && rm /usr/local/python-jss/master.tar.gz
WORKDIR /usr/local/python-jss
RUN python /usr/local/python-jss/setup.py install
ADD https://github.com/nmcspadden/JSSImport/tarball/master $APP_DIR/master.tar.gz
RUN tar -zxvf /home/jssi/master.tar.gz --strip-components=1 -C /home/jssi/ && rm /home/jssi/master.tar.gz
Rather than installing git at all, we use the ADD directive to add a tarball (a .tar.gz archive) of the entire repo directly into the container at $APP_DIR/master.tar.gz. Then we use the tar command to decompress and extract the contents of that source repo, and remove the tarball afterward.
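To see what the --strip-components=1 flag buys us, here’s a quick shell sketch (paths and names made up) that mimics a GitHub tarball, which always wraps the repo contents in a top-level user-repo-&lt;sha&gt;/ directory:

```shell
# Build a fake "GitHub-style" tarball: contents nested one level deep.
mkdir -p /tmp/addemo/nmcspadden-JSSImport-abc1234
echo "print('setup')" > /tmp/addemo/nmcspadden-JSSImport-abc1234/setup.py
tar -czf /tmp/addemo/master.tar.gz -C /tmp/addemo nmcspadden-JSSImport-abc1234

# Extract it the way the Dockerfile does: --strip-components=1 drops
# the wrapper directory, and -C chooses the destination.
mkdir -p /tmp/addemo/app
tar -zxf /tmp/addemo/master.tar.gz --strip-components=1 -C /tmp/addemo/app
ls /tmp/addemo/app
```

After extraction, setup.py sits at the top of the destination directory rather than nested inside the wrapper directory, which is exactly the layout a git clone would have produced.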
This accomplishes the same thing as git clone, in that we always get an updated copy of the repo whenever this image is built, but we didn’t have to install git and all its dependencies.
In addition, we’ve removed pip from the install as well, and instead we install python-jss using setuptools’ setup.py install method. Since pip brings a lot of friends along with it, that’s a helpful savings.
By far, the biggest and most significant change is rebasing the image on Debian instead of Ubuntu. The base Debian image is almost 100 MB smaller than the base Ubuntu image, and there’s nothing this image needs from Ubuntu that Debian doesn’t have. Just by switching to Debian, we eliminate even more storage, with no loss in functionality.
So how does this help us out? Take a look at the Docker image sizes before and after. “macadmins/jssimport” was the before and “nmcspadden/jssimport” is the after:
[root@docker docker-jssimport]# docker images
REPOSITORY             TAG      IMAGE ID       CREATED          VIRTUAL SIZE
nmcspadden/jssimport   latest   1aa70b93ee78   34 minutes ago   141.5 MB
macadmins/jssimport    latest   5fb1da38fa7a   4 days ago       288.4 MB
We shaved off nearly 147 MB by removing git and pip and by switching to Debian.
The Munki-Puppet Docker image is another example. Rather than installing git, I installed wget to download a package to install. While wget is slimmer than git in terms of space taken up, it’s still unnecessary, since wget isn’t used at any point in the container’s script execution.
Here’s a snippet of what it looked like before:
RUN apt-get update
RUN apt-get install -y wget
RUN apt-get install -y ca-certificates
RUN wget https://apt.puppetlabs.com/puppetlabs-release-wheezy.deb
RUN dpkg -i puppetlabs-release-wheezy.deb
If we remove wget completely in favor of ADD, we get the updated Dockerfile:
RUN apt-get update
RUN apt-get install -y ca-certificates
ADD https://apt.puppetlabs.com/puppetlabs-release-wheezy.deb /puppetlabs-release-wheezy.deb
RUN dpkg -i /puppetlabs-release-wheezy.deb
What are the size savings?
REPOSITORY                TAG      IMAGE ID       CREATED         VIRTUAL SIZE
nmcspadden/munki-puppet   latest   0fd8d5026f84   6 minutes ago   186.5 MB
macadmins/munki-puppet    latest   270bece67420   3 days ago      191.4 MB
Only about 5 MB of difference this time. Not huge, but every byte counts, right?
Puppetmaster – WHDCLI
The Puppetmaster-WHDCLI project is another candidate for optimization, this one also using git in the Dockerfile.
Here’s the before:
RUN yum install -y git
RUN yum install -y python-setuptools
RUN yum clean all
RUN git clone git://github.com/kennethreitz/requests.git /home/requests
WORKDIR /home/requests
RUN python /home/requests/setup.py install
RUN git clone https://github.com/nmcspadden/WHD-CLI.git /home/whdcli
WORKDIR /home/whdcli
RUN python /home/whdcli/setup.py install
Here’s the updated Dockerfile using ADD. Note that the default centos6 Docker image doesn’t actually come with tar, so I had to install tar instead of git. Tar is much smaller than git, so this is still a net gain:
RUN yum install -y tar python-setuptools && yum clean all
ADD https://github.com/kennethreitz/requests/tarball/master /home/requests/master.tar.gz
RUN tar -zxvf /home/requests/master.tar.gz --strip-components=1 -C /home/requests && rm -f /home/requests/master.tar.gz
WORKDIR /home/requests
RUN python /home/requests/setup.py install
ADD https://github.com/nmcspadden/WHD-CLI/tarball/master /home/whdcli/master.tar.gz
RUN tar -zxvf /home/whdcli/master.tar.gz --strip-components=1 -C /home/whdcli && rm /home/whdcli/master.tar.gz
WORKDIR /home/whdcli
RUN python /home/whdcli/setup.py install
REPOSITORY                       TAG      IMAGE ID       CREATED          VIRTUAL SIZE
nmcspadden/puppetmaster-whdcli   latest   96eba2e80759   35 minutes ago   334.2 MB
macadmins/puppetmaster-whdcli    latest   4c9f1d4a3791   24 hours ago     554 MB
That’s a significant difference! We saved 220 MB from doing this.
WebHelpDesk has a bit of an unfortunate Dockerfile, because SolarWinds doesn’t offer the RHEL RPM for download uncompressed. If the RPM were available on the internet as a plain .rpm file, we could install it directly from the internet and save time (and space!). Since it’s only available as an .rpm.gz, we do in fact have to download and decompress it before installing. That’s a regrettable use of space in a Docker image, but unless SolarWinds changes that, we don’t really have any other choice.
Previously, I used curl to download the rpm.gz file and then decompressed and installed it. Here’s the before version:
RUN curl -o webhelpdesk-12.2.0-1.x86_64.rpm.gz http://downloads.solarwinds.com/solarwinds/Release/WebHelpDesk/12.2.0/webhelpdesk-12.2.0-1.x86_64.rpm.gz
RUN gunzip webhelpdesk-12.2.0-1.x86_64.rpm.gz
RUN yum --enablerepo=base clean metadata
RUN yum install -y nano
RUN yum install -y webhelpdesk-12.2.0-1.x86_64.rpm
RUN rm webhelpdesk-12.2.0-1.x86_64.rpm
Here’s the updated Dockerfile that doesn’t rely on curl:
ADD http://downloads.solarwinds.com/solarwinds/Release/WebHelpDesk/12.2.0/webhelpdesk-12.2.0-1.x86_64.rpm.gz /webhelpdesk.rpm.gz
RUN gunzip -dv /webhelpdesk.rpm.gz
RUN yum install -y /webhelpdesk.rpm && rm /webhelpdesk.rpm && yum clean all
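The gunzip step can be sketched outside Docker like this (the file contents here are just a stand-in for the real RPM):

```shell
# Stand-in for the compressed RPM: gzip a file, then decompress it
# in place the way the Dockerfile's RUN step does.
echo "fake rpm payload" > /tmp/webhelpdesk.rpm
gzip -f /tmp/webhelpdesk.rpm           # leaves /tmp/webhelpdesk.rpm.gz
gunzip -dv /tmp/webhelpdesk.rpm.gz     # restores /tmp/webhelpdesk.rpm
```

gunzip removes the .gz file as it decompresses, so no separate cleanup is needed for the archive itself.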
Using ADD is much cleaner and saves a few steps, although I did end up combining the yum install and rm commands into a single instruction – no need to have separate layers for those.
REPOSITORY       TAG      IMAGE ID       CREATED         VIRTUAL SIZE
nmcspadden/whd   latest   23109b9ef528   3 minutes ago   993.2 MB
macadmins/whd    latest   849a80f1b702   4 days ago      1.038 GB
Not a huge difference, but roughly 45 MB is still something.
The general theme here is to avoid adding packages or tools to a Dockerfile just for the build, if the running container is never going to use them. If we need to obtain files remotely, the ADD directive does that for us. Installing git, curl, wget, or some other remote download tool is a waste of space and time in a Docker build.
Calem Hunter provided a fantastic link about optimizing Dockerfiles even further by chaining together commands as much as possible:
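To sketch that chaining idea (using a hypothetical Debian-based image): each RUN instruction produces a new image layer, so cleanup commands only shrink the image if they run in the same layer that created the files being cleaned up:

```dockerfile
FROM debian

# One RUN instruction means one layer: the apt cache and lists are
# removed in the same layer that created them, so they never end up
# baked into the final image.
RUN apt-get update \
    && apt-get install -y python-setuptools ca-certificates \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
```

Running the same rm in a separate RUN line would still hide the files, but the earlier layer containing them would remain part of the image's virtual size.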
Another round of optimizations coming soon…