At SeatGeek we use Multi-stage Dockerfiles to build the container images that we deploy to production. We have found them to be a great and simple way of building projects with dependencies in different languages or tools. If you are not familiar with multi-stage Dockerfiles, we recommend you take a look at this blog post.
In our first days of using them in our build pipeline, we found a few shortcomings that were making our deploys take longer than they should have. We traced these shortcomings to a missing key feature: It is not possible to carry statically generated cache files from one build to another once certain source files in the project change.
For example, when building our frontend pipeline we have to invoke `yarn` first to get all the npm packages. But this command can only be executed after adding the `yarn.lock` and `package.json` files to the Docker container. Because of the nature of how Docker caching works, this meant that each time those files were modified, the `node_modules` folder cached in previous builds was also trashed. As you may already know, building that folder from scratch is not a cheap operation.
Here’s an example that illustrates the issue. Imagine you create a generic Dockerfile for building node projects:
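A minimal sketch of such a Dockerfile might look like this (the `node:8` base image and the `/app` paths are illustrative assumptions; the `ONBUILD` instructions are what make the image generic):

```dockerfile
FROM node:8

RUN mkdir -p /app
WORKDIR /app

# ONBUILD defers these steps to the build of any downstream
# image that uses this one as its base
ONBUILD COPY package.json yarn.lock /app/
ONBUILD RUN yarn install
ONBUILD COPY . /app/
```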
We can now build and tag a Docker image for building `yarn` based projects.
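For example, tagging it as `node-builder` (the name is arbitrary):

```bash
docker build -t node-builder .
```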
The tagged image can be used in a generic way like this:
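A downstream project’s Dockerfile might then look roughly like this multi-stage sketch (the `yarn run build` script, the output path and the `nginx` final stage are assumptions for a React-style app):

```dockerfile
# the ONBUILD triggers in node-builder copy package.json and
# yarn.lock, run `yarn install`, then copy the rest of the sources
FROM node-builder AS builder
RUN yarn run build

# keep only the final artifacts; node_modules stays behind
# in the builder stage
FROM nginx:stable
COPY --from=builder /app/dist /usr/share/nginx/html
```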
So far so good: we have built a pretty lean Docker image that discards the `node_modules` folder and only keeps the final artifact, for example a set of js bundles from a React application.

It’s also very fast to build! This is because each individual step is cleverly cached by Docker during the build process, as long as none of the steps or the files they use have changed.
And that’s exactly where the problem is: whenever the `package.json` or `yarn.lock` files change, Docker will trash all the files in the `node_modules` directory as well as the cached yarn packages, and will start downloading, linking and building every single dependency from scratch.
That’s far from ideal, as it takes significant time to rebuild all dependencies. What if we could make a change to the process so that changes to those files do not bust the yarn cache? It turns out we can!
Enter `docker-build-cacher`
We have built a slim utility that helps overcome the problem by providing a way to build the Dockerfile and cache all of the intermediate stages. On subsequent builds, it will make sure that the static cache files that were generated during previous builds will also be present.
The effect it has should be obvious: your builds will be consistently fast, at the cost of a bit of extra disk space.
Building and caching is done in separate steps. The first step is a replacement for the `docker build` command, and the second step is the cache persisting phase.
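A typical invocation might look like the following sketch. The subcommands and environment variable names shown here are assumptions; see the project’s README for the exact interface:

```bash
# tell the tool what to build and how to tag it
export APP_NAME=fancyapp
export GIT_BRANCH=master
export DOCKER_TAG=fancyapp:latest

# step 1: build the image (replaces `docker build`)
docker-build-cacher build

# step 2: persist the intermediate stages for future builds
docker-build-cacher cache
```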
How It Works
The `docker-build-cacher` tool works by parsing the Dockerfile and extracting `COPY` or `ADD` instructions nested inside `ONBUILD` for each of the stages found in the file.

It will compare the source files present in such `COPY` or `ADD` instructions to check for changes. If it detects changes, it rewrites the Dockerfile on the fly, such that `FROM` directives in each of the stages use the locally cached images instead of the original base image.

The effect this `FROM` swap has is that disk state for the image is preserved between builds.
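As an illustration (the cache image name below is made up), a stage header like this:

```dockerfile
FROM node-builder AS builder
```

would be rewritten on the fly to point at the locally cached result of the previous build:

```dockerfile
FROM fancyapp-builder-cache AS builder
```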
`docker-build-cacher` is available now on GitHub under the BSD 3-Clause License. Make sure to grab the binary files from the releases page.
If you think these kinds of things are interesting, consider working with us as a Software Engineer at SeatGeek. Or, if backend development isn’t your thing, we have other openings in engineering and beyond!