At SeatGeek we use Multi-stage Dockerfiles to build the container images that we deploy to production. We have found them to be a great and simple way of building projects with dependencies in different languages or tools. If you are not familiar with multi-stage Dockerfiles, we recommend you take a look at this blog post.
In our first days of using them in our build pipeline, we found a few shortcomings that were making our deploys take longer than they should have. We traced these shortcomings to a missing key feature: It is not possible to carry statically generated cache files from one build to another once certain source files in the project change.
For example, when building our frontend pipeline we have to invoke `yarn` first to get all the npm packages. But this command can only be executed after adding the `package.json` files to the Docker container. Because of the nature of how Docker caching works, this meant that each time those files were modified, the `node_modules` folder cached in previous builds was also trashed. As you may already know, building that folder from scratch is not a cheap operation.
Here’s an example that illustrates the issue. Imagine you create a generic Dockerfile for building node projects:
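The original listing was lost in formatting; a minimal sketch of what such a generic builder Dockerfile could look like follows (the base image, paths, and commands are illustrative assumptions, not the original code):

```dockerfile
# Generic builder image for yarn-based projects.
# ONBUILD defers each step to the build of any image that uses this
# one as its base, so the same builder works for many projects.
FROM node:8

WORKDIR /app

# Copy only the dependency manifests first, so the expensive
# `yarn install` layer can be cached by Docker.
ONBUILD COPY package.json yarn.lock /app/
ONBUILD RUN yarn install

# Then copy the rest of the sources and build the artifacts.
ONBUILD COPY . /app
ONBUILD RUN yarn build
```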
We can now build and tag this Dockerfile as a base image for building `yarn`-based projects.
The tagged image can be used in a generic way like this:
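That listing was also lost to formatting. Assuming the builder image was tagged `node-yarn-builder`, a consuming multi-stage Dockerfile might look like this (the artifact path and the final base image are made-up examples):

```dockerfile
# First stage: the ONBUILD instructions from the builder image run
# here, installing dependencies and building the project.
FROM node-yarn-builder as builder

# Second stage: keep only the built artifacts, discarding everything
# else, including the node_modules folder.
FROM nginx:stable
COPY --from=builder /app/dist /usr/share/nginx/html
```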
So far so good: we have built a pretty lean Docker image that discards the `node_modules` folder and only keeps the final artifact, for example a set of JS bundles from a React application.
It’s also very fast to build! This is because each individual step is cleverly cached by Docker during the build process, as long as none of the steps or files used in them have changed.
And that’s exactly where the problem is: whenever the `package.json` or `yarn.lock` files change, Docker will trash all the files in the `node_modules` directory as well as the cached yarn packages, and will start downloading, linking, and building every single dependency from scratch.
That’s far from ideal, as it takes significant time to rebuild all dependencies. What if we could make a change to the process so that changes to those files do not bust the yarn cache? It turns out we can!
We have built a slim utility that helps overcome the problem by providing a way to build the Dockerfile and cache all of the intermediate stages. On subsequent builds, it will make sure that the static cache files that were generated during previous builds will also be present.
The effect it has should be obvious: your builds will be consistently fast, at the cost of a bit of extra disk space.
Building and caching are done in separate steps: the first step is a replacement for the `docker build` command, and the second step is the cache-persisting phase.
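The invocation listing was lost in formatting. As a sketch, the two phases are driven by environment variables and two subcommands, along these lines (the variable names and subcommands follow the project README; the app name and tag are placeholders):

```shell
# Tell docker-build-cacher which app and image tag to work with.
export APP_NAME=fancyapp
export GIT_BRANCH=master
export DOCKER_TAG=fancyapp:latest

# Step 1: drop-in replacement for `docker build`.
docker-build-cacher build

# Step 2: persist the intermediate stage caches for the next build.
docker-build-cacher cache
```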
How It Works
The `docker-build-cacher` tool works by parsing the Dockerfile and extracting the `ADD` instructions nested inside `ONBUILD` for each of the stages found in the file.
It compares the source files present in those `ADD` instructions to check for changes. If it detects changes, it rewrites the Dockerfile on the fly, such that the `FROM` directives in each of the stages use the locally cached images instead of the original base image.
The effect this `FROM` swap has is that disk state for the image is preserved between builds.
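As a hypothetical illustration of that rewrite (the image tags here are invented for the example):

```dockerfile
# Stage header as written by the developer:
#
#   FROM node-yarn-builder as builder
#
# The same header after docker-build-cacher rewrites it on the fly,
# pointing at the locally cached image from the previous build:
FROM fancyapp-builder-stage:latest as builder
```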
docker-build-cacher is available now on GitHub under the BSD 3-Clause License. Make sure to grab the binary files from the releases page.
If you think these kinds of things are interesting, consider working with us as a Software Engineer at SeatGeek. Or, if backend development isn’t your thing, we have other openings in engineering and beyond!