How to: Docker reuse layers with different base images

I'm doing cross-platform testing (tooling, not kernel), so I have a custom image (used for ephemeral Jenkins slaves) for each OS, based on standard base images: centos6, centos7, ubuntu14, sles11, sles12, etc.

Aside from the base being different, my images have a lot in common with each other (all of them get a copy of pre-built and frequently changing maven/gradle/npm repositories, for speed).

Here is a simplified example of the way the images are created (the tarball is the same across images):

   # Dockerfile one
   FROM centos:centos6
   # ADD needs a destination; it also auto-extracts the tarball
   ADD some-files.tar.gz /

   # Dockerfile two
   FROM ubuntu:14.04
   ADD some-files.tar.gz /

This results in large images (multi-GB) that have to be rebuilt regularly. Some layer reuse occurs between rebuilds thanks to the docker build cache, but if I can stop having to rebuild images altogether it would be better.

How can I reliably share the common contents among my images?

The images don't change much outside of these directories. This can't be a simple mounted volume: the directories in this layer are modified in use, so the mount can't be read-only, and the source must not be changed. What I'm looking for is closer to copy-on-write (COW), but applied to a specific subset of the image.

Problem with --cache-from:

The suggestion to use --cache-from will not work:

$ cat df.cache-from
FROM busybox
ARG UNIQUE_ARG=world
RUN echo Hello ${UNIQUE_ARG}
COPY . /files

$ docker build -t test-from-cache:1 -f df.cache-from --build-arg UNIQUE_ARG=docker .
Sending build context to Docker daemon   26.1MB
Step 1/4 : FROM busybox
 ---> 54511612f1c4
Step 2/4 : ARG UNIQUE_ARG=world
 ---> Running in f38f6e76bbca
Removing intermediate container f38f6e76bbca
 ---> fada1443b67b
Step 3/4 : RUN echo Hello ${UNIQUE_ARG}
 ---> Running in ee960473d88c
Hello docker
Removing intermediate container ee960473d88c
 ---> c29d98e09dd8
Step 4/4 : COPY . /files
 ---> edfa35e97e86
Successfully built edfa35e97e86
Successfully tagged test-from-cache:1

$ docker build -t test-from-cache:2 -f df.cache-from --build-arg UNIQUE_ARG=world --cache-from test-from-cache:1 .                                                                                
Sending build context to Docker daemon   26.1MB
Step 1/4 : FROM busybox
 ---> 54511612f1c4
Step 2/4 : ARG UNIQUE_ARG=world
 ---> Using cache
 ---> fada1443b67b
Step 3/4 : RUN echo Hello ${UNIQUE_ARG}
 ---> Running in 22698cd872d3
Hello world
Removing intermediate container 22698cd872d3
 ---> dc5f801fc272
Step 4/4 : COPY . /files
 ---> addabd73e43e
Successfully built addabd73e43e
Successfully tagged test-from-cache:2

$ docker inspect test-from-cache:1 -f '{{json .RootFS.Layers}}' | jq .
[
  "sha256:6a749002dd6a65988a6696ca4d0c4cbe87145df74e3bf6feae4025ab28f420f2",
  "sha256:01bf0fcfc3f73c8a3cfbe9b7efd6c2bf8c6d21b6115d4a71344fa497c3808978"
]

$ docker inspect test-from-cache:2 -f '{{json .RootFS.Layers}}' | jq .
[
  "sha256:6a749002dd6a65988a6696ca4d0c4cbe87145df74e3bf6feae4025ab28f420f2",
  "sha256:c70c7fd4529ed9ee1b4a691897c2a2ae34b192963072d3f403ba632c33cba702"
]

The build output shows exactly where it stops using the cache: at the step whose command changes. And the inspect output shows a different id for the second layer, even though the same COPY command was run in each build. Whenever the preceding layer differs, the cache from the other image's build cannot be used.

The --cache-from option is there to allow you to trust the build steps from an image pulled from a registry. By default, docker only trusts layers that were locally built. But the same rules apply even when you provide this option.


Option 1:

If you want to reuse the build cache, you must have the preceding layers identical in both images. You could try using a multi-stage build if the base image for each is small enough. However, doing this would lose all of the settings outside of the filesystem (environment variables, entrypoint specification, etc), so you'd need to recreate that as well:

ARG base_image
FROM ${base_image} as base
# the above from line makes the base image available for later copying
FROM scratch
COPY large-content /content
COPY --from=base / /
# recreate any environment variables, labels, entrypoint, cmd, or other settings here

And then build that with:

docker build --build-arg base_image=base1 -t image1 .
docker build --build-arg base_image=base2 -t image2 .
docker build --build-arg base_image=base3 -t image3 .

This could also be multiple Dockerfiles if you need to change other settings. This will result in the entire contents of each base image being copied, so make sure your base image is significantly smaller to make this worth the effort.


Option 2:

Reorder your build to keep common components at the top. I understand this won't work for you, but it may help others coming across this question later. It's the preferred and simplest solution that most people use.
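
As a sketch of the idea (package names and paths are assumptions, not from the question): order the steps from least to most frequently changing, so that a rebuild invalidates as few cached layers as possible.

   # Hypothetical ordering: stable toolchain first, volatile content last
   FROM ubuntu:14.04
   # Toolchain: changes rarely, so this layer stays cached across rebuilds
   RUN apt-get update && apt-get install -y openjdk-7-jdk maven && \
       rm -rf /var/lib/apt/lists/*
   # Pre-built repositories: large and frequently changing, so kept late
   ADD some-files.tar.gz /root/
   # Test scripts: change most often, so last
   COPY scripts/ /opt/scripts/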


Option 3:

Remove the large content from your image and add it to your containers externally as a volume. You lose the immutability and copy-on-write features of docker's layered filesystem, and you'll need to manually ship the volume content to each of your docker hosts (or use a network shared filesystem). I've seen solutions where a "sync container" runs on each docker host, performing a git pull, rsync, or equivalent command to keep the volume updated. If you can, mount the volume with :ro at the end to make it read-only inside the container that uses it, which gives you back immutability.
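
One way to wire up such a read-only mount (the image name and host path here are hypothetical) is a compose file entry:

   # docker-compose sketch: mount the shared, host-synced content read-only
   services:
     test-runner:
       image: my-test-image:centos7
       volumes:
         - /srv/shared/maven-repo:/root/.m2/repository:ro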

The most reliable and docker-native way to share common contents between different docker images is to refactor the commonalities into base images that the other images extend.

For example, if all the images build on top of a base image and install packages x, y, and z, you refactor the installation of x, y, and z into a new base image that the downstream images build on top of.
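
Sketched with hypothetical image names (following the question's own two-Dockerfile style):

   # Dockerfile for the shared base; built and tagged once, e.g. as my-base:centos7
   FROM centos:centos7
   RUN yum install -y x y z

   # Dockerfile for a downstream image, extending the base above
   FROM my-base:centos7
   COPY test-scripts/ /opt/tests/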

It turns out that as of Docker 1.13, you can use the --cache-from OTHER_IMAGE flag (see the docs).

In this situation, the solution would look like this:

docker build -t image1 .
docker build -t image2 --cache-from image1 .
docker build -t image3 --cache-from image1 --cache-from image2 .
... and so on

This will ensure that any layer these images have in common is reused.

UPDATE: as mentioned in other answers, this doesn't do what I expected. I admit I still don't fully understand what this flag does: it definitely changes the push behavior, but the layers are not ultimately reused.

Given it sounds like the content of this additional 4GB of data is unrelated to the underlying container image, is there any way to mount that data outside of the container build/creation process? I know this creates an additional management step (getting the data everywhere you want the image), but assuming it can be a read-only shared mount (and then untarred by the image main process into the container filesystem as needed), this might be an easier way than building it into every image.
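
The seed-at-startup step this describes is just a tar extraction. A minimal sketch of what a container entrypoint could do with a read-only mounted tarball (all paths are hypothetical, and plain directories stand in for the mount and the container filesystem here):

```shell
# Build a demo tarball standing in for the shared, host-distributed archive
mkdir -p /tmp/seed-demo/shared /tmp/seed-demo/container-fs
echo 'prebuilt repo data' > /tmp/seed-demo/shared/artifact.txt
tar -czf /tmp/seed-demo/some-files.tar.gz -C /tmp/seed-demo/shared .

# What the entrypoint would run against the read-only mount, e.g.:
#   tar -xzf /seed/some-files.tar.gz -C /root
tar -xzf /tmp/seed-demo/some-files.tar.gz -C /tmp/seed-demo/container-fs
cat /tmp/seed-demo/container-fs/artifact.txt
```

The extraction lands in the container's writable layer, so changes made afterwards never touch the shared source.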
