How are Docker buildx layer cache hashes calculated?

I'm digging into the caching of Docker buildx to try to debug an issue. I'm trying to figure out exactly how buildx checks whether a layer is available in the local cache. Although I've searched fairly extensively, I can't find any documentation on this.

Looking at the local cache files themselves, I see a bunch of files with hash names. My assumption is that it works as follows (assuming use of type=local,mode=max):

  1. For each line in the Dockerfile, it uses some combination of parameters to calculate a SHA hash.
  2. It checks the --cache-from directory to see whether a file with that hash as its name exists.
  3. If it does exist, it uses that file as the layer and doesn't rebuild anything (and copies that file to the --cache-to directory).
  4. If it does not exist, it builds the layer and saves it as a file, with that hash as the name, in the --cache-to directory.
  5. This results in an output cache with 1 file for each line in the Dockerfile.
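To make the assumed process concrete, here is a minimal Python sketch of steps (1) to (5) exactly as described above. It models the question's hypothesis, not BuildKit itself: the names step_hash and build_with_cache are hypothetical, each "layer" is just placeholder bytes, and hashing only the line text is an assumption standing in for the unknown parameters of step (1).

```python
import hashlib
import shutil
from pathlib import Path


def step_hash(instruction: str) -> str:
    """Hypothetical step (1): hash only the text of the Dockerfile line."""
    return hashlib.sha256(instruction.encode("utf-8")).hexdigest()


def build_with_cache(instructions, cache_from: Path, cache_to: Path) -> list:
    """Follow steps (1)-(5) and report 'hit' or 'miss' for each line."""
    cache_to.mkdir(parents=True, exist_ok=True)
    results = []
    for instruction in instructions:
        name = step_hash(instruction)                # step 1
        cached = cache_from / name
        target = cache_to / name
        if cached.exists():                          # step 2
            if not target.exists():                  # step 3: reuse and re-export
                shutil.copyfile(cached, target)
            results.append((instruction, "hit"))
        else:                                        # step 4: "build" a placeholder layer
            target.write_bytes(b"layer for " + instruction.encode("utf-8"))
            results.append((instruction, "miss"))
    return results                                   # step 5: one file per line in cache_to
```

Running the same three lines twice, feeding the first run's output directory in as the second run's cache source, gives three misses and then three hits. Because this naive model keys only on the line text, two identical lines would share one file and an edited input file would go unnoticed; those are exactly the gaps that source-file hashes and parent-step chaining would need to cover.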

So my questions are:

  1. Is my understanding of this process correct? Am I missing any key elements?
  2. For step (1) above, what are the "parameters" that it uses to calculate the hash? I would think it's the string value of the line itself, plus the contents of any files that are copied by the line (e.g. ADD), but does it use anything else? For example, the last-modified timestamp of any files that it copies?
> 1. Is my understanding of this process correct? Am I missing any key elements?

My understanding is roughly along those lines. I'd need to check the code myself to know the specifics.

> 2. For step (1) above, what are the "parameters" that it uses to calculate the hash? I would think it's the string value of the line itself, plus the contents of any files that are copied by the line (e.g. ADD), but does it use anything else? For example, the last-modified timestamp of any files that it copies?

In general, the cache match for a Dockerfile step is based on the following (this behavior predates BuildKit):

  • For an ADD/COPY step, a hash of the source files. That hash includes file ownership and permissions. A quick test indicates the modification timestamp is not included (the cache was still used after I touched a file being copied).
  • For a RUN step, the text of the command being run, along with any ENV or ARG key/value pairs, since those modify the environment. Docker has no concept of a command pulling from external resources, and it doesn't know which environment variable changes will affect any specific command.
  • For all steps, the cache requires that both the previous step's result and the current step match (the new COPY --link may be an exception to this). So if you copy a new file into the image, none of the remaining steps in that build stage will be found in the cache, because Docker has no generic way to know that a specific file doesn't affect some later RUN steps.
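The three bullets above can be sketched as a chained hash. This is a simplified Python model under stated assumptions, not BuildKit's actual cache-key algorithm: the function names, the use of SHA-256, the field separators, and passing one fixed ENV/ARG dict for the whole stage are all hypothetical. It shows a COPY source digest that covers contents, ownership and permissions but skips the modification time, a per-step key built from the instruction text plus ENV/ARG pairs, and each key folding in its parent's key.

```python
import hashlib
from pathlib import Path


def copy_source_digest(paths) -> str:
    """Hypothetical digest of ADD/COPY sources: name, mode, owner and contents.

    The modification time is deliberately left out, matching the
    answer's `touch` test where the cache was still used.
    """
    h = hashlib.sha256()
    for p in sorted(Path(x) for x in paths):
        st = p.stat()
        h.update(p.name.encode() + b"\0")
        h.update(f"{st.st_mode:o}:{st.st_uid}:{st.st_gid}\0".encode())
        h.update(p.read_bytes())
    return h.hexdigest()


def step_key(parent_key: str, instruction: str, env: dict, source_digest: str = "") -> str:
    """Hypothetical key for one step: parent key + instruction text + ENV/ARG pairs + sources."""
    h = hashlib.sha256()
    h.update(parent_key.encode() + b"\0")
    h.update(instruction.encode() + b"\0")
    for name, value in sorted(env.items()):
        h.update(f"{name}={value}\0".encode())
    h.update(source_digest.encode())
    return h.hexdigest()


def stage_keys(steps, env) -> list:
    """Chain keys through one build stage; steps are (instruction, source_digest) pairs."""
    keys, parent = [], ""
    for instruction, digest in steps:
        parent = step_key(parent, instruction, env, digest)
        keys.append(parent)
    return keys
```

In this model, changing the COPY digest changes every key after it, while editing only the last RUN line leaves the earlier keys untouched, which mirrors the invalidation behavior in the third bullet. The sketch makes no attempt to model COPY --link.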

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM