In a Bazel rule, how can I exclude certain input files from cache hit/miss detection

Question

I have a Bazel genrule that runs a custom tool to process a set of input files and generate an output. The tool takes a long time whenever it runs, but not every change to the input files affects its output. To detect whether a change matters, I have another script that parses the inputs and quickly reports whether the tool's output would be any different.

I have not been able to implement this in Bazel. This is what I would like to set up:

INPUT_FILES --------> [RULE1] --------> OUTPUT
     |                   ^
     |                   |
     |                   |
      --------------> [RULE2]

RULE2's output should decide whether RULE1 runs. But when RULE1 does have to run, INPUT_FILES must be available to it. So essentially, only RULE2's output should count toward the cache hit/miss calculation for RULE1, and INPUT_FILES should be ignored. Is there a way to accomplish this?

EDIT: After some experiments, I can implement this if I execute RULE1 and RULE2 with sandboxing disabled. That allows RULE2 to access RULE1's inputs without explicitly listing them. This seems hacky, but it could be acceptable if there were a way to share a single sandbox between the rules instead of executing both without a sandbox.

Answer 1

I'm not aware of a way to do what you're describing, but there are other strategies that might work for you. (There's an additional complication, I think: RULE2 wouldn't have access to the previous state of INPUT_FILES, so it would have nothing to compare against to see what changed in the inputs.)

One strategy is to process the input files so that all the inconsequential parts are removed, and the long-running tool in RULE1 only ever sees the "important" stuff. This, of course, depends on exactly what your tools and rules do, but it might work.

As a simple example, you could have a tool that removes comments from code (in a way that preserves line numbers), so that the compiler action only ever sees code-only files. If you then change only a comment, the input to the compiler is unchanged, and Bazel skips the action.
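A minimal sketch of such a preprocessing step, under stated assumptions: the function name `strip_comment_lines` is hypothetical, and it assumes a line-oriented input format in which any line whose first non-blank character is `#` is a comment. Comment lines are blanked rather than dropped, so line numbers in the normalized file still match the original.

```python
def strip_comment_lines(text: str) -> str:
    """Blank out full-line '#' comments, keeping one output line per input line.

    Assumption: '#' as the first non-blank character marks a comment line.
    Real formats (string literals, trailing comments) need a proper parser.
    """
    normalized = []
    for line in text.split("\n"):
        # Replace a comment line with an empty line so line numbers are preserved.
        normalized.append("" if line.lstrip().startswith("#") else line)
    return "\n".join(normalized)
```

If a cheap preprocessing rule writes this normalized text to a file and the long-running rule lists only that file as its input, an edit that touches only comments yields a byte-identical normalized file, so the expensive action's inputs are unchanged.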

This is similar to what Bazel does to make building Java rules more incremental. A tool generates a "header jar" from the Java source code, containing only the class interfaces, and the rules that depend on that code see only the header jar. That way, only changes to class interfaces cause dependent rules to be rerun; changes to comments or method implementations don't.

solution1
1 2018-11-08 20:52:27
