Writing genrule with randomness in Bazel
We have a code generator that takes a random seed as an input. If no seed is specified, it picks a seed at random, which means the output is not deterministic:
# generated_code1.h and generated_code2.h are almost always different
my-code-gen -o generated_code1.h
my-code-gen -o generated_code2.h
On the other hand:
# generated_code3.h and generated_code4.h are always the same
my-code-gen --seed 1234 -o generated_code3.h
my-code-gen --seed 1234 -o generated_code4.h
Our first attempt to create a target for the generated code was:
genrule(
    name = "generated_code",
    srcs = [],
    outs = ["generated_code.h"],
    cmd = "my-code-gen -o $@",  # Note: no --seed specified
)
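If a fixed seed is acceptable, the most direct deterministic variant is to pin the seed in the command itself, as the `--seed 1234` invocations above do. The sketch below also references the generator through `tools` and `$(execpath ...)` instead of relying on whatever `my-code-gen` happens to be on the host's PATH; the label `//tools:my-code-gen` is a hypothetical target, not something from the original setup.

```starlark
genrule(
    name = "generated_code",
    srcs = [],
    outs = ["generated_code.h"],
    # With the seed pinned, the output depends only on the command line and
    # the tool, so the action's cache key fully determines its output.
    cmd = "$(execpath //tools:my-code-gen) --seed 1234 -o $@",
    # Declaring the generator as a tool makes it a tracked input of the
    # action (//tools:my-code-gen is a hypothetical label).
    tools = ["//tools:my-code-gen"],
)
```

The trade-off is that changing the seed now means editing the BUILD file rather than passing a flag.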
However, we think this breaks the hermeticity of targets depending on :generated_code. So we ended up implementing a custom rule that uses a build_setting (i.e. a configuration) to set the seed for the invocation of my-code-gen.
This makes it possible to specify the seed from the CLI for any target that depends on the generated code, e.g.
bazel build :generated_code --//:code-gen-seed=1234
bazel build :binary --//:code-gen-seed=1234
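The original post doesn't show the custom rule, so here is a minimal sketch of how a user-defined string build setting can feed a seed into a code-generation action. The file name `code_gen.bzl`, the `SeedInfo` provider, the `seed_flag` and `generated_code` rule names, the `//tools:my-code-gen` label and `main.cc` are all hypothetical; only `config.string(flag = True)`, `ctx.build_setting_value` and the `--//:code-gen-seed=...` command-line syntax are standard Bazel features.

```starlark
# --- code_gen.bzl (hypothetical file name) ---

# Provider that carries the seed from the build setting to its consumers.
SeedInfo = provider(fields = ["value"])

def _seed_flag_impl(ctx):
    # ctx.build_setting_value is the value passed on the command line
    # (e.g. --//:code-gen-seed=1234), or build_setting_default otherwise.
    return [SeedInfo(value = ctx.build_setting_value)]

seed_flag = rule(
    implementation = _seed_flag_impl,
    build_setting = config.string(flag = True),
)

def _generated_code_impl(ctx):
    seed = ctx.attr._seed[SeedInfo].value
    out = ctx.actions.declare_file(ctx.label.name + ".h")
    ctx.actions.run(
        executable = ctx.executable._code_gen,
        # The seed is part of the command line, and therefore of the cache key.
        arguments = ["--seed", seed, "-o", out.path],
        outputs = [out],
        mnemonic = "MyCodeGen",
        progress_message = "Generating %s" % out.short_path,
    )
    return [DefaultInfo(files = depset([out]))]

generated_code = rule(
    implementation = _generated_code_impl,
    attrs = {
        # Only targets that (transitively) depend on this rule read the seed.
        "_seed": attr.label(default = "//:code-gen-seed"),
        "_code_gen": attr.label(
            default = "//tools:my-code-gen",  # hypothetical label
            executable = True,
            cfg = "exec",
        ),
    },
)

# --- BUILD ---
load(":code_gen.bzl", "generated_code", "seed_flag")

seed_flag(
    name = "code-gen-seed",
    build_setting_default = "1234",
)

generated_code(name = "generated_code")

cc_binary(
    name = "binary",
    srcs = ["main.cc", ":generated_code"],  # main.cc is hypothetical
)
```

A string setting is used here so the value can go straight into `arguments` without conversion; an integer setting would work too but would need `str()` before being passed to the action.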
My questions are:
1. The genrule definition above calls my-code-gen without --seed, which results in non-deterministic output. Does that mean it is non-hermetic?
2. --action_env can serve as an alternative to build_setting, since it also allows us to pass a seed value from the CLI to my-code-gen. Compared to build_setting, what is the preferred approach in our case?

Yes, it's non-hermetic. To be more precise, this is non-determinism, which is a symptom of a non-hermetic build: the PRNG isn't seeded with a value that is statically known to the build system.
Another common cause of non-determinism is embedding timestamps in build outputs.
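One way to catch this kind of non-determinism early (a suggestion, not part of the original answer) is a regression test that runs the generator twice with the same seed and compares the results, using `diff_test` from bazel_skylib. The target names below are hypothetical; the point is that the same pair of targets without `--seed` would typically fail the test.

```starlark
load("@bazel_skylib//rules:diff_test.bzl", "diff_test")

# Two independent actions with the same seed (target names are hypothetical).
genrule(
    name = "gen_a",
    outs = ["gen_a.h"],
    cmd = "my-code-gen --seed 1234 -o $@",
)

genrule(
    name = "gen_b",
    outs = ["gen_b.h"],
    cmd = "my-code-gen --seed 1234 -o $@",
)

# Fails if the two outputs differ, i.e. if the generator is not
# deterministic for a fixed seed.
diff_test(
    name = "code_gen_is_deterministic_test",
    file1 = ":gen_a",
    file2 = ":gen_b",
)
```

A passing run only shows that two invocations agreed; it is a cheap smoke test rather than a proof of determinism.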
Bazel defines hermeticity as:
When given the same input source code and product configuration, a hermetic build system always returns the same output by isolating the build from changes to the host system.
In order to isolate the build, hermetic builds are insensitive to libraries and other software installed on the local or remote host machine. They depend on specific versions of build tools, such as compilers, and dependencies, such as libraries. This makes the build process self-contained as it doesn't rely on services external to the build environment.
The biggest problem is that it breaks the cacheability of everything that depends on the genrule: given a cache key (i.e. the hashes of the genrule's inputs, command, and environment), you can no longer trust or guarantee that the output will be identical and reproducible across build invocations.
This has costs ranging from
The //:code-gen-seed build setting only affects targets that depend on it, but --action_env affects every action. Changes to the build setting invalidate only the minimal set of targets, causing minimal re-analysis, cache lookups, and rebuilds, so it is the preferred approach. You can verify this by comparing incremental build speeds in a workspace with more targets that don't depend on //:code-gen-seed.
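For comparison, the --action_env approach from the question might look like the sketch below, with the seed read from an environment variable inside the genrule's shell command. The variable name `CODE_GEN_SEED` and the fallback value are hypothetical.

```starlark
genrule(
    name = "generated_code",
    srcs = [],
    outs = ["generated_code.h"],
    # "$$" escapes "$" for genrule variable expansion, so the shell sees
    # ${CODE_GEN_SEED:-1234} and falls back to 1234 when the variable is unset.
    cmd = "my-code-gen --seed $${CODE_GEN_SEED:-1234} -o $@",
)
```

It would be invoked as `bazel build :binary --action_env=CODE_GEN_SEED=1234`. Because the variable becomes part of the environment of every action that inherits the default shell environment, changing it can invalidate actions that never read it, which is the cost the answer describes.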