Writing a genrule with randomness in Bazel

Question

We have a code generator that takes a random seed as an input. If no seed is specified, it picks one at random, which means the output is not deterministic:

# generated_code1.h and generated_code2.h are almost always different
my-code-gen -o generated_code1.h
my-code-gen -o generated_code2.h

On the other hand:

# generated_code3.h and generated_code4.h are always the same
my-code-gen --seed 1234 -o generated_code3.h
my-code-gen --seed 1234 -o generated_code4.h

Our first attempt to create a target for the generated code was:

genrule(
    name = "generated_code",
    srcs = [],
    outs = ["generated_code.h"],
    cmd = "my-code-gen -o $@",  # Note that no seed is specified
)

However, we think this breaks the hermeticity of targets depending on :generated_code. So we ended up implementing a custom rule and using a build_setting (i.e., configuration) to configure the seed passed to my-code-gen.

This makes it possible to specify the seed from the CLI for any target that depends on the generated code, e.g.:

bazel build :generated_code --//:code-gen-seed=1234
bazel build :binary --//:code-gen-seed=1234

My questions are:

1. Consider the genrule definition above: it calls my-code-gen without --seed, which results in non-deterministic output. Does that mean it is non-hermetic? What is the cost of breaking hermeticity (e.g., what trouble would it cause in the future)?
2. I've found --action_env as an alternative to build_setting, which also allows us to pass a seed value from the CLI to my-code-gen. Compared to build_setting, which is the preferred approach in our case?

Answer 1

> Consider the genrule definition above: it calls my-code-gen without --seed, which results in non-deterministic output. Does that mean it is non-hermetic? What is the cost of breaking hermeticity (e.g., what trouble would it cause in the future)?

Yes, it's non-hermetic. To be more precise, this is non-determinism, which is a symptom of a non-hermetic build: the PRNG isn't seeded with a value that is statically known to the build system. Another common cause of non-determinism is embedding timestamps in build outputs.
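As a minimal sketch of the simplest fix, the genrule from the question can pin the seed so the command line (and therefore the output) is fully determined by what Bazel sees. The value 1234 is just the example seed from the question, and the sketch assumes my-code-gen has no other source of randomness:

```starlark
# BUILD
# Sketch only: same genrule as in the question, but with a fixed seed.
# Assumes `--seed` is the only source of non-determinism in my-code-gen.
genrule(
    name = "generated_code",
    srcs = [],
    outs = ["generated_code.h"],
    cmd = "my-code-gen --seed 1234 -o $@",
)
```

Because the seed is part of the command string, changing it changes the action's cache key, so stale outputs are not reused.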

Bazel defines hermeticity as:

> When given the same input source code and product configuration, a hermetic build system always returns the same output by isolating the build from changes to the host system.
>
> In order to isolate the build, hermetic builds are insensitive to libraries and other software installed on the local or remote host machine. They depend on specific versions of build tools, such as compilers, and dependencies, such as libraries. This makes the build process self-contained as it doesn't rely on services external to the build environment.

The biggest problem is that it breaks the cacheability of everything that depends on the genrule: you can no longer trust or guarantee that, given a cache key (i.e., hashes of the genrule's inputs, command, and environment), the output will be identical and reproducible across build invocations.

This has costs ranging from:

- basic usability problems ("it works on my machine")
- build speed (re-executing commands unnecessarily and wasting compute)
- cache poisoning (fetching an unexpected output for a given cache key)
- test flakiness (tests implicitly depending on non-deterministic state instead of a fixture)
- software supply chain security issues (difficulty verifying the provenance and reproducibility of release artifacts)

> I've found --action_env as an alternative to build_setting, which also allows us to pass a seed value from the CLI to my-code-gen. Compared to build_setting, which is the preferred approach in our case?
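For reference, the --action_env variant being asked about might look roughly like the sketch below. The variable name CODE_GEN_SEED is hypothetical, and the sketch assumes the genrule's shell sees variables passed with --action_env:

```starlark
# BUILD
# Sketch only. CODE_GEN_SEED is a hypothetical environment variable name.
# `$$` escapes the dollar sign so the shell, not Bazel, expands the variable.
#
# Assumed invocation:
#   bazel build :generated_code --action_env=CODE_GEN_SEED=1234
genrule(
    name = "generated_code",
    srcs = [],
    outs = ["generated_code.h"],
    cmd = "my-code-gen --seed $$CODE_GEN_SEED -o $@",
)
```

Note that nothing in this BUILD file declares the dependency on the variable; if it is unset, the shell expands it to an empty string.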

The //:code-gen-seed build setting only affects targets that depend on it, whereas --action_env affects every action. Changing the build setting therefore invalidates only the minimal set of targets and causes minimal re-analysis, cache lookups, and rebuilds, so it is the preferred approach. You can experiment with this by comparing incremental build times in a workspace that has many targets that don't depend on //:code-gen-seed.
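The question does not show the custom rule, so the following is only a sketch of how the build-setting approach could be wired up. It assumes bazel_skylib is available and that my-code-gen is exposed as an executable target; the file name codegen.bzl, the label //tools:my-code-gen, and the default seed are hypothetical:

```starlark
# --- codegen.bzl (hypothetical file name) ---
load("@bazel_skylib//rules:common_settings.bzl", "BuildSettingInfo")

def _generated_code_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".h")

    # Read the seed from the build setting; it becomes part of the action's
    # command line and therefore of its cache key.
    seed = ctx.attr._seed[BuildSettingInfo].value
    ctx.actions.run(
        executable = ctx.executable._tool,
        arguments = ["--seed", str(seed), "-o", out.path],
        outputs = [out],
        mnemonic = "MyCodeGen",
    )
    return [DefaultInfo(files = depset([out]))]

generated_code = rule(
    implementation = _generated_code_impl,
    attrs = {
        "_seed": attr.label(default = "//:code-gen-seed"),
        # Hypothetical label for the generator binary.
        "_tool": attr.label(
            default = "//tools:my-code-gen",
            executable = True,
            cfg = "exec",
        ),
    },
)

# --- BUILD (at the workspace root) ---
# load("@bazel_skylib//rules:common_settings.bzl", "int_flag")
# load(":codegen.bzl", "generated_code")
#
# int_flag(
#     name = "code-gen-seed",
#     build_setting_default = 1234,  # example default, an assumption
# )
#
# generated_code(name = "generated_code")
```

With this layout, only generated_code and its dependents have //:code-gen-seed in their dependency graph, which is why changing --//:code-gen-seed leaves unrelated targets untouched.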

Solution 1
0 2022-08-16 08:48:08