
Deterministic builds under Windows

The ultimate goal is to compare two binaries built from exactly the same source in exactly the same environment and be able to tell that they are indeed functionally equivalent.

One application would be focusing QA time on things that actually changed between releases, as well as change monitoring in general.

MSVC, in tandem with the PE format, naturally makes this very hard to do.

So far I have found and neutralized these things:

  • PE timestamp and checksum
  • Digital signature directory entry
  • Debugger section timestamp
  • PDB signature, age and file path
  • Resources timestamp
  • All file/product versions in VS_VERSION_INFO resource
  • Digital signature section

I parse the PE, find the offsets and sizes of all those things, and ignore those byte ranges when comparing binaries. Works like a charm (well, for the few tests I've run). I can tell that a signed executable with version 1.0.2.0 built on Win Server 2008 is equal to an unsigned one, version 10.6.6.6, built on my Win XP dev box, as long as the compiler version and all sources and headers are the same. This seems to work for VC 7.1 -- 9.0 (for release builds).
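To illustrate the masking approach described above, here is a minimal Python sketch that handles only two of the listed fields: the COFF header TimeDateStamp and the optional header CheckSum. The function names are hypothetical and this is not the code of the tool mentioned in this thread. The other fields (debug directory, resources, signature and so on) would need further parsing.

```python
import struct

def pe_ignore_ranges(data):
    """Return (offset, size) pairs for the COFF timestamp and the PE checksum."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    if len(data) < pe_offset + 24 + 68:
        raise ValueError("truncated PE headers")
    timestamp = (pe_offset + 8, 4)       # TimeDateStamp in the COFF file header
    checksum = (pe_offset + 24 + 64, 4)  # CheckSum in the optional header
    return [timestamp, checksum]

def mask(data, ranges):
    """Zero out every ignored byte range so it cannot affect the comparison."""
    buf = bytearray(data)
    for offset, size in ranges:
        buf[offset:offset + size] = bytes(size)
    return bytes(buf)

def equal_ignoring(a, b):
    """Compare two PE images while ignoring the masked header fields."""
    if len(a) != len(b):
        return False
    return mask(a, pe_ignore_ranges(a)) == mask(b, pe_ignore_ranges(b))
```

Typical use would be to read both files with open(path, "rb").read() and pass the two byte strings to equal_ignoring.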

With one caveat.

Absolute paths for both builds must have the same length.

cl.exe converts relative paths to absolute ones and puts them right into the object files along with compiler flags and so on. This has a disproportionate effect on the whole binary. A one-character change in the path results in one byte changing here and there, several times over the whole .text section (once per linked object, I suspect). Changing the length of the path results in significantly more differences, both in the obj files and in the linked binary.

It feels like the file path together with the compile flags is used as some kind of hash, which makes it into the linked binary or even affects the placement order of unrelated pieces of compiled code.

So here is the 3-part question (summarized as "what now?"):

  • Should I abandon the whole project and go home because what I am trying to do breaks the laws of physics and the corporate policy of MS?

  • Assuming I handle the absolute path issue (on a policy level or by finding a magical compiler flag), are there any other things I should look out for? (Things like __TIME__ do mean changed code, so I don't mind those not being ignored.)

  • Is there a way to either force the compiler to use relative paths, or to fool it into thinking the path is not what it is?

The reason for the last one is the beautifully annoying Windows file system. You just never know when deleting several gigs' worth of sources, objects and svn metadata will fail because of a rogue file lock. At least creating a new root always succeeds while there is space left. Running multiple builds at once is an issue too. Running a bunch of VMs, while a solution, is a rather heavy one.

I wonder if there is a way to set up a virtual file system for a process and its children so that several process trees will see different "C:\build" dirs, private to them only, all at the same time... A lightweight virtualization of sorts...

UPDATE: we recently open-sourced the tool on GitHub. See the Compare section in the documentation.

I solved this to an extent.

Currently we have a build system that makes sure all new builds are on a path of constant length (builds/001, builds/002, etc.), thus avoiding shifts in the PE layout. After a build, a tool compares the old and new binaries, ignoring relevant PE fields and other locations with known superficial changes. It also runs some simple heuristics to detect dynamic ignorable changes. Here is the full list of things to ignore:

  • PE timestamp and checksum
  • Digital signature directory entry
  • Export table timestamp
  • Debugger section timestamp
  • PDB signature, age and file path
  • Resources timestamp
  • All file/product versions in VS_VERSION_INFO resource
  • Digital signature section
  • MIDL vanity stub for embedded type libraries (contains timestamp string)
  • __FILE__, __DATE__ and __TIME__ macros when they are used as literal strings (can be wide or narrow char)
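As an illustration of the last item, the sketch below searches a binary for strings shaped like __DATE__ ("Mmm dd yyyy") and __TIME__ ("hh:mm:ss") in both narrow (ASCII) and wide (UTF-16LE) form and returns their byte ranges. It is a guess at what such a heuristic could look like, not the actual implementation of the tool. The function names are made up, the time pattern can produce false positives on unrelated data, and __FILE__ is not covered.

```python
import re

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def _atoms_to_regex(atoms, wide):
    # In UTF-16LE every ASCII character is followed by a zero byte.
    pad = rb"\x00" if wide else b""
    return b"".join(atom + pad for atom in atoms)

def build_patterns(wide):
    """Return compiled (date, time) patterns for narrow or wide strings."""
    month = b"(?:" + b"|".join(
        _atoms_to_regex([ch.encode("ascii") for ch in name], wide)
        for name in MONTHS
    ) + b")"
    d = rb"[0-9]"
    # __DATE__ pads a single-digit day with a space, e.g. "Jan  1 2024".
    date = month + _atoms_to_regex(
        [rb" ", rb"[ 0-9]", d, rb" ", d, d, d, d], wide)
    time = _atoms_to_regex(
        [rb"[0-2]", d, rb":", rb"[0-5]", d, rb":", rb"[0-5]", d], wide)
    return re.compile(date), re.compile(time)

def find_date_time_ranges(data):
    """Return sorted (offset, size) pairs of date-like and time-like strings."""
    ranges = []
    for wide in (False, True):
        for pattern in build_patterns(wide):
            ranges.extend((m.start(), m.end() - m.start())
                          for m in pattern.finditer(data))
    return sorted(ranges)
```

The returned ranges can be added to the list of byte ranges that the comparison ignores.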

Once in a while the linker makes some PE sections bigger without throwing anything else out of alignment. It looks like it moves the section boundary inside the padding. It is zeros all around anyway, but because of it I get binaries with a 1-byte difference.


Standardise Build Paths

A simple solution would be to standardise your build paths so they are always of the same form, for example:

c:\buildXXXX

Then, when you compare, say, build0434 to build0398, just preprocess the binary to change all occurrences of build0434 to build0398. Choose a pattern you know is unlikely to show up in your actual source/data, except in those strings the compiler/linker embeds into the PE.

Then you can just do your normal difference analysis. By using pathnames of the same length, you won't shift any data around and cause false positives.
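The preprocessing step could look roughly like the following Python sketch. The function name is hypothetical. It assumes the build directory name may appear both as a narrow (ASCII) and as a wide (UTF-16LE) string, and it does not handle variations in letter case.

```python
def normalize_build_name(data, old, new):
    """Replace every occurrence of one build directory name with another.

    Both names must have the same length so that no data is shifted.
    """
    if len(old) != len(new):
        raise ValueError("build names must have the same length")
    for encoding in ("ascii", "utf-16-le"):
        data = data.replace(old.encode(encoding), new.encode(encoding))
    return data

# Example: rewrite the newer binary before diffing it against the older one.
# new_bytes = normalize_build_name(new_bytes, "build0434", "build0398")
```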

Dumpbin utility

Another tip is to use dumpbin.exe (ships with MSVC). Use dumpbin /all to dump all details of a binary to a text/hex dump. This can make it more obvious what is changing and where.

For example:

dumpbin /all program1.exe > program1.txt
dumpbin /all program2.exe > program2.txt
windiff program1.txt program2.txt

Or use your favourite text diffing tool instead of Windiff.

Bindiff utility

You may find Microsoft's bindiff.exe tool useful, which can be obtained here:

Windows XP Service Pack 2 Support Tools

It has a /v option to instruct it to ignore certain binary fields, such as timestamps, checksums, etc.:

"BinDiff uses a special compare routine for Win32 executable files that masks out various build time stamp fields in both files when performing the compare. This allows two executable files to be marked as "Near Identical" when the files are truely identical, except for the time they were built."

However, it sounds like you may already be doing a superset of what bindiff.exe does.

Have you tried disassembling the executable and comparing the disassembly? That should remove a lot of the distracting details you mention, and make removing others a lot easier.

Is there a way to either force compiler to use relative paths, or to fool it into thinking the path is not what it is?

You have two ways to do this:

  1. Use the subst.exe command and map a drive letter to the build folder (this may not be reliable).
  2. If subst.exe doesn't work, then create shares for each of your build folders and use the "net use" command. This one almost certainly should work.

In either case, you map and reuse the same drive letter for a folder before you start a particular build, so that the path appears identical to the compiler.
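A build script could wrap the first option roughly as in the Python sketch below. The helper names, the drive letter and the folder in the usage comment are assumptions for illustration. Only the two standard forms of the command are used: "subst X: folder" to map and "subst X: /D" to unmap.

```python
import subprocess
from contextlib import contextmanager

def subst_commands(drive, folder):
    """Build the command lines that map and unmap a drive letter with subst."""
    map_cmd = ["subst", drive, folder]
    unmap_cmd = ["subst", drive, "/D"]
    return map_cmd, unmap_cmd

@contextmanager
def mapped_drive(drive, folder):
    """Map the drive for the duration of a build, then remove the mapping."""
    map_cmd, unmap_cmd = subst_commands(drive, folder)
    subprocess.run(map_cmd, check=True)
    try:
        yield drive
    finally:
        subprocess.run(unmap_cmd, check=True)

# Hypothetical usage (Windows only); build.cmd stands in for the real build step:
# with mapped_drive("B:", r"C:\builds\002"):
#     subprocess.run(["cmd", "/c", "build.cmd"], cwd="B:\\", check=True)
```

Because a subst mapping is visible to the whole user session, this does not help when several builds must run at the same time under the same drive letter.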

I came across an additional tool to help solve this problem: Ducible on GitHub

"This is a tool to make builds of Portable Executables (PEs) and PDBs reproducible."

It modifies the provided *.exe, *.dll and *.pdb files, in place, replacing non-deterministic data with deterministic data.
