
Coercing floating-point to be deterministic in .NET?

I've been reading a lot about floating-point determinism in .NET, i.e. ensuring that the same code with the same inputs will give the same results across different machines. Since .NET lacks options like Java's strictfp and MSVC's fp:strict, the consensus seems to be that there is no way around this issue using pure managed code. The C# game AI Wars has settled on using fixed-point math instead, but this is a cumbersome solution.

The main issue appears to be that the CLR allows intermediate results to live in FPU registers that have higher precision than the type's native precision, leading to unpredictably higher-precision results. An MSDN article by CLR engineer David Notario explains the following:

Note that with the current spec, it's still a language choice to give 'predictability'. The language may insert conv.r4 or conv.r8 instructions after every FP operation to get a 'predictable' behavior. Obviously, this is really expensive, and different languages have different compromises. C#, for example, does nothing; if you want narrowing, you will have to insert (float) and (double) casts by hand.

This suggests that one may achieve floating-point determinism simply by inserting explicit casts for every expression and sub-expression that evaluates to float. One might write a wrapper type around float to automate this task. This would be a simple and ideal solution!

Other comments, however, suggest that it isn't so simple. Eric Lippert recently stated (emphasis mine):

in some version of the runtime, casting to float explicitly gives a different result than not doing so. When you explicitly cast to float, the C# compiler gives a hint to the runtime to say "take this thing out of extra high precision mode if you happen to be using this optimization".

Just what is this "hint" to the runtime? Does the C# spec stipulate that an explicit cast to float causes the insertion of a conv.r4 in the IL? Does the CLR spec stipulate that a conv.r4 instruction causes a value to be narrowed down to its native size? Only if both of these are true can we rely on explicit casts to provide floating-point "predictability" as explained by David Notario.

Finally, even if we can indeed coerce all intermediate results to the type's native size, is this enough to guarantee reproducibility across machines, or are there other factors, like FPU/SSE run-time settings?

Just what is this "hint" to the runtime?

As you conjecture, the compiler tracks whether a conversion to double or float was actually present in the source code, and if it was, it always inserts the appropriate conv opcode.

Does the C# spec stipulate that an explicit cast to float causes the insertion of a conv.r4 in the IL?

No, but I assure you that there are unit tests in the compiler test cases that ensure that it does. Though the specification does not demand it, you can rely on this behaviour.

The specification's only comment is that any floating-point operation may be done in a higher precision than required at the whim of the runtime, and that this can make your results unexpectedly more accurate. See section 4.1.6.

Does the CLR spec stipulate that a conv.r4 instruction causes a value to be narrowed down to its native size?

Yes, in Partition I, section 12.1.3, which I note you could have looked up yourself rather than asking the internet to do it for you. These specifications are free on the web.

A question you didn't ask but probably should have:

Is there any operation other than casting that truncates floats out of high-precision mode?

Yes. Assigning to a static field, an instance field, or an element of a double[] or float[] array truncates.

Is consistent truncation enough to guarantee reproducibility across machines?

No. I encourage you to read section 12.1.3, which has much of interest to say on the subject of denormals and NaNs.

And finally, another question you did not ask but probably should have:

How can I guarantee reproducible arithmetic?

Use integers.

The 8087 floating-point unit was Intel's billion-dollar mistake. The idea looks good on paper: give it an eight-register stack that stores values in 80-bit extended precision, so that you can write calculations whose intermediate values are less likely to lose significant digits.

The beast is, however, impossible to optimize for. Storing a value from the FPU stack back to memory is expensive, so keeping values inside the FPU is a strong optimization goal. But with only eight registers, a deep enough calculation inevitably requires a write-back. The registers are also organized as a stack rather than being freely addressable, which requires gymnastics that may produce a write-back as well. A write-back truncates the value from 80 bits back to 64 bits, losing precision.

The consequence is that non-optimized code does not produce the same result as optimized code, and small changes to the calculation can have big effects on the result when an intermediate value ends up needing to be written back. The /fp:strict option is a hack around that: it forces the code generator to emit a write-back to keep the values consistent, but at an inevitable and considerable loss of performance.

This is a genuine rock-and-a-hard-place situation. For the x86 jitter, they simply didn't try to address the problem.

Intel didn't make the same mistake when they designed the SSE instruction set. The XMM registers are freely addressable and don't store extra bits. If you want consistent results, then compiling for the AnyCPU target on a 64-bit operating system is the quick solution: the x64 jitter uses SSE instead of FPU instructions for floating-point math. Albeit this adds a third way that a calculation can produce a different result. At least, if the calculation is wrong because it loses too many significant digits, it will be consistently wrong. Which is a bit of a bromide, really, but consistency is typically all a programmer looks for.
