How to deal with excess precision in floating-point computations?

Question

In my numerical simulation I have code similar to the following snippet:

double x;
do {
  x = /* some computation */;
} while (x <= 0.0);
/* some algorithm that requires x to be (precisely) larger than 0 */

With certain compilers (e.g. gcc) on certain platforms (e.g. Linux with x87 math), x may be computed in higher than double precision ("with excess precision"). (Update: When I talk of precision here, I mean precision and range.) Under these circumstances it is conceivable that the comparison ( x <= 0 ) returns false even though x becomes 0 the next time it is rounded down to double precision. (And there's no guarantee that x isn't rounded down at an arbitrary point in time.)

Is there any way to perform this comparison that

is portable,
works in code that gets inlined,
has no performance impact and
doesn't exclude some arbitrary range (0, eps)?

I tried to use ( x < std::numeric_limits<double>::denorm_min() ) but that seemed to significantly slow down the loop when working with SSE2 math. (I know that denormals can slow down a computation, but I didn't expect them to be slower to just move around and compare.)

Update: An alternative is to use volatile to force x into memory before the comparison, e.g. by writing

} while (*((volatile double*)&x) <= 0.0);

However, depending on the application and the optimizations applied by the compiler, this solution can introduce a noticeable overhead too.

Update: The problem with any tolerance is that it's quite arbitrary, i.e. it depends on the specific application or context. I'd prefer to just do the comparison without excess precision, so that I don't have to make any additional assumptions or introduce some arbitrary epsilons into the documentation of my library functions.

Answer 1

As Arkadiy stated in the comments, an explicit cast ((double)x) <= 0.0 should work, at least according to the standard.

C99:TC3, 5.2.4.2.2 §8:

Except for assignment and cast (which remove all extra range and precision), the values of operations with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type. [...]

If you're using GCC on x86, you can use the flags -mpc32, -mpc64 and -mpc80 to set the precision of floating-point operations to single, double and extended double precision.
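The cast-based comparison can be sketched as below. This is a minimal illustration, not the asker's code: first_positive and its compute parameter are hypothetical names standing in for "some computation", and whether a particular compiler and set of flags actually honors the cast is exactly the "at least according to the standard" caveat above.

```cpp
#include <cassert>

// Hypothetical helper: keeps calling `compute` until the result,
// converted to double by an explicit cast, is strictly positive.
template <typename Compute>
double first_positive(Compute compute) {
    double x;
    do {
        x = compute();
        // Per the quoted wording, the cast removes extra range and precision
        // before the comparison takes place.
    } while (static_cast<double>(x) <= 0.0);
    assert(x > 0.0);  // postcondition the following algorithm relies on
    return x;
}
```

A call such as first_positive([] { return some_computation(); }) would then replace the hand-written loop, with some_computation again being a placeholder.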

Answer 2

In your question, you stated that using volatile will work but that there'll be a huge performance hit. What about using the volatile variable only during the comparison, allowing x to be held in a register?

double x; /* might have excess precision */
volatile double x_dbl; /* guaranteed to be double precision */
do {
  x = /* some computation */;
  x_dbl = x;
} while (x_dbl <= 0.0);

You should also check whether you can speed up the comparison with the smallest subnormal value by using long double explicitly and caching this value, i.e.

const long double dbl_denorm_min = static_cast<long double>(std::numeric_limits<double>::denorm_min());

and then compare

x < dbl_denorm_min

I'd assume that a decent compiler would do this automatically, but one never knows...
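Put together, the cached comparison might look like the following sketch. The helper name below_smallest_double is made up for illustration, and the sketch assumes the value under test is available as a long double.

```cpp
#include <limits>

// Smallest positive (subnormal) double, widened once to long double so the
// comparison does not have to convert it on every iteration.
static const long double dbl_denorm_min =
    static_cast<long double>(std::numeric_limits<double>::denorm_min());

// Hypothetical helper: true if x lies below every positive double.
inline bool below_smallest_double(long double x) {
    return x < dbl_denorm_min;
}
```

Note that this test is slightly stricter than rounding x to double and comparing with 0: under round-to-nearest, an extended-precision value above half of denorm_min would round up to denorm_min rather than down to 0, yet it is still rejected here.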

Answer 3

I wonder whether you have the right stopping criterion. It sounds like x <= 0 is an exception condition rather than a terminating condition, and that the terminating condition is easier to satisfy. Maybe there should be a break statement inside your while loop that stops the iteration when some tolerance is met. For example, a lot of algorithms terminate when two successive iterations are sufficiently close to each other.
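As an illustration of that kind of stopping rule, here is a small sketch using Newton's iteration for a square root. It is not the asker's computation; newton_sqrt, rel_tol and max_iter are made-up names, and the tolerance is an arbitrary choice of the sort the question would rather avoid.

```cpp
#include <cmath>

// Hypothetical example: approximates sqrt(a) for a > 0 and stops as soon as
// two successive iterates agree to within a relative tolerance.
double newton_sqrt(double a, double rel_tol = 1e-14, int max_iter = 100) {
    double x = a > 1.0 ? a : 1.0;  // starting guess at or above sqrt(a)
    for (int i = 0; i < max_iter; ++i) {
        const double next = 0.5 * (x + a / x);
        if (std::fabs(next - x) <= rel_tol * std::fabs(next)) {
            return next;  // successive iterates are sufficiently close
        }
        x = next;
    }
    return x;  // iteration cap reached without meeting the tolerance
}
```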

Answer 4

Well, GCC has a flag, -fexcess-precision (taking a style such as fast or standard), which controls the behaviour you are discussing. It also has a flag, -ffloat-store, which solves the problem you are discussing.

"Do not store floating point variables in registers. This prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have."

I doubt that this solution has no performance impact, but the cost is probably not excessive. Random googling suggests it costs about 20%. Actually, I don't think there is a solution which is both portable and has no performance impact, since forcing a chip not to use excess precision is often going to involve some non-free operation. However, this is probably the solution you want.
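Whether a given build evaluates in excess precision at all can be checked with the standard FLT_EVAL_METHOD macro from <cfloat>. The sketch below only reports what the implementation declares; the function names are hypothetical, and the macro's value is the compiler's own claim for the flags in use.

```cpp
#include <cfloat>
#include <string>

// Hypothetical helper: translates a FLT_EVAL_METHOD value into words.
std::string describe_eval_method(int method) {
    switch (method) {
        case 0:  return "type";         // evaluated in each operand type's own format
        case 1:  return "double";       // float and double evaluated as double
        case 2:  return "long double";  // everything evaluated as long double
        default: return "unknown";      // -1 or an implementation-defined value
    }
}

// True if the implementation does not promise evaluation in the operand type.
bool may_have_excess_precision() {
    return FLT_EVAL_METHOD != 0;
}
```

Printing describe_eval_method(FLT_EVAL_METHOD) once for the x87 build and once for the SSE2 build is a cheap way to see whether a flag such as -ffloat-store is worth its cost on that target.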

Answer 5

Be sure to make that check use an absolute value: it needs to be an epsilon around zero, both above and below.

Answer 1: score 7, 2009-02-02 15:11:05
Answer 2: score 2, 2009-02-02 17:13:43
Answer 3: score 1, 2009-02-02 15:00:41
Answer 4: score 0, 2009-02-02 15:01:21
Answer 5: score 0, 2009-02-02 15:02:54