Rules of thumb for minimising floating-point error in C?

Question

Regarding minimising the error in floating-point operations, if I have an operation such as the following in C:

float a = 123.456;
float b = 456.789;
float r = 0.12345;
a = a - (r * b);

Will the result of the calculation change if I split the multiplication and subtraction steps out, i.e.:

float c = r * b;
a = a - c;

I am wondering whether a CPU would then treat these calculations differently, so that the error may be smaller in one case?

If not, which I presume anyway, are there any good rules of thumb to mitigate floating-point error? Can I massage the data in a way that will help?

Please don't just say "use higher precision" - that's not what I'm after.

EDIT

For information about the data: in general, errors seem to be worse when the operation results in a very large number like 123456789. Small numbers, such as 1.23456789, seem to yield more accurate results after operations. Am I imagining this, or would scaling larger numbers help accuracy?

Answer 1

Note: this answer starts with a lengthy discussion of the distinction between a = a - (r * b); and float c = r * b; a = a - c; with a C99-compliant compiler. The part of the question about improving accuracy while avoiding extended precision is covered at the end.

Extended floating-point precision for intermediate results

If your C99 compiler defines FLT_EVAL_METHOD as 0, then the two computations can be expected to produce exactly the same result. If the compiler defines FLT_EVAL_METHOD as 1 or 2, then a = a - (r * b); will be more precise for some values of a, r and b, because all intermediate computations will be done at an extended precision (double for the value 1 and long double for the value 2).

The program cannot set FLT_EVAL_METHOD, but you can use command-line options to change the way your compiler computes with floating-point, and the compiler will change its definition of the macro accordingly.
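To see which case applies on a given platform, the two forms from the question can be wrapped in functions and compared alongside the value of FLT_EVAL_METHOD from <float.h>. This is only a sketch: the function names are made up for illustration, and the comments assume a compiler that follows C99 in rounding a value when it is stored in a float variable.

```c
#include <float.h>

/* Value the compiler reports for FLT_EVAL_METHOD (0, 1, 2, or an
   implementation-defined negative value). */
static int eval_method(void) {
    return FLT_EVAL_METHOD;
}

/* Single expression: when FLT_EVAL_METHOD is 1 or 2, the product r * b
   may be kept at a wider precision until the final result is rounded. */
static float update_single_expr(float a, float r, float b) {
    return a - (r * b);
}

/* Split form: under C99 rules, assigning to the float variable c rounds
   the product to single precision before the subtraction. */
static float update_split(float a, float r, float b) {
    float c = r * b;
    return a - c;
}
```

Calling both functions with the question's values (123.456f, 0.12345f, 456.789f) and printing the results with "%.9g" shows whether they differ on your compiler; with eval_method() returning 0 and contraction disabled they should be identical.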

Contraction of some intermediate results

Depending on whether you use the FP_CONTRACT pragma in your program (#pragma STDC FP_CONTRACT in standard C99; some compilers spell it #pragma fp_contract) and on your compiler's default value for this pragma, some compound floating-point expressions can be contracted into single instructions that behave as if the intermediate result was computed with infinite precision. This happens to be a possibility for your example when targeting a modern processor, as the fused-multiply-add instruction will compute a directly and as accurately as allowed by the floating-point type.

However, you should bear in mind that the contraction only takes place at the compiler's option, without any guarantees. The compiler uses the FMA instruction to optimize speed, not accuracy, so the transformation may not take place at lower optimization levels. Sometimes several transformations are possible (e.g. a * b + c * d can be computed either as fmaf(c, d, a*b) or as fmaf(a, b, c*d)) and the compiler may choose one or the other.

In short, the contraction of floating-point computations is not intended to help you achieve accuracy. You might as well make sure it is disabled if you like reproducible results.

However, in the particular case of the fused-multiply-add compound operation, you can use the C99 standard function fmaf() to tell the compiler to compute the multiplication and addition in a single step with a single rounding. If you do this, then the compiler will not be allowed to produce anything other than the best result for a.

float fmaf(float x, float y, float z);

DESCRIPTION
     The fma() functions compute (x*y)+z, rounded as one ternary operation:
     they compute the value (as if) to infinite precision and round once to
     the result format, according to the current rounding mode.

Note that if the FMA instruction is not available, your compiler's implementation of the function fmaf() will at best just use higher precision, and if this happens on your compilation platform, you might just as well use the type double for the accumulator: it will be faster and more accurate than using fmaf(). In the worst case, a flawed implementation of fmaf() will be provided.

Improving accuracy while only using single-precision

Use Kahan summation if your computation involves a long chain of additions. Some accuracy can be gained by simply summing the r*b terms computed as single-precision products, assuming there are many of them. If you wish to gain more accuracy, you might want to compute r*b itself exactly as the sum of two single-precision numbers, but if you do this you might as well switch to double-single arithmetic entirely. Double-single arithmetic would be the same as the double-double technique succinctly described here, but with single-precision numbers instead.

Answer 1 (score 8, accepted, 2014-08-07 12:39:51)