
Is it better to perform n additions of a floating-point number or one integer multiplication?

Consider the two cases below:

// Case 1
double val { initial_value };
for (int i { 0 }; i < n; ++i) {
    val += step;
    foo(val);
}
// Case 2
for (int i { 0 }; i < n; ++i) {
    double val = initial_value + i * step;
    foo(val);
}

where n is the number of steps, initial_value is some given value of type double, step is some predetermined value of type double, and val is a variable used in a subsequent call to the function foo. Which of the cases produces less floating-point error? My guess would be the second one, as there is only one addition and one multiplication, while the first case incurs floating-point rounding error from all n additions. I am asking this question because I didn't know what to search for. Does there exist a good reference for cases like these?

In practice, the variable val is used inside the loop in both cases. I didn't include an example of this, as I'm only interested in the floating-point error.

Option 2 has significantly lower error.

How much? Well, for simplicity's sake, let's first assume an initial_value of 0. You have 53 significant bits, and how quickly you will see rounding errors depends on how quickly the additions manage to shift those bits off the far end.

So let's pick step such that the significant bits are ideally all 1s: 0.999999999999999999999999.

Now the rounding error is log2(val/step) bits from the far end of step during each single addition. Not much during the first iteration, but the error becomes noticeable rather quickly.

Pick a huge initial_value and the error can become quite extreme. For initial_value >= pow(2, 53) * step, your first loop even fails to change val at all between iterations.

Your second loop still handles that case correctly.
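A minimal sketch of this extreme case, with the concrete values step = 1.0 and initial_value = pow(2, 53) chosen purely for illustration (they are not from the original answer):

#include <cmath>
#include <cstdio>

int main() {
    const double step = 1.0;
    const double initial_value = std::pow(2.0, 53); // == pow(2, 53) * step
    const int n = 5;

    // Case 1: repeated addition. initial_value + step rounds back to
    // initial_value, so val never changes.
    double val = initial_value;
    for (int i = 0; i < n; ++i) {
        val += step;
        std::printf("addition:       %.1f\n", val);
    }

    // Case 2: each value is initial_value + i * step rounded once, so the
    // sequence keeps advancing (although adjacent values near this threshold
    // may still round to the same double).
    for (int i = 0; i < n; ++i) {
        std::printf("multiplication: %.1f\n", initial_value + i * step);
    }
}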

Considering the comment by supercat (emphasis mine):

The point is that in many scenarios one might want a sequence of values that are uniformly spaced between specified start and end points. Using the second approach would yield values that are as uniformly spaced as possible between the start point and an end value that is near the desired one, but may not quite match it.

And the one by Bathsheba:

Both are flawed. You should compute the start and end, then compute each value as a function of those. The problem with the second way is that you multiply the error in step. The former accumulates errors.

I'd suggest a couple of alternatives.

  • Since C++20, the Standard Library provides std::lerp, where std::lerp(a, b, t) returns "the linear interpolation between a and b for the parameter t (or extrapolation, when t is outside the range [0,1])".

  • A formula like value = (a * (n - i) + b * i) / n; may result in a more uniform (1) distribution of the intermediate values; see the sketch after the footnote below.

(1) Here I tried to test all of those approaches for different extremes and numbers of sample points. The program compares the values generated by each algorithm when applied in opposite directions (first from left to right, then from right to left). It shows the average and the variance of the sum of the absolute differences between the values of the intermediate points.

Other metrics may yield different results.
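A minimal sketch of both alternatives (the endpoints a, b and the step count n are illustrative; std::lerp requires C++20):

#include <cmath>   // std::lerp (C++20)
#include <cstdio>

int main() {
    const double a = 0.0;   // start point (illustrative)
    const double b = 1.0;   // end point (illustrative)
    const int n = 10;       // number of intervals

    for (int i = 0; i <= n; ++i) {
        // Standard-library linear interpolation between a and b.
        const double v1 = std::lerp(a, b, static_cast<double>(i) / n);
        // Weighted-endpoint formula from the second bullet.
        const double v2 = (a * (n - i) + b * i) / n;
        std::printf("%2d: lerp = %.17g, weighted = %.17g\n", i, v1, v2);
    }
}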

Consider an extreme case. Suppose that initial_value is much larger than step. Much, much larger. So large that initial_value + step == initial_value due to the limits of floating-point representation. However, we do not want this "extreme" case to get too extreme. Put a cap on initial_value: say, keep it small enough to have initial_value + (2*step) != initial_value. (Some people might call this putting step between a certain epsilon and half that epsilon, but I would get the terminology mixed up.) Now run through your code.

In the first loop, val will equal initial_value on every iteration, as no operation is performed that changes its value. In contrast, the second loop will eventually produce a different value for val, if there are enough iterations. Therefore, the second option, the one that calculates initial_value + i * step, is more accurate in this extreme case.
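A minimal sketch of this capped extreme, assuming IEEE-754 double; the concrete choice initial_value = 1.0 and step = DBL_EPSILON / 2 is mine, picked so that initial_value + step == initial_value while initial_value + (2*step) != initial_value:

#include <cfloat>
#include <cstdio>

int main() {
    const double initial_value = 1.0;
    const double step = DBL_EPSILON / 2; // 2^-53: adding one step to 1.0 rounds back to 1.0
    const int n = 8;

    // Case 1: repeated addition never changes val.
    double val = initial_value;
    for (int i = 0; i < n; ++i) {
        val += step;
    }

    // Case 2: the product n * step is large enough to move the result.
    const double direct = initial_value + n * step;

    std::printf("repeated addition: %.17g\n", val);     // still 1.0
    std::printf("multiply then add: %.17g\n", direct);  // slightly above 1.0
}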


We should also look at the opposite extreme. Suppose that initial_value is so small relative to step that initial_value + step == step. In this case, initial_value might as well be zero, and the question simplifies to asking whether there is a more accurate way to calculate i*step than by multiplying i and step. (If there is, I might want a new compiler.) Therefore, the second option is not worse than the first in this extreme case.


Extreme case analysis is not conclusive, but it often reveals trends. I pushed the calculation to opposite extremes, and the second option ranged from definitely better to definitely not worse. I'd be willing to conclude that the second option produces less error.

Caveats: it might be that the size of the error is negligible and not worth coding around. Also, the question has limited scope, ignoring other considerations (such as where step came from; if it is the result of dividing by n, there might be even better alternatives). Still, in the narrow scenario presented by the question, calculating initial_value + i*step each iteration looks like the way to get minimal numerical error.

Including <cmath> and using std::fma(i, step, initial_value) will always produce the best result, presuming i is not so large that converting it to the floating-point type has a rounding error. This is because fma is specified to produce a result equivalent to computing the real-arithmetic value of i*step + initial_value and then rounding it to the nearest representable value. It does not have an internal rounding after the multiplication and before the addition, so it produces the best result that is representable in the floating-point type.
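A minimal usage sketch (the concrete values of initial_value, step, and n are only illustrative):

#include <cmath>
#include <cstdio>

int main() {
    const double initial_value = 0.1;
    const double step = 0.3;
    const int n = 5;

    for (int i = 0; i < n; ++i) {
        // i * step + initial_value is evaluated as if in real arithmetic,
        // then rounded once to the nearest double.
        const double val = std::fma(static_cast<double>(i), step, initial_value);
        std::printf("%d: %.17g\n", i, val);
    }
}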

Between the multiplication method and the addition method, multiplication is generally preferred. It is possible for addition to produce a better result. Presuming IEEE-754 double-precision binary, an example is easily constructed with initial_value = -1./3, i = 3, and step = 1./3. Then, in initial_value + step + step + step, initial_value + step produces exactly zero (so there is no rounding error), adding step to that has no error, and the second add merely doubles step, which also has no error. So addition produces a final result with no error. In contrast, in initial_value + 3*step, 3*step has a rounding error, and it persists through the addition.
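A short sketch that checks this constructed example; the reference value 2*step is exact because doubling a double is exact:

#include <cstdio>

int main() {
    const double step = 1.0 / 3.0;           // nearest double to 1/3
    const double initial_value = -1.0 / 3.0; // exactly -step
    const int i = 3;

    // Repeated addition: -1/3 + 1/3 + 1/3 + 1/3, no rounding error at any step.
    double by_addition = initial_value;
    for (int k = 0; k < i; ++k) {
        by_addition += step;
    }

    // Multiplication: 3 * step incurs a rounding error that persists through the addition.
    const double by_multiplication = initial_value + i * step;

    const double reference = 2 * step; // exact doubling
    std::printf("addition:       %.17g (error %.3g)\n", by_addition, by_addition - reference);
    std::printf("multiplication: %.17g (error %.3g)\n", by_multiplication, by_multiplication - reference);
}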

However, outside of deliberately constructed examples, multiplication will commonly produce better results than addition, simply because it uses fewer operations, many fewer in most cases. Typically, the rounding errors in repeated additions act like a random walk, sometimes increasing the accumulated error and sometimes decreasing it. A random walk can sometimes return to the origin, but does so rarely. So it is rare for a sequence of many additions to have an accumulated error closer to the origin (zero error) than an expression with one multiplication and one addition.
