Are floating point addition and multiplication associative?

Question

I ran into a problem when adding three floating point values and comparing the sum to 1.

cout << ((0.7 + 0.2 + 0.1)==1)<<endl;     //output is 0
cout << ((0.7 + 0.1 + 0.2)==1)<<endl;     //output is 1

Why would these two results come out different?

Answer 1

Floating point addition is not necessarily associative. If you change the order in which you add things up, this can change the result.

The standard paper on the subject is What Every Computer Scientist Should Know About Floating-Point Arithmetic. It gives the following example:

Another grey area concerns the interpretation of parentheses. Due to roundoff errors, the associative laws of algebra do not necessarily hold for floating-point numbers. For example, the expression (x+y)+z has a totally different answer than x+(y+z) when x = 1e30, y = -1e30 and z = 1 (it is 1 in the former case, 0 in the latter).
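The quoted example can be reproduced directly. This is a minimal sketch that assumes Python floats are IEEE-754 doubles, which holds on common platforms; the names left and right are only illustrative.

```python
x, y, z = 1e30, -1e30, 1.0

# x + y cancels exactly to 0.0, so adding z afterwards gives 1.0.
left = (x + y) + z

# y + z cannot represent the extra 1 next to 1e30, so it rounds back
# to -1e30, and adding x then gives 0.0.
right = x + (y + z)

print(left, right)
```

The two groupings differ only in where the rounding happens, which is the point the paper makes.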

Answer 2

With currently popular machines and software, what likely happens is this:

The compiler encoded .7 as 0x1.6666666666666p-1 (this is the hexadecimal numeral 1.6666666666666 multiplied by 2 to the power of -1), .2 as 0x1.999999999999ap-3, and .1 as 0x1.999999999999ap-4. Each of these is the number representable in floating-point that is closest to the decimal numeral you wrote.

Observe that each of these hexadecimal floating-point constants has exactly 53 bits in its significand (the "fraction" part, often inaccurately called the mantissa). The hexadecimal numeral for the significand has a "1" and thirteen more hexadecimal digits (four bits each, 52 total, 53 including the "1"), which is what the IEEE-754 standard provides for 64-bit binary floating-point numbers.
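These encodings can be inspected without a C compiler. The sketch below uses Python's float.hex(), again assuming that Python floats are IEEE-754 doubles; the dictionary name encodings is just for illustration.

```python
# Map each decimal literal to the hexadecimal form of the double that
# actually gets stored for it.
encodings = {value: value.hex() for value in (0.7, 0.2, 0.1)}

for value, hex_form in encodings.items():
    print(value, "->", hex_form)
```

Each printed numeral has a leading 1 and thirteen hexadecimal digits after the point, matching the 53-bit significand described above.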

Let's add the numbers for .7 and .2: 0x1.6666666666666p-1 and 0x1.999999999999ap-3. First, scale the exponent of the second number to match the first. To do this, we multiply the power-of-two scale factor by 4 (changing "p-3" to "p-1") and multiply the significand by 1/4, giving 0x0.66666666666668p-1. Then add 0x1.6666666666666p-1 and 0x0.66666666666668p-1, giving 0x1.ccccccccccccc8p-1. Note that this number has more than 53 bits in the significand: the "8" is the 14th digit after the period. Floating-point cannot return a result with this many bits, so it has to be rounded to the nearest representable number. In this case, there are two numbers that are equally near, 0x1.cccccccccccccp-1 and 0x1.ccccccccccccdp-1. When there is a tie, the number with a zero in the lowest bit of the significand is used. "c" is even and "d" is odd, so "c" is used. The final result of the addition is 0x1.cccccccccccccp-1.
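The tie can be checked with exact rational arithmetic. This is a sketch under two assumptions: floats are IEEE-754 doubles with the default round-to-nearest-even mode, and math.nextafter is available (Python 3.9 or later). The variable names are illustrative.

```python
from fractions import Fraction
import math

# Exact rational sum of the two stored doubles, with no rounding at all.
exact = Fraction(0.7) + Fraction(0.2)

# What floating-point addition returns, and the next double above it.
rounded = 0.7 + 0.2
neighbour = math.nextafter(rounded, 1.0)

# Distance from the exact sum to each candidate result.
below = exact - Fraction(rounded)
above = Fraction(neighbour) - exact

print(rounded.hex(), neighbour.hex())
print(below == above)  # equal distances mean the exact sum is a tie
```

Because both distances are equal, the rounding rule has to break the tie, and it picks the candidate whose last hexadecimal digit is the even "c".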

Next, add the number for .1 (0x1.999999999999ap-4) to that. Again, we scale to make the exponents match, so 0x1.999999999999ap-4 becomes 0x0.33333333333334p-1. Then add that to 0x1.cccccccccccccp-1, giving 0x1.fffffffffffff4p-1. Rounding that to 53 bits gives 0x1.fffffffffffffp-1, and that is the final result of .7+.2+.1.

Now consider .7+.1+.2. For .7+.1, add 0x1.6666666666666p-1 and 0x1.999999999999ap-4. Recall the latter is scaled to 0x0.33333333333334p-1. Then the exact sum is 0x1.99999999999994p-1. Rounding that to 53 bits gives 0x1.9999999999999p-1.

Then add the number for .2 (0x1.999999999999ap-3), which is scaled to 0x0.66666666666668p-1. The exact sum of 0x1.9999999999999p-1 and 0x0.66666666666668p-1 is 0x1.fffffffffffff8p-1, which again needs more than 53 bits. It lies exactly halfway between 0x1.fffffffffffffp-1 and 0x2.0000000000000p-1, and the tie goes to the candidate with a zero in the lowest bit of the significand, which is 0x2.0000000000000p-1. Floating-point significands are always scaled to start with 1 (except for special cases: zero, infinity, and very small numbers at the bottom of the representable range), so this is normalized to 0x1.0000000000000p0.

Thus, because of errors that occur when rounding, .7+.2+.1 returns 0x1.fffffffffffffp-1 (very slightly less than 1), and .7+.1+.2 returns 0x1.0000000000000p0 (exactly 1).
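Both final results can be confirmed in a few lines. This sketch assumes IEEE-754 doubles and relies on Python evaluating a + b + c left to right, the same grouping as the C++ expressions in the question; first and second are illustrative names.

```python
first = 0.7 + 0.2 + 0.1    # grouped as (0.7 + 0.2) + 0.1
second = 0.7 + 0.1 + 0.2   # grouped as (0.7 + 0.1) + 0.2

print(first.hex(), first == 1.0)
print(second.hex(), second == 1.0)

# The first sum misses 1 by exactly one unit in the last place below 1.
gap = 1.0 - first
print(gap == 2.0 ** -53)
```

The gap of 2**-53 is why the first comparison in the question printed 0 while the second printed 1.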

Answer 3

Floating point multiplication is not associative in C or C++.

Proof:

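#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    int counter = 0;
    srand((unsigned)time(NULL));  /* different seed on each run */
    while (counter++ < 10) {
        /* rand() / 100000 is integer division, so a, b and c hold whole
           numbers; this assumes RAND_MAX is much larger than 100000. */
        float a = rand() / 100000;
        float b = rand() / 100000;
        float c = rand() / 100000;

        /* Compare the two groupings of the same product. */
        if (a * (b * c) != (a * b) * c) {
            printf("Not equal\n");
        }
    }
    printf("DONE\n");
    return 0;
}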

In this program, (a*b)*c is not equal to a*(b*c) about 30% of the time.

Answer 4

A similar answer to Eric's, but for addition, and in Python.

import random

random.seed(0)  # fixed seed so the run is repeatable
n = 1000
a = [random.random() for i in range(n)]
b = [random.random() for i in range(n)]
c = [random.random() for i in range(n)]

# Count the triples for which the two groupings give different sums.
print(sum(1 if (a[i] + b[i]) + c[i] != a[i] + (b[i] + c[i]) else 0 for i in range(n)))

Answer 5

Neither addition nor multiplication is associative for IEEE 754 double precision (64-bit) numbers. Here are examples for each (evaluated with Python 3.9.7):

>>> (.1 + .2) + .3
0.6000000000000001
>>> .1 + (.2 + .3)
0.6

>>> (.1 * .2) * .3
0.006000000000000001
>>> .1 * (.2 * .3)
0.006
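To see that the rounding after each operation is responsible, and not the stored values themselves, the same operands can be combined exactly. This is a sketch using the standard fractions module; Fraction(0.1) converts the stored double to an exact rational, and the variable names are illustrative.

```python
from fractions import Fraction

# Exact rational values of the doubles stored for .1, .2 and .3.
a, b, c = Fraction(0.1), Fraction(0.2), Fraction(0.3)

# With no rounding between operations, both groupings agree.
sum_equal = (a + b) + c == a + (b + c)
prod_equal = (a * b) * c == a * (b * c)
print(sum_equal, prod_equal)

# The float versions round after every operation and disagree.
float_sum_equal = (.1 + .2) + .3 == .1 + (.2 + .3)
float_prod_equal = (.1 * .2) * .3 == .1 * (.2 * .3)
print(float_sum_equal, float_prod_equal)
```

Exact arithmetic keeps associativity; the float results diverge only because each intermediate value is rounded to 53 bits.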
