Interchangeability of IEEE 754 floating-point addition and multiplication

Is the addition x + x interchangeable with the multiplication 2 * x in the IEEE 754 (IEC 559) floating-point standard, or, more generally, is there any guarantee that case_add and case_mul always give exactly the same result?

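#include <cstddef>
#include <limits>

// Multiplies x by n through repeated addition: x + x + ... + x (n terms).
// Note: for n == 0 this returns x, whereas case_mul returns 0.
template <typename T>
T case_add(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");

    T result(x);

    for (std::size_t i = 1; i < n; ++i)
    {
        result += x;  // each addition may round
    }

    return result;
}

// Multiplies x by n with a single multiplication.
template <typename T>
T case_mul(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");

    return x * static_cast<T>(n);  // the conversion of n to T may itself round for large n
}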

Is the addition x + x interchangeable with the multiplication 2 * x in the IEEE 754 (IEC 559) floating-point standard?

Yes. The two expressions are mathematically identical, and doubling a floating-point value is exact (barring overflow, where both overflow in the same way), so they give the same result.
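A minimal sketch of that claim, assuming the compiler evaluates the arithmetic in the type's own precision (no -ffast-math, no x87 excess precision). The helper name doubling_matches is made up for illustration and is not part of the original question.

```cpp
#include <limits>

// Hypothetical helper: true when x + x and 2 * x produce the same value
// for an IEC 559 type. Returns false for NaN, because NaN != NaN.
template <typename T>
bool doubling_matches(T x)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");

    const T sum = x + x;                       // one addition
    const T product = static_cast<T>(2) * x;   // one multiplication
    return sum == product;
}
```

Calling it with ordinary values such as 0.1, with the largest finite double (both sides overflow to infinity), or with the smallest subnormal should all report agreement.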

Or, more generally, is there any guarantee that case_add and case_mul always give exactly the same result?

Not generally, no. From what I can tell, it seems to hold for n <= 5:

  • n=3: x+x is exact (i.e. involves no rounding), so (x+x)+x involves only one rounding, at the final step.
  • n=4 (assuming the default rounding mode):

    • If the last bit of x is 0, then x+x+x is exact, and so the results are equal by the same argument as n=3.
    • If the last 2 bits are 01, then the exact value of x+x+x will have last 2 bits of 1|1 (where | indicates the final bit in the format), which will be rounded up to 0|0. The next addition will give an exact result |01, so the result will be rounded down, cancelling out the previous error.
    • If the last 2 bits are 11, then the exact value of x+x+x will have last 2 bits of 0|1, which will be rounded down to 0|0. The next addition will give an exact result |11, so the result will be rounded up, again cancelling out the previous error.
  • n=5 (again, assuming default rounding): since x+x+x+x is exact, it holds for the same reason as n=3.
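The case analysis above can be spot-checked by brute force. The sketch below restates case_add and case_mul compactly and adds a hypothetical helper, agree_up_to (not from the original post), that walks a run of consecutive doubles and compares both functions for every n up to a bound. It assumes round-to-nearest and strict double evaluation, and it is a spot check over a finite range, not a proof.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

template <typename T>
T case_add(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");
    T result(x);
    for (std::size_t i = 1; i < n; ++i)
        result += x;
    return result;
}

template <typename T>
T case_mul(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");
    return x * static_cast<T>(n);
}

// Hypothetical helper: starting at `start`, visits `count` consecutive
// doubles (stepping upward one representable value at a time) and returns
// true only if case_add and case_mul agree for every n in [1, max_n].
bool agree_up_to(double start, std::size_t count, std::size_t max_n)
{
    double x = start;
    for (std::size_t k = 0; k < count; ++k)
    {
        for (std::size_t n = 1; n <= max_n; ++n)
        {
            if (case_add(x, n) != case_mul(x, n))
                return false;
        }
        x = std::nextafter(x, std::numeric_limits<double>::infinity());
    }
    return true;
}
```

For instance, agree_up_to(1.0, 10000, 5) covers the first ten thousand doubles from 1.0 upward for n = 1 to 5; raising max_n to 6 makes the check fail almost immediately.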

For n=6 it fails. For example, take x to be 1.0000000000000002 (the next double after 1.0), in which case 6x is 6.000000000000002 and x+x+x+x+x+x is 6.000000000000001.

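If n is, for example, pow(2, 54), the multiplication will work just fine, but in the addition path, once the result value is sufficiently larger than the input x, result += x will simply yield result.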

Yes, but it doesn't hold generally. Multiplication by a number higher than 2 might not give the same result as repeated addition, because the exponent changes along the way and the intermediate additions can drop a bit. Multiplication by two, however, can't drop a bit when replaced by an addition.

If the accumulator result in case_add becomes too large, adding x will introduce rounding errors. At a certain point, adding x won't have any effect at all. So the functions won't give the same result.
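This stalling is easiest to observe with float, where the loop stays short. The sketch below restates case_add and case_mul and adds a hypothetical helper, absorbed (not from the original post), that reports whether adding x leaves the accumulator unchanged. It assumes round-to-nearest-even and that float arithmetic is actually carried out in float precision.

```cpp
#include <cstddef>
#include <limits>

template <typename T>
T case_add(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");
    T result(x);
    for (std::size_t i = 1; i < n; ++i)
        result += x;
    return result;
}

template <typename T>
T case_mul(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");
    return x * static_cast<T>(n);
}

// Hypothetical helper: true when acc + x rounds back to acc, i.e. the
// addition has no effect on the accumulator.
template <typename T>
bool absorbed(T acc, T x)
{
    const T sum = acc + x;  // stored in T so the comparison sees the rounded value
    return sum == acc;
}
```

With x = 1.0f, the accumulator stops growing once it reaches 16777216 (2^24), because 2^24 + 1 is not representable in float and the tie rounds back to 2^24. So case_add(1.0f, 16777226) stays at 16777216, while case_mul(1.0f, 16777226) gives 16777226.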

For example, with double x = 0x1.0000000000001p0 (hexadecimal float notation):

n  case_add              case_mul

1  0x1.0000000000001p+0  0x1.0000000000001p+0
2  0x1.0000000000001p+1  0x1.0000000000001p+1
3  0x1.8000000000002p+1  0x1.8000000000002p+1
4  0x1.0000000000001p+2  0x1.0000000000001p+2
5  0x1.4000000000001p+2  0x1.4000000000001p+2
6  0x1.8000000000001p+2  0x1.8000000000002p+2
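A table like the one above can be regenerated with the %a conversion of printf, which prints hexadecimal float notation. This is only a sketch: to_hex and print_table are hypothetical helper names, and the exact digit layout produced by %a is implementation-defined, so the output may be formatted slightly differently on some platforms.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>   // std::strtod, used in the usage note to parse a hex float
#include <limits>
#include <string>

template <typename T>
T case_add(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");
    T result(x);
    for (std::size_t i = 1; i < n; ++i)
        result += x;
    return result;
}

template <typename T>
T case_mul(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");
    return x * static_cast<T>(n);
}

// Hypothetical helper: formats a double in hexadecimal float notation.
std::string to_hex(double v)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%a", v);
    return std::string(buf);
}

// Hypothetical helper: prints one "n  case_add  case_mul" row per n in [1, max_n].
void print_table(double x, std::size_t max_n)
{
    std::printf("n  case_add  case_mul\n");
    for (std::size_t n = 1; n <= max_n; ++n)
    {
        std::printf("%zu  %s  %s\n", n,
                    to_hex(case_add(x, n)).c_str(),
                    to_hex(case_mul(x, n)).c_str());
    }
}
```

A call such as print_table(std::strtod("0x1.0000000000001p0", nullptr), 6) covers the same six rows; std::strtod is used here because it accepts hexadecimal float strings without requiring C++17 hex literals.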
