
Why can't uint64_t show pow(2, 64) - 1 properly?

I'm trying to understand why the uint64_t type cannot show pow(2, 64) - 1 properly. The __cplusplus value is 199711L.

I checked the pow() overloads under the C++98 standard, which are:

double pow (double base     , double exponent);
float pow (float base      , float exponent);
long double pow (long double base, long double exponent);
double pow (double base     , int exponent);
long double pow (long double base, int exponent);

So I wrote the following snippet:

double max1 = (pow(2, 64) - 1);
cout << max1 << endl;

uint64_t max2 = (pow(2, 64) - 1);
cout << max2 << endl;

uint64_t max3 = -1;
cout << max3 << endl;

The outputs are:

max1: 1.84467e+019
max2: 9223372036854775808
max3: 18446744073709551615

Floating point numbers have finite precision.

On your system (and typically, assuming the IEEE-754 binary64 format), 18446744073709551615 has no representation in the double format. The closest number that does have a representation happens to be 18446744073709551616.

Adding or subtracting two floating point numbers of wildly different magnitudes usually produces a rounding error. This error can be significant relative to the smaller operand. In the case of 18446744073709551616. - 1. -> 18446744073709551616., the error of the subtraction is 1, which is in fact the same value as the smaller operand.

When a floating point value is converted to an integer type and the value cannot fit in that integer type, the behaviour of the program is undefined, even when the integer type is unsigned. That is why max2 printed 9223372036854775808 in the question: once the conversion is undefined, any result is permitted.
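
One way to stay clear of that undefined conversion is to range-check the double before casting. The following is only a sketch: to_uint64_checked is a hypothetical helper name, not something from the standard library or the question, and it assumes double can represent 2^64 exactly (true for IEEE-754 binary64).

```cpp
#include <cstdint>

// Hypothetical helper (not part of any library): converts a double to
// uint64_t only when the value is in range, so the cast is never undefined.
// Returns false and leaves `out` untouched otherwise.
bool to_uint64_checked(double value, std::uint64_t& out) {
    // 2^64 as a double; exactly representable because it is a power of two.
    const double two_pow_64 = 18446744073709551616.0;
    // Both comparisons are false for NaN, so NaN is rejected as well.
    // Negative values are rejected too, which is deliberately conservative.
    if (!(value >= 0.0 && value < two_pow_64)) {
        return false;
    }
    out = static_cast<std::uint64_t>(value);  // truncates toward zero
    return true;
}
```

With this check, the double produced by pow(2, 64) - 1 is rejected, because it has already been rounded up to 2^64 and no longer fits in 64 bits.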

TL;DR: It's not that the uint64_t type cannot show pow(2, 64) - 1 properly, but the reverse: double can't store 2^64 - 1 precisely because it lacks significand bits. You can only do that with types that have 64 bits of precision or more (like long double on many platforms). Try std::pow(2.0L, 64) - 1.0L (note the L suffix) or powl(2.0L, 64) - 1.0L and see.

Anyway, you shouldn't use a floating-point type for integer math in the first place. Not only is pow(2, x) far slower to calculate than 1ULL << x, it also causes the issue you saw because of the limited precision of double. Use uint64_t max2 = -1 instead, or ((unsigned __int128)1ULL << 64) - 1 if the compiler supports __int128.
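
As a rough illustration of the integer-only route, the sketch below uses two hypothetical helpers, pow2_u64 and max_u64 (names invented here). It relies only on std::numeric_limits and a shift, and it assumes the platform provides std::uint64_t.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: 2^exponent computed with integer arithmetic only.
std::uint64_t pow2_u64(unsigned exponent) {
    // Shifting a 64-bit value by 64 or more is undefined, so 2^64 itself
    // cannot be produced this way.
    assert(exponent < 64);
    return std::uint64_t(1) << exponent;
}

// Hypothetical helper: 2^64 - 1 without ever forming 2^64.
std::uint64_t max_u64() {
    return std::numeric_limits<std::uint64_t>::max();
}
```

max_u64() yields the same value as uint64_t max3 = -1 in the question, since converting -1 to an unsigned type wraps modulo 2^64.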


pow(2, 64) - 1 is a double expression, not an int one, because pow doesn't have any overload that returns an integral type. The integer 1 is converted to double, the type of pow's result, before the subtraction.

However, an IEEE-754 double is only 64 bits long in total, and just 53 of those bits are significand precision, so it can never store values that need 64 significant bits, such as 2^64 - 1.

So pow(2, 64) - 1 is rounded to the closest representable value, which is pow(2, 64) itself, and pow(2, 64) - 1 == pow(2, 64) evaluates to 1 (true). The largest double below it is 18446744073709549568 = 2^64 - 2048. You can check that with std::nextafter.
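
To see which 64-bit integers survive a trip through double, a small round-trip test can help. This is a sketch under the assumption that double is IEEE-754 binary64; survives_double_round_trip is a made-up name, and the guard against 2^64 is there so the cast back never hits the undefined out-of-range conversion.

```cpp
#include <cstdint>

// Hypothetical helper: true when v converts to double and back unchanged.
bool survives_double_round_trip(std::uint64_t v) {
    const double two_pow_64 = 18446744073709551616.0;  // exactly 2^64
    const double d = static_cast<double>(v);           // may round
    // If v was rounded up to 2^64, casting back would be undefined,
    // so report failure without performing the cast.
    if (d >= two_pow_64) {
        return false;
    }
    return static_cast<std::uint64_t>(d) == v;
}
```

Under that assumption, 2^53 and 2^64 - 2048 round-trip unchanged, while 2^53 + 1 and 2^64 - 1 do not.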

On some platforms (notably x86, except with MSVC) long double does have a 64-bit significand, so you'll get the correct value in that case. The following snippet

double max1 = pow(2, 64) - 1;
std::cout << "pow(2, 64) - 1 = " << std::fixed << max1 << '\n';
std::cout << "Previous representable value: " << std::nextafter(max1, 0) << '\n';
std::cout << (pow(2, 64) - 1 == pow(2, 64)) << '\n';

long double max2 = pow(2.0L, 64) - 1.0L;
std::cout << std::fixed << max2 << '\n';

prints out

pow(2, 64) - 1 = 18446744073709551616.000000
Previous representable value: 18446744073709549568.000000
1
18446744073709551615.000000

You can clearly see that long double can store the correct value, as expected.

On many other platforms long double may be IEEE-754 quadruple-precision or double-double. Both have more than 64 bits of significand, so you can do the same thing there. But of course the overhead will be higher.
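
Because the width of long double varies by platform, it may be worth querying it rather than assuming. The sketch below uses std::numeric_limits for that; holds_every_uint64 is a hypothetical helper name, and the check only looks at significand width and exponent range of a radix-2 type.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true when floating-point type T has enough
// significand bits (and exponent range) to represent every uint64_t
// value exactly.
template <typename T>
bool holds_every_uint64() {
    typedef std::numeric_limits<T> fp;
    typedef std::numeric_limits<std::uint64_t> u64;
    return fp::radix == 2 &&
           fp::digits >= u64::digits &&       // at least 64 significand bits
           fp::max_exponent >= u64::digits;   // values just below 2^64 are in range
}
```

For example, holds_every_uint64<double>() is false wherever double is IEEE-754 binary64, while holds_every_uint64<long double>() depends on the platform and is true for the 80-bit x87 format.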
