Why can't uint64_t show pow(2, 64) - 1 properly?
I'm trying to understand why the uint64_t type cannot show pow(2,64)-1 properly. The value of __cplusplus is 199711L.
I checked the pow() function under the C++98 standard, which is declared as
double pow (double base , double exponent);
float pow (float base , float exponent);
long double pow (long double base, long double exponent);
double pow (double base , int exponent);
long double pow (long double base, int exponent);
So I wrote the following snippet:
double max1 = (pow(2, 64) - 1);
cout << max1 << endl;
uint64_t max2 = (pow(2, 64) - 1);
cout << max2 << endl;
uint64_t max3 = -1;
cout << max3 << endl;
The outputs are:
max1: 1.84467e+019
max2: 9223372036854775808
max3: 18446744073709551615
Floating point numbers have finite precision. On your system (and typically, assuming the IEEE-754 binary64 format) 18446744073709551615 is not a number that has a representation in the double format. The closest number that does have a representation happens to be 18446744073709551616.
Subtracting (and adding) together two floating point numbers of wildly different magnitudes usually produces a rounding error. This error can be significant in relation to the smaller operand. In the case of
18446744073709551616. - 1. -> 18446744073709551616.
the error of the subtraction is 1, which is in fact the same value as the smaller operand.
When a floating point value is converted to an integer type, and the value cannot fit in the integer type, the behaviour of the program is undefined - even when the integer type is unsigned.
TL;DR: It's not that the uint64_t type cannot show pow(2,64)-1 properly but the reverse: double can't store 2^64 - 1 precisely because it lacks significand bits. You can only do that with types that have 64 bits of precision or more (like long double on many platforms). Try std::pow(2.0L, 64) - 1.0L (note the L suffix) or powl(2.0L, 64) - 1.0L; and see.
Anyway, you shouldn't use a floating-point type for integer math in the first place. Not only is calculating pow(2, x) far slower than 1ULL << x, it also causes the issue you saw due to the limited precision of double. Use uint64_t max2 = -1 instead, or ((unsigned __int128)1ULL << 64) - 1 if the compiler supports __int128.
pow(2, 64) - 1 is a double expression, not int, as pow doesn't have any overload that returns an integral type. The integer 1 will be converted to double, the type of the result of pow.
However, because an IEEE-754 double-precision value is only 64 bits long in total (of which 53 bits are significand precision), it can never store values that need 64 significant bits or more, such as 2^64 - 1.
So pow(2, 64) - 1 will be rounded to the closest representable value, which is pow(2, 64) itself, and pow(2, 64) - 1 == pow(2, 64) will result in 1. The largest representable value that's smaller than it is 18446744073709549568 = 2^64 - 2048. You can check that with std::nextafter.
On some platforms (notably x86, except on MSVC) long double does have 64 bits of significand, so you'll get the correct value in that case. The following snippet
double max1 = pow(2, 64) - 1;
std::cout << "pow(2, 64) - 1 = " << std::fixed << max1 << '\n';
std::cout << "Previous representable value: " << std::nextafter(max1, 0) << '\n';
std::cout << (pow(2, 64) - 1 == pow(2, 64)) << '\n';
long double max2 = pow(2.0L, 64) - 1.0L;
std::cout << std::fixed << max2 << '\n';
prints out
pow(2, 64) - 1 = 18446744073709551616.000000
Previous representable value: 18446744073709549568.000000
1
18446744073709551615.000000
You can clearly see that long double can store the correct value as expected.
On many other platforms long double may be IEEE-754 quadruple-precision or double-double. Both have more than 64 bits of significand, so you can do the same thing. But of course the overhead will be higher.