Conversion from double to size_t gives the wrong result?

Question

The following code works. My question is: shouldn't 2) lead to a result very close to 1)? Why is 2) cast to such a small value? It may be worth noting that 2) is exactly half of 1):

std::cout << "1)  " << std::pow(2, 8 * sizeof(size_t)) << std::endl;
std::cout << "2)  " << static_cast<size_t>(std::pow(2, 8 * sizeof(size_t))) << std::endl;

The output is:

18446744073709551616
9223372036854775808

Answer 1

This is due to the following part of the specification:

7.3.10 Floating-integral conversions [conv.fpint]

A prvalue of a floating-point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.

The value 18446744073709551616 (that's the truncated value) is larger than std::numeric_limits<size_t>::max() on your system, and because of that, the behavior of the cast is undefined.
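One way to avoid the undefined behavior is to check the range before casting. The following is only a minimal sketch: `try_to_size_t` is a hypothetical helper name, not something from the standard library, and it assumes that 2^N (N being the number of value bits in size_t) is exactly representable as a double, which holds for the usual 32-bit and 64-bit size_t.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical helper (not part of the standard library): converts d to
// std::size_t only when the truncated value is representable.
// Returns false and leaves out untouched otherwise.
bool try_to_size_t(double d, std::size_t& out) {
    // 2^N as a double, where N is the number of value bits in std::size_t.
    const double limit =
        std::ldexp(1.0, std::numeric_limits<std::size_t>::digits);
    // Truncation discards the fraction, so values in (-1, 0) become 0.
    // NaN fails both comparisons and is rejected as well.
    if (!(d > -1.0 && d < limit)) {
        return false;
    }
    out = static_cast<std::size_t>(d);  // in range, so well defined
    return true;
}
```

With the value from the question, 2 to the power of the bit width of size_t, the helper reports failure instead of performing the undefined cast.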

Answer 2

If we want to calculate the number of different values a certain unsigned integral datatype can represent, we can calculate

 std::cout << "1)  " << std::pow(2, 8 * sizeof(size_t)) << std::endl; // yields 18446744073709551616

This calculates 2 to the power of 64 and yields 18446744073709551616. Since sizeof(size_t) is 8 bytes on a 64-bit machine, and a byte has 8 bits, the width of the size_t data type is 64 bits, hence 2^64.

This is no surprise, since size_t usually has the width of the system's underlying hardware bus: we want to spend no more than one clock cycle delivering an address or an index into an array or vector.

The above number is the count of all different integral values that can be represented by a 64-bit unsigned integral datatype such as size_t or unsigned long long, including 0 as one possibility. And since it does include 0, the highest representable value is exactly one less, so 18446744073709551615.

This number can also be retrieved by

 std::cout << std::numeric_limits<size_t>::max() << std::endl; // yields 18446744073709551615
 std::cout << std::numeric_limits<unsigned long long>::max() << std::endl; // yields the same

Now an unsigned datatype stores its values like

   00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 is 0 
   00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 is 1 or 2^0
   00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000010 is 2 or 2^1
   00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000011 is 3 or 2^1+2^0
   00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000100 is 4 or 2^2
   ...
   11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111 is 18446744073709551615
   and if you want to add another 1, you would need a 65th bit on the left, which you don't have:
 1 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 is 0 because 
   there are no more bits on the left.

Any amount higher than the highest representable value comes down to the amount modulo (largest possible value + 1), that is, amount % (max + 1), which, as we can see in the sample above, leads to zero.

And since this comes so naturally, the standard defines that converting any integral datatype, signed or unsigned, to an unsigned integral datatype yields the value modulo (largest possible value + 1). Beautiful.
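The modulo rule is easier to watch on a small type. The sketch below uses std::uint8_t, where the largest value + 1 is 256; the helper names `to_u8` and `mod_256` are made up for illustration, and the code assumes the platform provides std::uint8_t.

```cpp
#include <cstdint>

// Integer-to-unsigned conversion: the result is the source value
// reduced modulo 2^8 for std::uint8_t.
constexpr std::uint8_t to_u8(unsigned int v) {
    return static_cast<std::uint8_t>(v);
}

// The same reduction written out by hand, for comparison.
constexpr unsigned int mod_256(unsigned int v) {
    return v % 256u;
}
```

For example, to_u8(256) gives 0 and to_u8(257) gives 1, matching mod_256 for every input.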

But this easy rule has a little surprise for us when we wish to convert a negative integral value to an unsigned integral type, for example -1 to unsigned long long. You start with a value of 0 and then subtract 1. What happens is the opposite sequence of the above sample. Have a look:

  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 is 0 and now do -1
  11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111 is 18446744073709551615

So yes, converting -1 to size_t leads to std::numeric_limits<size_t>::max(). Quite unbelievable at first, but understandable after some thinking and playing around with it.

Now for our second line of code
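The -1 case can be checked directly. This is a small sketch with made-up function names; both functions rely only on the modulo behavior of unsigned types.

```cpp
#include <cstddef>
#include <limits>

// Converting -1 to an unsigned type adds 2^N once, giving 2^N - 1.
constexpr std::size_t minus_one_as_size_t() {
    return static_cast<std::size_t>(-1);
}

// Unsigned arithmetic wraps the same way: 0 - 1 ends at the maximum.
constexpr std::size_t zero_minus_one() {
    return std::size_t{0} - 1;
}
```

Both functions return the same value as std::numeric_limits<std::size_t>::max(), and adding 1 to either result wraps back to 0.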

 std::cout << "2)  " << static_cast<size_t>(std::pow(2, 8 * sizeof(size_t))) << std::endl;

we would naively expect 18446744073709551616, the same result as line one, of course.

But since we now know about modulo the largest + 1, and we know that the largest plus one gives 0, we would also, again naively, accept 0 as an answer.

Why naively? Because std::pow returns a double and not an integral datatype. The double datatype is again 64 bits wide, but internally its representation is entirely different.

 0XXXXXXX XXXX0000 00000000 00000000 00000000 00000000 00000000 00000000

Only those 11 X bits represent the exponent in 2^n form. That means only those 11 bits have to encode 64 (stored with a bias of 1023 in the IEEE 754 format), and the double will represent 2^64 * 1. So the representation of our big number is much more compact in double than in size_t. If someone wanted to apply modulo the largest plus 1, a further conversion would first be needed to turn the representation of 2^64 into a 64-bit integer pattern.

Some further reading about floating-point representation can be found at https://docs.microsoft.com/en-us/cpp/build/ieee-floating-point-representation?view=msvc-160 for example.

And the standard says that if you convert a floating-point value to an integral type and the value cannot be represented by the target integral datatype, the result is UB, undefined behaviour.

See the C++17 Standard ISO/IEC14882: 7.10 Floating-integral conversions [conv.fpint]

A prvalue of a floating-point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. ...
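To see those 11 exponent bits for yourself, the bit pattern of a double can be copied into an integer and picked apart. This is a sketch only: the function names are invented here, and it assumes an IEEE 754 binary64 double whose byte order matches that of std::uint64_t, which is the case on common desktop platforms.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "sketch assumes IEEE 754 doubles");
static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Copy the object representation of d into an integer.
inline std::uint64_t bits_of(double d) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// The 11 exponent bits as stored, i.e. the real exponent plus 1023.
inline unsigned biased_exponent(double d) {
    return static_cast<unsigned>((bits_of(d) >> 52) & 0x7FFu);
}

// The 52 stored fraction bits; all zero for an exact power of two.
inline std::uint64_t fraction_bits(double d) {
    return bits_of(d) & ((std::uint64_t{1} << 52) - 1);
}
```

For 2^64 the stored exponent field is 1087, which is 64 + 1023, and the fraction bits are all zero, matching the layout shown above.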

So double can easily hold 2^64, and that's the reason why line 1 could print it out so easily. But it is 1 more than size_t can represent, so the result is UB. So whatever the outcome of our line 2 is, it is simply irrelevant because it is UB.

OK, but if any random result will do, how come the UB outcome is exactly half? Well, first of all, the outcome is from MSVC. Clang or another compiler may deliver any other UB result.

But let's look at the "half" outcome, since it is easy.

   Trying to add 1 to the largest value
   11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111 is 18446744073709551615
   would, if only integral types were involved, lead to
 1 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
   but that's not possible, since that bit does not exist and the value is not integral but a double,
   hence UB. So, accidentally, the result is
   10000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 which is 9223372036854775808,
   so exactly half of the naively expected value, or 2^63.

Answer 1
11 Accepted 2020-11-25 21:26:26

Answer 2
0 2020-12-07 21:40:41
