
What happens exactly when a 32bit integer overflows on a 64bit machine?

Original post: 2014-03-28 13:52:05. Tags: c++, c, 32bit-64bit

The situation is the following:

- a 32bit integer overflows
- malloc, which expects a 64bit integer, uses this integer as input

Now on a 64bit machine, which statement is correct (if any at all):

Say that the signed binary integer 11111111001101100000101011001000 is simply negative due to an overflow. This is a practical existing problem, since you might want to allocate more bytes than you can describe in a 32bit integer. But then it gets read in as a 64bit integer.

1. Malloc reads this as a 64bit integer, finding 11111111001101100000101011001000################################, with # being a wildcard bit representing whatever data is stored after the original integer. In other words, it reads a result close to its maximum value of 2^64 and tries to allocate some quintillion bytes. It fails.
2. Malloc reads this as a 64bit integer, casting to 0000000000000000000000000000000011111111001101100000101011001000, possibly because that is how it is loaded into a register, leaving a lot of bits zero. It does not fail but allocates the negative size as if reading a positive unsigned value.
3. Malloc reads this as a 64bit integer, casting to ################################11111111001101100000101011001000, possibly because that is how it is loaded into a register, with # a wildcard representing whatever data was previously in the register. It fails quite unpredictably depending on the last value.
4. The integer does not overflow at all because, even though it is 32bit, it is still in a 64bit register, and therefore malloc works fine.

I actually tested this, resulting in the malloc failing (which would imply either 1 or 3 to be correct). I assume 1 is the most logical answer. I also know the fix (using size_t as input instead of int).
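The size_t fix mentioned above can be sketched as follows. This is only a minimal illustration, assuming the byte count comes from an element count times an element size; the helper name `alloc_array` is hypothetical and not part of any library.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates room for `count` elements of `elem_size`
   bytes each. All size arithmetic stays in size_t, which is unsigned, so
   nothing here can trigger signed overflow. */
void *alloc_array(size_t count, size_t elem_size) {
    /* Reject requests whose total byte count would not fit in size_t. */
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return NULL;
    }
    return malloc(count * elem_size);
}
```

A call such as `alloc_array(n, sizeof(int))` either returns memory for `n` ints or NULL; the caller still has to check the result and `free` it.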

I'd really just like to know what actually happens. For some reason I can't find any clarification on how 32bit integers are actually treated on 64bit machines for such an unexpected 'cast'. I'm not even sure if it being in a register actually matters.

3 answers

The problem with your reasoning is that it starts with the assumption that the integer overflow will result in a deterministic and predictable operation.

This, unfortunately, is not the case: undefined behavior means that anything can happen, and notably that compilers may optimize as if it could never happen.

As a result, it is nigh impossible to predict what kind of program the compiler will produce if there is such a possible overflow.

- A possible output is that the compiler elides the allocation because it cannot happen
- A possible output is that the resulting value is 0-extended or sign-extended (depending on whether it's known to be positive or not) and interpreted as an unsigned integer. You may get anything from 0 to size_t(-1) and thus may allocate either too little or too much memory, or even fail to allocate, ...
- ...
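To make the "optimize as if it could never happen" point concrete, here is a small sketch of an overflow test that depends on wraparound. The function name is hypothetical, and what a particular compiler does with it is not guaranteed either way.

```c
#include <stdbool.h>

/* Hypothetical overflow test that relies on x + 1 wrapping around to a
   smaller value. For x == INT_MAX the addition itself is undefined
   behavior, so a compiler is allowed to assume x + 1 < x is never true
   and compile this function as if it were "return false". */
bool naive_increment_overflows(int x) {
    return x + 1 < x;
}
```

For every input that does not overflow the function returns false, as expected. The one input it was written to detect, INT_MAX, is exactly the case where the standard promises nothing.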

Undefined Behavior => All Bets Are Off

Once an integer overflows, using its value results in undefined behavior. A program that uses the result of an int after the overflow is invalid according to the standard; essentially, all bets about its behavior are off.

With this in mind, let's look at what's going to happen on a computer where negative numbers are stored in two's complement representation. When you add two large 32-bit integers on such a computer, you get a negative result in case of an overflow.

However, according to the C++ standard, the type of malloc's argument, i.e. size_t, is always unsigned. When you convert a negative number to an unsigned number, it gets sign-extended (see this answer for a discussion and a reference to the standard), meaning that the most significant bit of the original (which is 1 for all negative numbers) is set in the top 32 bits of the unsigned result.

Therefore, what you get is a modified version of your third case, except that instead of "wildcard bit #" it has ones all the way to the top. The result is a gigantic unsigned number (roughly 16 exbibytes or so); naturally malloc fails to allocate that much memory.
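The two candidate widenings discussed in this thread can be compared directly. The sketch below assumes fixed-width types as stand-ins for int and a 64-bit size_t; the helper names are hypothetical. Both conversions are well defined on their own, since no overflow occurs inside them.

```c
#include <stdint.h>

/* Hypothetical helpers; int32_t and uint64_t stand in for int and size_t. */

/* Sign extension: what a direct signed-to-wider-unsigned conversion yields.
   The result is the value reduced modulo 2^64. */
uint64_t widen_sign_extended(int32_t v) {
    return (uint64_t)v;
}

/* Zero extension: convert to 32-bit unsigned first, then widen. */
uint64_t widen_zero_extended(int32_t v) {
    return (uint64_t)(uint32_t)v;
}
```

The question's bit pattern 11111111001101100000101011001000 is 0xFF360AC8, which is -13235512 as a 32-bit two's complement value. Passing that value gives 0xFFFFFFFFFF360AC8 from the first helper (just under 2^64 bytes) and 0x00000000FF360AC8 from the second (just under 4 GiB).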

So if we have a specific code example, a specific compiler, and a specific platform, we can probably determine what the compiler is doing. That is the approach taken in Deep C, but even then it may not be fully predictable, which is a hallmark of undefined behavior; generalizing about undefined behavior is not a good idea.

We only have to take a look at the advice from the gcc documentation to see how messy it can get. The documentation offers some good advice on integer overflow, which says:

In practice many portable C programs assume that signed integer overflow wraps around reliably using two's complement arithmetic. Yet the C standard says that program behavior is undefined on overflow, and in a few cases C programs do not work on some modern implementations because their overflows do not wrap around as their authors expected.

and the sub-section Practical Advice for Signed Overflow Issues says:

Ideally the safest approach is to avoid signed integer overflow entirely.[...]
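One common way to follow that advice is to test the operands before adding, so the overflowing operation is never executed. The following is a minimal sketch with a hypothetical helper name, not code taken from the gcc documentation.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true when the sum
   is representable as an int; returns false and leaves *out untouched
   otherwise. The range test runs before the addition, so no overflowing
   operation is ever performed. */
bool checked_add_int(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

A caller computing an allocation size can bail out when the helper returns false instead of passing a wrapped value on to malloc.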

At the end of the day it is undefined behavior and therefore unpredictable in the general case, but in the case of gcc, the implementation-defined behavior section on Integers says that out-of-range integer conversion wraps around:

For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
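For N = 32, the result described by that rule can be reproduced with unsigned arithmetic alone. The sketch below uses a hypothetical helper name and models only the quoted conversion rule; it says nothing about what happens when signed arithmetic itself overflows.

```c
#include <stdint.h>

/* Hypothetical helper: reduces v modulo 2^32 into the int32_t range, the
   result the quoted rule describes for N = 32. Written with unsigned
   arithmetic so it does not depend on implementation-defined conversions. */
int32_t reduce_to_int32(int64_t v) {
    uint32_t low = (uint32_t)v;            /* v modulo 2^32, well defined */
    if (low <= (uint32_t)INT32_MAX) {
        return (int32_t)low;               /* value fits as-is */
    }
    /* low is in [2^31, 2^32 - 1]: map it onto [INT32_MIN, -1]. */
    return (int32_t)(low - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}
```

For example, 4281731784 (0xFF360AC8, the question's bit pattern read as an unsigned value) reduces to -13235512.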

However, in their advice about integer overflow they explain how optimization can cause problems with wraparound: