
Does integer overflow cause undefined behavior because of memory corruption?

I recently read that signed integer overflow in C and C++ causes undefined behavior:

If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.

I am currently trying to understand the reason for the undefined behavior here. I thought undefined behavior occurs because the integer starts manipulating the memory around itself when its value gets too big to fit the underlying type.

So I decided to write a little test program in Visual Studio 2015 to test that theory with the following code:

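#include <stdio.h>
#include <string.h>   /* memset */
#include <limits.h>   /* INT_MAX */

struct TestStruct
{
    char pad1[50];
    int testVal;
    char pad2[50];
};

int main()
{
    struct TestStruct test;   /* "struct" keeps this valid C as well as C++ */
    memset(&test, 0, sizeof(test));

    /* Endless loop: once testVal reaches INT_MAX, the next increment is
       signed overflow, i.e. undefined behavior. */
    for (test.testVal = 0; ; test.testVal++)
    {
        if (test.testVal == INT_MAX)
            printf("Overflowing\r\n");
    }

    return 0;
}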

I used a structure here to rule out any protective measures Visual Studio applies in debug mode, such as the temporary padding of stack variables. The endless loop should cause test.testVal to overflow several times, and it does indeed, though without any consequences other than the overflow itself.

I took a look at the memory dump while running the overflow tests, with the following result (test.testVal had a memory address of 0x001CFAFC):

0x001CFAE5  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x001CFAFC  94 53 ca d8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

[Image: overflowing integer with memory dump]

As you can see, the memory around the continuously overflowing int remained "undamaged". I tested this several times with similar results; the memory around the overflowing int was never damaged.

What is happening here? Why is no damage done to the memory around the variable test.testVal? How can this cause undefined behavior?

I am trying to understand my mistake and why no memory corruption occurs during an integer overflow.

You misunderstand the reason for undefined behavior. The reason is not memory corruption around the integer (it will always occupy the same size that integers occupy) but the underlying arithmetic.

Since signed integers are not required to be encoded in two's complement, there can be no specific guidance on what happens when they overflow. Different encodings or CPU behavior can cause different outcomes on overflow, including, for example, the program being killed by a trap. (C++20 and C23 now require two's-complement representation, but signed overflow is still undefined behavior in both.)

And as with all undefined behavior, even if your hardware uses two's complement for its arithmetic and has defined rules for overflow, compilers are not bound by them. For example, GCC has long optimized away checks that could only come true in a wrapping two's-complement environment. For instance, if (x > x + 1) f(); will be removed from optimized code: signed overflow is undefined behavior, meaning it never happens (from the compiler's view, programs never contain code producing undefined behavior), meaning x can never be greater than x + 1.
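A minimal sketch of that point, assuming a 32-bit int (both function names are hypothetical): the first function relies on wraparound and may legitimately be compiled to return false unconditionally; the second expresses the same intent without ever performing an overflowing addition.

```cpp
#include <climits>

// Relies on signed overflow: for x == INT_MAX, x + 1 is undefined behavior,
// so an optimizing compiler may assume x > x + 1 is always false and
// compile this to `return false;`. Shown for comparison only.
bool next_would_wrap_ub(int x) {
    return x > x + 1;
}

// Well-defined: the only int for which x + 1 is not representable is
// INT_MAX, so test for that directly instead of doing the addition.
bool next_would_wrap(int x) {
    return x == INT_MAX;
}
```

The general pattern is to compare against the limits before doing the arithmetic, rather than inspecting the result afterwards.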

The authors of the Standard left integer overflow undefined because some hardware platforms might trap in ways whose consequences could be unpredictable (possibly including random code execution and consequent memory corruption). Although two's-complement hardware with predictable silent-wraparound overflow handling was pretty much established as the norm by the time the C89 Standard was published (of the many reprogrammable-microcomputer architectures I've examined, none uses anything else), the authors of the Standard didn't want to prevent anyone from producing C implementations on older machines.

On implementations that used the commonplace two's-complement silent-wraparound semantics, code like

#include <limits.h>

int test(int x)
{
  int temp = (x==INT_MAX);   /* 1 when x is INT_MAX, otherwise 0 */
  if (x+1 <= 23) temp+=2;    /* x+1 overflows when x is INT_MAX */
  return temp;
}

would, 100% reliably, return 3 when passed a value of INT_MAX, since adding 1 to INT_MAX would yield INT_MIN, which is of course less than 23.
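For comparison, that wraparound behavior can be requested explicitly, with no signed overflow at all, by doing the addition in unsigned arithmetic. This is a sketch under stated assumptions: wrapping_increment and test_wrapping are hypothetical names, and converting the out-of-range unsigned result back to int is well-defined (modulo 2^N) only since C++20; before that it is implementation-defined, though GCC, Clang and MSVC wrap.

```cpp
#include <climits>

// Hypothetical helper: x + 1 with explicit two's-complement wraparound.
// Unsigned arithmetic is defined to wrap modulo 2^N, so no signed overflow
// occurs. Converting back to int is modulo 2^N since C++20; before C++20 it
// is implementation-defined (GCC, Clang and MSVC wrap).
int wrapping_increment(int x) {
    return static_cast<int>(static_cast<unsigned>(x) + 1u);
}

// Same logic as test() above, but with the wraparound made explicit.
int test_wrapping(int x) {
    int temp = (x == INT_MAX);
    if (wrapping_increment(x) <= 23) temp += 2;
    return temp;
}
```

With this version, test_wrapping(INT_MAX) returns 3 on any compiler that wraps the conversion, regardless of optimization level.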

In the 1990s, compilers used the fact that integer overflow was undefined behavior, rather than being defined as two's-complement wrapping, to enable various optimizations. These meant that the exact results of computations that overflowed would not be predictable, but aspects of behavior that didn't depend upon those exact results would stay on the rails. A 1990s compiler given the above code might well treat it as though adding 1 to INT_MAX yielded a value numerically one larger than INT_MAX, thus causing the function to return 1 rather than 3, or it might behave like the older compilers, yielding 3. Note that in the above code, such treatment could save an instruction on many platforms, since (x+1 <= 23) would be equivalent to (x <= 22). A compiler might not be consistent in its choice of 1 or 3, but the generated code would not do anything other than yield one of those values.
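The "numerically one larger" treatment can be sketched by evaluating x+1 in a wider type, which is exactly equivalent to testing x <= 22. This assumes long long is wider than int, as on mainstream platforms; test_as_if_wider is a hypothetical name.

```cpp
#include <climits>

// Treat x + 1 as if it produced the mathematically correct value by
// evaluating it in a wider type. Assumes long long is wider than int.
int test_as_if_wider(int x) {
    int temp = (x == INT_MAX);
    if (static_cast<long long>(x) + 1 <= 23) temp += 2;  // same as x <= 22
    return temp;
}
```

Here test_as_if_wider(INT_MAX) returns 1, matching the first of the two outcomes described above.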

Since then, however, it has become fashionable for compilers to use the Standard's failure to impose any requirements on program behavior in case of integer overflow (a failure motivated by the existence of hardware where the consequences might be genuinely unpredictable) as justification for generating code that goes completely off the rails when overflow occurs. A modern compiler could notice that the program will invoke Undefined Behavior if x==INT_MAX, and thus conclude that the function will never be passed that value. If the function is never passed that value, the comparison with INT_MAX can be omitted. If the above function were then called from another translation unit with x==INT_MAX, it might return 0 or 2; if called from within the same translation unit, the effect might be even more bizarre, since a compiler could extend its inferences about x back to the caller.
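As a hypothetical illustration (not the output of any particular compiler), this is roughly what an optimizer is permitted to reduce test() to once it has concluded that x==INT_MAX cannot happen: the INT_MAX comparison becomes 0 and the addition disappears from the condition.

```cpp
#include <climits>  // INT_MAX, used when calling the function below

// Hypothetical sketch of a permitted transformation of test():
// (x == INT_MAX) is assumed false, and x + 1 <= 23 becomes x <= 22.
int test_as_optimized(int x) {
    int temp = 0;            // (x == INT_MAX) assumed false
    if (x <= 22) temp += 2;  // x + 1 <= 23 rewritten without the addition
    return temp;
}
```

Called with INT_MAX, this version returns 0, one of the two outcomes mentioned above; the other (2) arises if the addition is kept and wraps.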

With regard to whether overflow would cause memory corruption: on some old hardware it might have. With older compilers running on modern hardware, it won't. With hyper-modern compilers, overflow negates the fabric of time and causality, so all bets are off. The overflow in the evaluation of x+1 could effectively corrupt the value of x that had been seen by the earlier comparison against INT_MAX, making the program behave as though the value of x in memory had been corrupted. Further, such compiler behavior will often remove conditional logic that would have prevented other kinds of memory corruption, thus allowing arbitrary memory corruption to occur.
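To make the last point concrete: a bounds check written as offset + len < offset only works if overflow wraps, so a compiler may delete it and let an out-of-bounds access through. Below is a minimal sketch of a check that never overflows, assuming non-negative int sizes; fits is a hypothetical name.

```cpp
#include <climits>

// Returns true if the range [offset, offset + len) lies inside a buffer of
// `capacity` elements. A check such as `if (offset + len < offset)` relies
// on signed wraparound and may be removed by the optimizer; this version
// compares against the limit before adding, so it never overflows.
bool fits(int offset, int len, int capacity) {
    if (offset < 0 || len < 0 || capacity < 0) return false;
    if (len > INT_MAX - offset) return false;  // offset + len would overflow
    return offset + len <= capacity;
}
```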

Undefined behaviour is undefined. It may crash your program. It may do nothing at all. It may do exactly what you expected. It may summon nasal demons. It may delete all your files. The compiler is free to emit whatever code it pleases (or none at all) when it encounters undefined behaviour.

Any instance of undefined behaviour causes the entire program to be undefined, not just the operation that is undefined, so the compiler may do whatever it wants to any part of your program. That includes time travel: "Undefined behavior can result in time travel (among other things, but time travel is the funkiest)".

There are many answers and blog posts about undefined behaviour, but the following are my favorites. I suggest reading them if you want to learn more about the topic.

In addition to the esoteric optimization consequences, you've got to consider other issues, even with the code you naively expect a non-optimizing compiler to generate.

  • Even if you know the architecture to be two's complement (or whatever), an overflowed operation might not set flags as expected, so a statement like if(a + b < 0) might take the wrong branch: given two large positive numbers, their sum overflows, and the result, so the two's-complement purists claim, is negative, but the addition instruction may not actually set the negative flag.

  • A multi-step operation may have taken place in a register wider than sizeof(int), without being truncated at each step, and so an expression like (x << 5) >> 5 may not cut off the left five bits as you assume it would.

  • Multiply and divide operations may use a secondary register for extra bits in the product and dividend. If multiply "can't" overflow, the compiler is free to assume that the secondary register is zero (or -1 for negative products) and not reset it before dividing. So an expression like x * y / z may use a wider intermediate product than expected.

Some of these sound like extra accuracy, but it's extra accuracy that isn't expected, can't be predicted or relied upon, and violates your mental model of "each operation accepts N-bit two's-complement operands and returns the least significant N bits of the result for the next operation".

Signed integer overflow behaviour is not defined by the C++ standard (unsigned arithmetic, by contrast, is defined to wrap modulo 2^N). This means that any implementation of C++ is free to do whatever it likes.

In practice this means: whatever is most convenient for the implementor. And since most implementors treat int as a two's-complement value, the most common implementation nowadays is to say that an overflowed sum of two positive numbers is a negative number which bears some relation to the true result (it differs from it by 2^N, where N is the width of int). This is a wrong answer, and it is allowed by the standard, because the standard allows anything.
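The "some relation" can be checked with a small worked example, assuming 32-bit operands (true_sum and wrapped_sum are hypothetical names). The true sum is computed in a 64-bit type where it cannot overflow; the wrapped sum is computed in unsigned arithmetic, so neither function actually performs signed overflow.

```cpp
#include <cstdint>

// The mathematically correct sum, in a type wide enough to hold it.
std::int64_t true_sum(std::int32_t a, std::int32_t b) {
    return static_cast<std::int64_t>(a) + b;
}

// The two's-complement wrapped sum, computed in unsigned arithmetic.
// Converting back to int32_t is modulo 2^32 since C++20; before that it is
// implementation-defined (mainstream compilers wrap).
std::int32_t wrapped_sum(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) +
                                     static_cast<std::uint32_t>(b));
}
```

For a = b = INT32_MAX, the true sum is 4294967294 and the wrapped sum is -2: they differ by exactly 2^32.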

There is an argument that integer overflow ought to be treated as an error, just like integer division by zero. The x86 architecture even has the INTO instruction to raise an exception on overflow (in 16- and 32-bit modes; it is invalid in 64-bit mode). At some point that argument may gain enough weight to make it into mainstream compilers, at which point an integer overflow may cause a crash. This also conforms with the C++ standard, which allows an implementation to do anything.
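Until a compiler does this for you, the "overflow is an error" policy can be applied by hand. A minimal sketch (checked_add is a hypothetical name; reporting through std::overflow_error is one choice among several): the operands are compared against the limits before adding, so the function never performs an overflowing addition.

```cpp
#include <climits>
#include <stdexcept>

// Hypothetical helper: adds two ints, but treats overflow as an error
// (like integer division by zero) instead of letting it happen.
// INT_MAX - b (for b > 0) and INT_MIN - b (for b < 0) cannot overflow.
int checked_add(int a, int b) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        throw std::overflow_error("int addition overflows");
    return a + b;
}
```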

You could imagine an architecture in which numbers were represented as null-terminated strings in little-endian fashion, with a zero byte saying "end of number". Addition could be done by adding byte by byte until a zero byte was reached. In such an architecture an integer overflow might overwrite a trailing zero with a one, making the result look far, far longer and potentially corrupting data in future. This also conforms with the C++ standard.

Finally, as pointed out in some other replies, a great deal of code generation and optimization depends on the compiler reasoning about the code it generates and how it would execute. In the case of an integer overflow, it is entirely permissible for the compiler (a) to generate code for addition which gives negative results when adding large positive numbers and (b) to inform its code generation with the assumption that adding large positive numbers gives a positive result. Thus, for example,

if (a+b>0) x=a+b;

might, if the compiler knows that both a and b are positive, not bother to perform the test at all, but unconditionally add a to b and put the result into x. On a two's-complement machine, that could lead to a negative value being put into x, in apparent violation of the intent of the code. This would be entirely in conformity with the standard.

It is undefined what value is represented by the int after it overflows. There is no "overflow" into the surrounding memory like you thought.
