
Does integer overflow cause undefined behavior because of memory corruption?

I recently read that signed integer overflow in C and C++ causes undefined behavior:

If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.

I am currently trying to understand the reason for the undefined behavior here. I thought it occurs because the integer starts corrupting the memory around itself when it gets too big to fit its underlying type.

So I decided to write a little test program in Visual Studio 2015 to test that theory with the following code:

#include <stdio.h>
#include <string.h> // for memset
#include <limits.h>

struct TestStruct
{
    char pad1[50];
    int testVal;
    char pad2[50];
};

int main()
{
    TestStruct test;
    memset(&test, 0, sizeof(test));

    for (test.testVal = 0; ; test.testVal++)
    {
        if (test.testVal == INT_MAX)
            printf("Overflowing\r\n");
    }

    return 0;
}

I used a structure here to defeat any protective measures Visual Studio applies in debug builds, such as the padding of stack variables. The endless loop should cause several overflows of test.testVal, and it does indeed, though without any consequences other than the overflow itself.

I took a look at the memory dump while running the overflow tests, with the following result (test.testVal had a memory address of 0x001CFAFC):

0x001CFAE5  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x001CFAFC  94 53 ca d8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

[Image: the overflowing integer, shown with a memory dump]

As you can see, the memory around the int that is continuously overflowing remained "undamaged". I tested this several times with similar output; the memory around the overflowing int was never damaged.

What is happening here? Why is no damage done to the memory around the variable test.testVal? How can this cause undefined behavior?

I am trying to understand my mistake and why there is no memory corruption during an integer overflow.

You misunderstand the reason for the undefined behavior. The reason is not memory corruption around the integer - it will always occupy the same amount of memory an int occupies - but the underlying arithmetic.

Since signed integers are not required to be encoded in 2's complement, there can be no specific guidance on what happens when they overflow. Different encodings or CPU behavior can cause different outcomes of overflow, including, for example, the program being killed by a trap.

And as with all undefined behavior, even if your hardware uses 2's complement for its arithmetic and has defined rules for overflow, compilers are not bound by them. For example, for a long time GCC optimized away any checks which would only come true in a 2's-complement environment. For instance, if (x > x + 1) f() is going to be removed from optimized code, because signed overflow is undefined behavior, meaning it never happens (from the compiler's point of view, programs never contain code producing undefined behavior), so x can never be greater than x + 1.
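To see this concretely, here is a minimal sketch (the function names are invented for illustration) contrasting the wraparound-style check a compiler may delete with a check that never overflows:

#include <limits.h>
#include <stdio.h>

/* Relies on signed wraparound, which is undefined behavior; an optimizer
   may legally treat the whole body as "return 0". */
int will_overflow_broken(int x)
{
    return x + 1 < x;
}

/* Well defined: compares against INT_MAX before any addition happens. */
int will_overflow_safe(int x)
{
    return x == INT_MAX;
}

int main(void)
{
    /* The first call executes undefined behavior, so its result is not
       guaranteed; optimized builds often print "0 1", unoptimized builds
       often print "1 1". The second value is always 1. */
    printf("%d %d\n", will_overflow_broken(INT_MAX), will_overflow_safe(INT_MAX));
    return 0;
}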

The authors of the Standard left integer overflow undefined because some hardware platforms might trap in ways whose consequences could be unpredictable (possibly including random code execution and consequent memory corruption). Although two's-complement hardware with predictable silent-wraparound overflow handling was pretty much established as a standard by the time the C89 Standard was published (of the many reprogrammable-microcomputer architectures I've examined, zero use anything else), the authors of the Standard didn't want to prevent anyone from producing C implementations on older machines.

On implementations which implemented commonplace two's-complement silent-wraparound semantics, code like

int test(int x)
{
  int temp = (x==INT_MAX);
  if (x+1 <= 23) temp+=2;
  return temp;
}

would, 100% reliably, return 3 when passed a value of INT_MAX, since adding 1 to INT_MAX would yield INT_MIN, which is of course less than 23.

In the 1990s, compilers used the fact that integer overflow was undefined behavior, rather than being defined as two's-complement wrapping, to enable various optimizations which meant that the exact results of computations that overflowed would not be predictable, but aspects of behavior that didn't depend upon the exact results would stay on the rails. A 1990s compiler given the above code might treat it as though adding 1 to INT_MAX yielded a value numerically one larger than INT_MAX, thus causing the function to return 1 rather than 3, or it might behave like the older compilers, yielding 3. Note that in the above code, such treatment could save an instruction on many platforms, since (x+1 <= 23) would be equivalent to (x <= 22). A compiler might not be consistent in its choice of 1 or 3, but the generated code would not do anything other than yield one of those values.

Since then, however, it has become more fashionable for compilers to use the Standard's failure to impose any requirements on program behavior in case of integer overflow (a failure motivated by the existence of hardware where the consequences might be genuinely unpredictable) to justify having compilers launch code completely off the rails in case of overflow. A modern compiler could notice that the program will invoke Undefined Behavior if x==INT_MAX, and thus conclude that the function will never be passed that value. If the function is never passed that value, the comparison with INT_MAX can be omitted. If the above function were called from another translation unit with x==INT_MAX, it might thus return 0 or 2; if called from within the same translation unit, the effect might be even more bizarre since a compiler would extend its inferences about x back to the caller.
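To make that concrete, such a compiler might effectively treat the function above as if it had been rewritten like this (a sketch of the reasoning, not the literal output of any particular compiler):

int test(int x)
{
  /* The comparison with INT_MAX is dropped: the compiler assumes
     x != INT_MAX, because x + 1 below would otherwise be undefined. */
  int temp = 0;
  /* x + 1 <= 23 is rewritten as x <= 22, which is valid only if
     x + 1 cannot overflow. */
  if (x <= 22) temp += 2;
  return temp;  /* yields 0 or 2; the value 3 can no longer be produced */
}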

With regard to whether overflow would cause memory corruption, on some old hardware it might have. On older compilers running on modern hardware, it won't. On hyper-modern compilers, overflow negates the fabric of time and causality, so all bets are off. The overflow in the evaluation of x+1 could effectively corrupt the value of x that had been seen by the earlier comparison against INT_MAX, making it behave as though the value of x in memory had been corrupted. Further, such compiler behavior will often remove conditional logic that would have prevented other kinds of memory corruption, thus allowing arbitrary memory corruption to occur.
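As a hedged illustration of that last point (the function and its names are hypothetical), consider a guard written in terms of signed wraparound:

#include <string.h>

/* The guard "offset + len < offset" was meant to detect wraparound before
   the bounds check. Because signed overflow is undefined behavior, the
   compiler may assume the guard is always false and remove it; if the sum
   then wraps negative at run time, the bounds check below passes and the
   memset can write far beyond the end of buf. */
void write_record(char *buf, int buf_size, int offset, int len)
{
    if (offset + len < offset)      /* overflow "check": may be deleted */
        return;
    if (offset + len > buf_size)    /* bounds check, fooled by a wrapped sum */
        return;
    memset(buf + offset, 0, (size_t)len);
}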

Undefined behaviour is undefined. It may crash your program. It may do nothing at all. It may do exactly what you expected. It may summon nasal demons. It may delete all your files. The compiler is free to emit whatever code it pleases (or none at all) when it encounters undefined behaviour.

Any instance of undefined behaviour causes the entire program to be undefined - not just the operation that is undefined - so the compiler may do whatever it wants to any part of your program. Including time travel: "Undefined behavior can result in time travel (among other things, but time travel is the funkiest)".

There are many other answers and blog posts about undefined behaviour that cover the topic in more depth; I suggest reading them if you want to learn more.

In addition to the esoteric optimization consequences, you've got to consider other issues even with the code you naively expect a non-optimizing compiler to generate.

  • Even if you know the architecture to be two's complement (or whatever), an overflowed operation might not set the CPU flags as expected, so a statement like if (a + b < 0) might take the wrong branch: given two large positive numbers, their sum overflows and the result, so the two's-complement purists claim, is negative, but the addition instruction may not actually set the negative flag.

  • A multi-step operation may have taken place in a wider register than sizeof(int), without being truncated at each step, so an expression like (x << 5) >> 5 may not cut off the left five bits as you assume it would (see the sketch after this list).

  • Multiply and divide operations may use a secondary register for extra bits in the product and dividend. If multiply "can't" overflow, the compiler is free to assume that the secondary register is zero (or -1 for negative products) and not reset it before dividing. So an expression like x * y / z may use a wider intermediate product than expected.

Some of these sound like extra accuracy, but it's extra accuracy that isn't expected, can't be predicted or relied upon, and violates your mental model of "each operation accepts N-bit two's-complement operands and returns the least significant N bits of the result for the next operation".
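For the shift example above, if what you actually want is simply the low 27 bits of a 32-bit value (rather than whatever sign-extension an arithmetic shift might give), a sketch of a well-defined alternative is to do the masking in unsigned arithmetic; the function names here are invented for illustration:

#include <stdint.h>
#include <stdio.h>

/* The intuitive version: may be evaluated in a wider register, and for
   many values of x the left shift itself is undefined behavior. */
int32_t strip_naive(int32_t x)
{
    return (x << 5) >> 5;
}

/* Well defined: keep only the low 27 bits using unsigned arithmetic;
   the masked value always fits in int32_t. */
int32_t strip_masked(int32_t x)
{
    return (int32_t)((uint32_t)x & 0x07FFFFFFu);
}

int main(void)
{
    printf("0x%08X\n", (unsigned)strip_masked(0x12345678));  /* 0x02345678 */
    return 0;
}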

Integer overflow behaviour is not defined by the C++ standard. This means that any implementation of C++ is free to do whatever it likes.

In practice this means: whatever is most convenient for the implementor. And since most implementors treat int as a twos-complement value, the most common implementation nowadays is to say that an overflowed sum of two positive numbers is a negative number which bears some relation to the true result. This is a wrong answer and it is allowed by the standard, because the standard allows anything.

There is an argument to say that integer overflow ought to be treated as an error, just like integer division by zero. The x86 architecture even has the INTO instruction to raise an exception on overflow. At some point that argument may gain enough weight to make it into mainstream compilers, at which point an integer overflow may cause a crash. This also conforms with the C++ standard, which allows an implementation to do anything.
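Opting into that error-like treatment is already possible with some toolchains: GCC and Clang offer the -ftrapv option and checked-arithmetic builtins. Below is a minimal sketch using the __builtin_add_overflow extension (on other compilers a manual INT_MAX/INT_MIN range check would be needed instead); checked_add is an invented helper name:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Adds two ints and aborts on overflow, roughly what a trapping
   implementation might do implicitly. */
static int checked_add(int a, int b)
{
    int result;
    if (__builtin_add_overflow(a, b, &result))  /* GCC/Clang extension */
    {
        fprintf(stderr, "integer overflow in checked_add\n");
        abort();
    }
    return result;
}

int main(void)
{
    printf("%d\n", checked_add(40, 2));      /* prints 42 */
    printf("%d\n", checked_add(INT_MAX, 1)); /* aborts the program */
    return 0;
}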

You could imagine an architecture in which numbers were represented as null-terminated strings in little-endian fashion, with a zero byte saying "end of number". Addition could be done by adding byte by byte until a zero byte was reached. In such an architecture an integer overflow might overwrite a trailing zero with a one, making the result look far, far longer and potentially corrupting data in future. This also conforms with the C++ standard.

Finally, as pointed out in some other replies, a great deal of code generation and optimization depends on the compiler reasoning about the code it generates and how it would execute. In the case of an integer overflow, it is entirely licit for the compiler (a) to generate code for addition which gives negative results when adding large positive numbers and (b) to inform its code generation with the knowledge that addition of large positive numbers gives a positive result. Thus for example

if (a+b>0) x=a+b;

might, if the compiler knows that both a and b are positive, not bother to perform a test, but unconditionally add a to b and put the result into x. On a two's-complement machine, that could lead to a negative value being put into x, in apparent violation of the intent of the code. This would be entirely in conformity with the standard.
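For comparison, one hedged way to express that intent without ever executing a signed overflow (assuming, as in the scenario above, that a and b are known to be non-negative; the function name is invented) is to test against INT_MAX first:

#include <limits.h>

/* Performs the addition only when the mathematical sum is representable,
   so both the comparison and the store behave as the source suggests. */
void assign_if_positive(int a, int b, int *x)
{
    if (a <= INT_MAX - b)   /* the sum fits in an int (b is non-negative) */
    {
        int sum = a + b;    /* well defined: no overflow possible here */
        if (sum > 0)
            *x = sum;
    }
}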

What value the int ends up representing is undefined. There is no 'overflow' into the surrounding memory, as you assumed.
