
Is it undefined behavior to access an array out of bounds if I know what data is at the accessed address?

Imagine the following definition.

struct X {
    double a[8] {0.0};
    double b[8] {0.0};
};

int main() {
    X x;
    x.a[10] = 1.0;
}

Is the behavior of the program undefined when I access x.a[10]?

Yes, it is undefined behavior, but not only because the compiler may alter the memory layout of X.

It is undefined behavior because the standard says so. As a result, the compiler is free to do anything with this code: it can drop the assignment completely, assign 1.0 to all 16 elements, change what earlier code does, crash the program, format your hard drive, and so on.


A more realistic, classical example: the following function

const int table[4] = {2, 4, 6, 8};

bool exists_in_table(int v)
{
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return true;
    }
    return false;
}

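always returns true, at least in modern gcc with -O3 (https://godbolt.org/z/f9cbWMYzM). The loop runs up to i == 4, so whenever v is not among the four elements it reads table[4], which is out of bounds. The compiler may assume that this never happens, and so it may conclude that the function always returns true before reaching that iteration.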
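For comparison, here is a minimal sketch of a version that stays in bounds. The names exists_in_table_fixed and check_exists_in_table are made up for this illustration and are not part of the original answer. A range-based for loop visits exactly the elements of the array, so the off-by-one in the loop condition cannot occur.

```cpp
#include <cassert>

// Same table as in the answer above.
constexpr int table[4] = {2, 4, 6, 8};

// Hypothetical corrected version: the range-based for loop visits
// exactly the four elements, so table[4] is never read.
bool exists_in_table_fixed(int v)
{
    for (int entry : table) {
        if (entry == v) return true;
    }
    return false;
}

// Small usage check; call it from main().
void check_exists_in_table()
{
    assert(exists_in_table_fixed(6));   // 6 is in the table
    assert(!exists_in_table_fixed(5));  // 5 is not
}
```

With the out-of-bounds read gone, the compiler has no undefined behavior to reason from, and the function returns false for values that are not in the table.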

Yes, it's undefined behavior, because the compiler could insert some padding between a and b. It could also insert some padding after b. The only thing you can be sure of is that no padding will be put before a.

An interesting and very detailed description is provided in The Lost Art of Structure Packing.

EDIT: Petr's answer is more precise than mine: "It is undefined behavior because the standard says so." Still, sometimes you feel like "Well, the standard says so, but it will always work in practice", so it may be useful to know what could happen in real-world scenarios. It's unlikely (but possible) that an out-of-bounds access will format your hard drive, and it's much more likely that two structure members will not be contiguous in memory.
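If you are curious what layout your compiler actually picked, a sketch like the following can report it. The helper names gap_between_a_and_b and check_layout are invented for this example. Note that even when the gap turns out to be zero, x.a[10] is still undefined behavior; this only inspects the layout, it does not make the access valid.

```cpp
#include <cassert>
#include <cstddef>      // offsetof, std::size_t
#include <type_traits>  // std::is_standard_layout

struct X {
    double a[8] {0.0};
    double b[8] {0.0};
};

// offsetof is only guaranteed to work for standard-layout types.
static_assert(std::is_standard_layout<X>::value, "X must be standard-layout");

// Guaranteed: there is no padding before the first member.
static_assert(offsetof(X, a) == 0, "a starts at the beginning of X");

// Padding bytes between the end of a and the start of b.
// Zero on common platforms, but the standard does not promise it.
constexpr std::size_t gap_between_a_and_b()
{
    return offsetof(X, b) - sizeof(X::a);
}

// Small usage check; call it from main().
void check_layout()
{
    // b can never start inside a, whatever padding is chosen.
    assert(offsetof(X, b) >= sizeof(X::a));
    // The whole struct holds both arrays plus the gap between them.
    assert(sizeof(X) >= sizeof(X::a) + gap_between_a_and_b() + sizeof(X::b));
}
```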

Your premise is false: you assume that an out-of-bounds access is defined as long as you know what is stored there. It is not. You cannot know what data is at the accessed address, and even the notion of "the data at some address" does not apply to the expression x.a[10].

You assume that the two arrays are stored directly next to each other in memory and that pointer arithmetic also works beyond the bounds of an array. The first is not guaranteed in general, and the second is never true.

Pointer arithmetic is only defined within the bounds of an array (including the one-past-the-end position, which may be formed but not dereferenced).

Your code has undefined behavior.


Note that your code is not a list of instructions for your CPU. It is an abstract description of what the program should do, and x.a[10] simply has no meaning in the language. It is not defined, and the compiler is not required to translate it into anything useful.
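If the goal is to treat the sixteen doubles as one contiguous block, one well-defined option is to say so in the type. The following is only a sketch under that assumption; the struct Y, its accessors a() and b(), and check_single_array are hypothetical names, not something from the question. Because data is a single array, every index from 0 to 15 is within its bounds.

```cpp
#include <cassert>

// Hypothetical replacement for X: one 16-element array with two named halves.
struct Y {
    double data[16] {};

    double* a() { return data; }      // elements 0..7
    double* b() { return data + 8; }  // elements 8..15
};

// Small usage check; call it from main().
void check_single_array()
{
    Y y;
    y.data[10] = 1.0;         // index 10 is inside data, so this is well defined
    assert(y.b()[2] == 1.0);  // data[10] and b()[2] name the same element
    assert(y.a()[0] == 0.0);  // untouched elements keep their initial value
}
```

Here the pointer arithmetic never leaves data, so the language gives the access a meaning.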

It is, according to the "undefined behavior sanitizer" built into g++/clang++:

runtime error: index 10 out of bounds for type 'double [8]'

That is, the out-of-bounds access you're already worried about is indeed UB. Making assumptions about memory layout is very risky, and usually (even when you know what you're doing) the resulting code is not portable.

Demo

Notice that in the demo the checker was enabled with -fsanitize=undefined. We have great tools, so let's use them.

Also notice that if you enable the memory checker instead, i.e. -fsanitize=address, no warning is emitted. In other words, just because the access doesn't cause runtime memory problems doesn't mean it's not UB.
