
Pointer arithmetic on a struct member's address yielding a pointer to the next member (in the same struct)

What does the C standard say when pointer arithmetic on the address of one struct member yields a pointer to the next member of the same struct?


Code 1 (without struct), mystery_1

int mystery_1(void)
{
    int one = 1, two = 2;
    int *p1 = &one + 1;
    int *p2 = &two;
    unsigned long i1 = (unsigned long) p1;
    unsigned long i2 = (unsigned long) p2;

    if (i1 == i2)
        return p1 == p2;
    return 2;
}

From Code 1, I know that the result is not determined, because there is no guarantee about how local variables are laid out on the stack.

What if I use a struct instead, as in Code 2?


Code 2 (with struct), mystery_2

int mystery_2(void)
{
    struct { int one, two; } my_var = {
        .one = 1, .two = 2
    };
    int *p1 = &my_var.one + 1;
    int *p2 = &my_var.two;
    unsigned long i1 = (unsigned long) p1;
    unsigned long i2 = (unsigned long) p2;
    
    if (i1 == i2)
        return p1 == p2;
    return 2;
}

Compiler output

Godbolt link: https://godbolt.org/z/jGoKfETn7

GCC 10.2

mystery_1:
        xorl    %eax, %eax # return 0, while clang returns 2 (fine as no guarantee)
        ret
mystery_2:
        movl    $1, %eax # return 1, as compiler must consider the memory order of struct members
        ret

Clang 11.0.1

mystery_1:                              # @mystery_1
        movl    $2, %eax # return 2, while gcc returns 0 (fine as no guarantee)
        retq
mystery_2:                              # @mystery_2
        movl    $1, %eax # return 1, as compiler must consider the memory order of struct members
        retq

My understanding

  • In Code 1, the return value is not determined, because there is no guarantee about the memory layout of local variables on the stack.
  • In Code 2, the return value is determined and well-defined as 1, because p1 == p2 yields true: the struct guarantees the memory layout. So the address just past my_var.one is the address of my_var.two, and the compiler is not allowed to assume that p1 and p2 are different because of their provenance.

Questions

  • Is my understanding correct?
  • According to the C standard, does mystery_2 always return 1, i.e. does p1 == p2 always yield true?
  • In mystery_2, is the compiler allowed to assume that p1 != p2, so that the function returns 0?

The problem

I had a discussion with someone about the struct case (mystery_2), and they said:

p1 points to (one past) one, and p2 points to two. Those are, in C spec, counted as different "objects". The spec then goes on to define that pointers to different objects might compare as different, even though both pointers have the exact same bit pattern

Is my understanding correct?

No.

You're correct about the local variables, but not about the struct example.

According to C standard, does mystery_2 always return 1 as p1 == p2 yields true?

No. That's not guaranteed by the C standard, because there can be padding between one and two.

In practice, there's no reason for any compiler to insert padding between them in this example, so you can nearly always expect mystery_2 to return 1. However, this is not required by the C standard, so a pathological compiler could insert padding between one and two and still be perfectly valid.

With respect to padding, the only guarantee is that there can't be any padding before the first member of a struct. So a pointer to a struct, suitably converted, points to its first member, and the two compare equal. There are no other guarantees about padding whatsoever.
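A minimal sketch of that one guarantee, assuming a C11 compiler (for `_Static_assert`). The struct tag `pair` and the function name `struct_ptr_equals_first_member` are hypothetical, introduced only for illustration; `pair` has the same members as the anonymous struct in mystery_2.

```c
#include <stddef.h>  /* offsetof */

/* Hypothetical tagged version of mystery_2's anonymous struct,
   so that offsetof can name it. */
struct pair { int one, two; };

/* Guaranteed by the standard: no padding before the first member. */
_Static_assert(offsetof(struct pair, one) == 0,
               "first member must start at offset 0");

/* Returns 1: a pointer to a struct, converted to void *, compares equal
   to a pointer to its first member converted the same way. */
int struct_ptr_equals_first_member(void)
{
    struct pair s = { 1, 2 };
    return (void *)&s == (void *)&s.one;
}
```

Nothing comparable can be asserted portably about `offsetof(struct pair, two)`; its value is up to the implementation.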

Note: you should be using uintptr_t (from <stdint.h>) for storing pointer values; unsigned long isn't guaranteed to be able to hold a pointer value.
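For illustration, here is mystery_2 with the integer conversions switched to `uintptr_t`. This is a sketch: the name `mystery_2_uintptr` is hypothetical, `uintptr_t` is an optional type in C (assumed available here), and the round trip through it is only specified for `void *`, so each pointer is converted to `void *` first.

```c
#include <stdint.h>  /* uintptr_t */

/* Hypothetical variant of mystery_2 that stores pointer values in
   uintptr_t instead of unsigned long. */
int mystery_2_uintptr(void)
{
    struct { int one, two; } my_var = { .one = 1, .two = 2 };
    int *p1 = &my_var.one + 1;   /* one past the end of my_var.one */
    int *p2 = &my_var.two;
    /* Convert via void *, the type uintptr_t is specified for. */
    uintptr_t i1 = (uintptr_t)(void *)p1;
    uintptr_t i2 = (uintptr_t)(void *)p2;

    if (i1 == i2)
        return p1 == p2;
    return 2;
}
```

This only removes the risk of truncating the pointer value; whether the function returns 1 or 2 still depends on the struct layout and on how the implementation maps pointers to integers.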

Two basics of pointer arithmetic are, per C 2018 6.5.6 8:

  • A pointer to an element of an array may be adjusted (by addition and subtraction of an integer) to point to any element of the array or to the end (one beyond the last element). Arithmetic outside that is not defined by the C standard.
  • For pointer arithmetic, a single object acts like an array of one object.

Therefore int *p1 = &one + 1; has defined behavior.
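A small sketch of the "array of one object" rule; both function names are hypothetical. The pointer `&x + 1` may be formed, compared and subtracted, but not dereferenced, which is enough to iterate over a single object as if it were a one-element array.

```c
#include <stddef.h>  /* ptrdiff_t */

/* Returns 1: &x + 1 is the one-past-the-end pointer of x, so the
   subtraction is defined. */
ptrdiff_t one_past_single_object(void)
{
    int x = 42;
    int *end = &x + 1;   /* defined: one past the end of x */
    /* *end = 0;     undefined: end does not point to an object */
    /* &x + 2        undefined: beyond one past the end       */
    return end - &x;
}

/* Returns 42: loops over x as a one-element array, using &x + 1
   only as the loop bound and never dereferencing it. */
int sum_single_object(void)
{
    int x = 42;
    int total = 0;
    for (const int *q = &x; q != &x + 1; ++q)
        total += *q;
    return total;
}
```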

Regarding:

    unsigned long i1 = (unsigned long) p1;
    unsigned long i2 = (unsigned long) p2;

Since it is not the focus of this question, let's assume the implementation-defined conversion of a pointer to an unsigned long produces a value that uniquely identifies the pointer value. (That is, conversion of any address to an unsigned long only ever produces one value for that address, and conversion of the value back to a pointer reproduces the address. The C standard does not guarantee this.)

Then, if i1 == i2, it implies p1 == p2, and vice versa. Per C 2018 6.5.9 6, p1 and p2 can compare equal only if two (which p2 points to) has been laid out in memory immediately after one (which p1 points just beyond). (In general, pointers can compare equal for other reasons, but those cases involve pointers to the same object, a structure and its first member, the same function, and so on, all of which are ruled out for this particular p1 and p2.)
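One case where 6.5.9 6 does guarantee equality is adjacent elements of an array, since array elements are contiguous (the standard's footnote to that paragraph mentions this case, alongside adjacent structure members with no padding between them). A sketch, with a hypothetical function name:

```c
/* grid[0] and grid[1] are adjacent elements of grid, each an int[1],
   and array elements have no padding between them. */
int adjacent_rows_compare_equal(void)
{
    int grid[2][1] = { { 1 }, { 2 } };
    int *p1 = &grid[0][0] + 1;  /* one past the end of grid[0] */
    int *p2 = &grid[1][0];      /* first element of grid[1] */
    /* Equal per 6.5.9 6. p1 must still not be dereferenced: it was
       derived from grid[0] and points past its end. */
    return p1 == p2;
}
```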

So Code 1 will return 1 if two is laid out in memory immediately after one, and 2 otherwise.

The same is true in Code 2. The pointer arithmetic &my_var.one + 1 is defined, and the resulting p1 compares equal to p2 if and only if the member two immediately follows the member one in memory.
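The "if and only if" can be checked on a given implementation by comparing the result of the pointer comparison with the layout that `offsetof` reports. A sketch; the tag `pair` and the function name are hypothetical:

```c
#include <stddef.h>  /* offsetof */

/* Hypothetical tagged version of mystery_2's anonymous struct. */
struct pair { int one, two; };

/* Returns 1 when the pointer comparison agrees with the layout:
   p1 == p2 exactly when `two` starts right after `one`. */
int comparison_matches_layout(void)
{
    struct pair my_var = { .one = 1, .two = 2 };
    int *p1 = &my_var.one + 1;   /* one past the end of my_var.one */
    int *p2 = &my_var.two;
    int adjacent = offsetof(struct pair, two)
                   == offsetof(struct pair, one) + sizeof my_var.one;
    return adjacent == (p1 == p2);
}
```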

However, two does not have to immediately follow one. This statement is incorrect:

… struct guarantees the memory layout.

The C standard allows implementations to put padding between structure members. Common C implementations will not do this for struct { int one, two; }, because it is not needed for alignment (once one is aligned, the address immediately following it is also suitably aligned for int, so no padding is needed), but the C standard does not guarantee it.
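If code really does depend on `two` following `one`, one option is to state the assumption so that an implementation which inserts padding rejects the code at compile time. A sketch assuming C11 `_Static_assert`; the tag `pair` and the function name `mystery_2_checked` are hypothetical:

```c
#include <stddef.h>  /* offsetof */

/* Hypothetical tagged version of mystery_2's anonymous struct. */
struct pair { int one, two; };

/* Fails to compile on an implementation that puts padding
   between one and two. */
_Static_assert(offsetof(struct pair, two)
                   == offsetof(struct pair, one) + sizeof(int),
               "padding between pair.one and pair.two");

/* With the assertion above in force, the comparison yields 1. */
int mystery_2_checked(void)
{
    struct pair my_var = { .one = 1, .two = 2 };
    int *p1 = &my_var.one + 1;   /* one past the end of my_var.one */
    int *p2 = &my_var.two;
    return p1 == p2;
}
```

The assertion makes the layout assumption visible; it does not make dereferencing p1 valid, since p1 still points past the end of `one`.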

Notes

uintptr_t, declared in <stdint.h>, is a better choice for converting pointers to integers. However, the standard only guarantees that (uintptr_t) px == (uintptr_t) py implies px == py, not that px == py implies (uintptr_t) px == (uintptr_t) py. In other words, converting two pointers to the same object to uintptr_t might produce two different values, although converting them back to pointers will result in pointers that compare equal.
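A sketch of the property the standard does guarantee for `uintptr_t` (assuming the optional type is provided): a `void *` converted to `uintptr_t` and back compares equal to the original. The function name is hypothetical.

```c
#include <stdint.h>  /* uintptr_t */

/* Returns 1: void * -> uintptr_t -> void * yields a pointer that
   compares equal to the original. The integer value itself is not
   guaranteed to be the same for every pointer that compares equal. */
int uintptr_round_trip_compares_equal(void)
{
    int x = 0;
    void *p = &x;
    uintptr_t u = (uintptr_t)p;
    void *back = (void *)u;
    return back == p;
}
```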

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 