
Does the pointer arithmetic in this usage cause undefined behavior

This is a follow-up to an earlier question. I had assumed that the pointer arithmetic I originally used would cause undefined behavior. However, a colleague told me that the usage is actually well defined. The following is a simplified example:

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct StructA {
    int a;
} StructA;

typedef struct StructB {
    StructA a;
    StructA* b;
} StructB;

int main() {
    StructB* original = (StructB*)malloc(sizeof(StructB));
    original->a.a = 5;
    original->b = &original->a;

    StructB* copy = (StructB*)malloc(sizeof(StructB));
    memcpy(copy, original, sizeof(StructB));
    free(original);
    ptrdiff_t offset = (char*)copy - (char*)original;
    StructA* a = (StructA*)((char*)(copy->b) + offset);
    printf("%i\n", a->a);
    free(copy);
}

According to §5.7 ¶5 of the C++11 spec:

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

I assumed that the following part of the code:

ptrdiff_t offset = (char*)copy - (char*)original;
StructA* a = (StructA*)((char*)(copy->b) + offset);

causes undefined behavior, since:

  1. it subtracts two pointers that point to different arrays
  2. the pointer resulting from the offset calculation no longer points into the same array.

Does this cause undefined behavior, or do I misinterpret the C++ specification? Does the same apply in C as well?

Edit:

Following the comments, I assume the following modification would still be undefined behavior, because it uses the object after its lifetime has ended:

ptrdiff_t offset = (char*)(copy->b) - (char*)original;
StructA* a = (StructA*)((char*)copy + offset);

Would it be defined when working with indexes instead:

typedef struct StructB {
    StructA a;
    ptrdiff_t b_offset;
} StructB;

int main() {
    StructB* original = (StructB*)malloc(sizeof(StructB));
    original->a.a = 5;
    original->b_offset = (char*)&(original->a) - (char*)original;

    StructB* copy = (StructB*)malloc(sizeof(StructB));
    memcpy(copy, original, sizeof(StructB));
    free(original);
    StructA* a = (StructA*)((char*)copy + copy->b_offset);
    printf("%i\n", a->a);
    free(copy);
}

It is undefined behavior because there are severe restrictions on what can be done with pointer arithmetic. The edits that you have made and that were suggested do nothing to fix this.

Undefined Behavior in Addition

StructA* a = (StructA*)((char*)copy + offset);

First of all, this is undefined behavior due to the addition onto copy:

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

  • (4.1) If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
  • (4.2) Otherwise, if P points to an array element i of an array object x with n elements ( [dcl.array] ), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 ≤ i − j ≤ n.
  • (4.3) Otherwise, the behavior is undefined.

See https://eel.is/c++draft/expr.add#4

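In short, pointer arithmetic on a non-null pointer is only defined within the bounds of an array, where an object that is not an array element counts as an array of one element. Under this wording, copy does not point to an element of a char array, so (char*)copy + offset is not covered by (4.2). Even if copy or its members were arrays, adding onto a pointer so that it becomes: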

  • two or more past the end of the array
  • at least one before the first element

is also undefined behavior.
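
As a minimal sketch of rule (4.2), the bounds can be checked on indices before any pointer is formed. The helper name advance_checked is hypothetical and not part of any library:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns base + (i + j) only when the result stays
// within [base, base + n], i.e. inside the array or one past its end.
// The check is done on indices, so no out-of-range pointer is ever formed.
template <typename T>
T* advance_checked(T* base, std::size_t n, std::size_t i, std::ptrdiff_t j) {
    assert(i <= n);  // the starting position must itself be valid
    const std::ptrdiff_t target = static_cast<std::ptrdiff_t>(i) + j;
    if (target < 0 || static_cast<std::size_t>(target) > n) {
        return nullptr;  // base + target would be undefined behavior
    }
    return base + target;
}
```

For an int arr[4], advance_checked(arr, 4, 0, 4) yields the one-past-the-end pointer (valid to form, not to dereference), while offsets of 5 or -1 from index 0 yield nullptr instead of an out-of-range pointer.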

Undefined Behavior in Subtraction

ptrdiff_t offset = (char*)(copy->b) - (char*)original;

The subtraction of your two pointers is also undefined behavior:

When two pointer expressions P and Q are subtracted, the type of the result is an implementation-defined signed integral type; [...]

  • (5.1) If P and Q both evaluate to null pointer values, the result is 0.
  • (5.2) Otherwise, if P and Q point to, respectively, array elements i and j of the same array object x, the expression P - Q has the value i − j.
  • (5.3) Otherwise, the behavior is undefined.

See https://eel.is/c++draft/expr.add#5

So subtracting one pointer from another is undefined behavior unless both are null or both point to elements of the same array.
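
By contrast, a difference taken inside one array is well defined per (5.2), and the resulting index can then be applied to a separate copy. A minimal sketch, in which index_of and rebase are hypothetical names chosen for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: index of element p within the array starting at first.
// Both pointers must point into the same array (or one past its end);
// otherwise the subtraction itself is undefined behavior.
std::ptrdiff_t index_of(const int* first, const int* p) {
    return p - first;
}

// Copies src into dst and returns the element of dst at the same index that
// `selected` has in src. No pointer into src is ever combined with dst.
int* rebase(const int* src, int* dst, std::size_t n, const int* selected) {
    const std::ptrdiff_t idx = index_of(src, selected);
    assert(idx >= 0 && static_cast<std::size_t>(idx) < n);
    std::memcpy(dst, src, n * sizeof(int));
    return dst + idx;
}
```

The point of the design is that each pointer is only ever subtracted from or added to within its own array; the only thing carried across is an integer.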

Undefined Behavior in C

The C standard has similar restrictions:

(8) [...] If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression.

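(Paragraph 7 of the same section states that, for these operators, a pointer to an object that is not an array element behaves like a pointer to the first element of an array of length one.)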

(9) When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; [...]

See §6.5.6 Additive Operators in the C11 standard (n1570).

Using Data Member Pointers Instead

A clean and type-safe solution in C++ would be to use data member pointers.

typedef struct StructB {
    StructA a;
    StructA StructB::*b_offset;
} StructB;

int main() {
    StructB* original = (StructB*) malloc(sizeof(StructB));
    original->a.a = 5;
    original->b_offset = &StructB::a;

    StructB* copy = (StructB*) malloc(sizeof(StructB));
    memcpy(copy, original, sizeof(StructB));
    free(original);
    printf("%i\n", (copy->*(copy->b_offset)).a);
    free(copy);
}
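
The reason this survives a byte-wise copy is that a data member pointer names a member rather than storing an address, so it resolves against whichever object it is applied to. A minimal sketch with hypothetical types Inner and Outer (two candidate members instead of one, to make the selection visible):

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct Inner { int value; };

// Hypothetical type for illustration: `selected` names one of the two
// members rather than holding an address, so it stays valid in any copy.
struct Outer {
    Inner first;
    Inner second;
    Inner Outer::*selected;
};

static_assert(std::is_trivially_copyable<Outer>::value,
              "memcpy is only appropriate for trivially copyable types");

// Resolves the stored member pointer against whichever object is passed in.
int selected_value(const Outer& o) {
    assert(o.selected != nullptr);
    return (o.*(o.selected)).value;
}

// Byte-wise copy, mirroring the memcpy in the example above.
Outer copy_bytes(const Outer& src) {
    Outer dst;
    std::memcpy(&dst, &src, sizeof(Outer));
    return dst;
}
```

After Outer c = copy_bytes(o), selected_value(c) reads c's own member, and later changes to c do not affect o.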

Notes

The standard citations are from a C++ draft. The C++11 standard that you cited does not appear to have any looser restrictions on pointer arithmetic; it is just formatted differently. See the C++11 standard (n3337).

The Standard explicitly provides that in situations it characterizes as Undefined Behavior, implementations may behave "in a documented fashion characteristic of the environment". According to the Rationale, the intention of such characterization was, among other things, to identify avenues of "conforming language extension"; the question of when implementations support such "popular extensions" was a Quality of Implementation issue best left to the marketplace.

Many implementations intended and/or configured for low-level programming on commonplace platforms extend the language by specifying that the following equivalences hold, for any pointers p and q of type T* and integer expression i:

  • The bit patterns of p, (uintptr_t)p, and (intptr_t)p are identical.
  • p+i is equivalent to (T*)((uintptr_t)p + (uintptr_t)i * sizeof (T))
  • p-i is equivalent to (T*)((uintptr_t)p - (uintptr_t)i * sizeof (T))
  • p-q is equivalent to ((uintptr_t)p - (uintptr_t)q) / sizeof (T) in all cases where the division would have no remainder.
  • p>q is equivalent to (uintptr_t)p > (uintptr_t)q and likewise for all other relational and comparison operators.
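
Whether a given implementation upholds these equivalences can be probed for in-bounds pointers, where the arithmetic itself is well defined and only the pointer-to-integer mapping is implementation-defined. A minimal sketch, assuming std::uintptr_t is available (the type is optional in the standard) and using a hypothetical function name. A true result says nothing about how the compiler treats out-of-bounds arithmetic under optimization:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical probe: checks the flat-address equivalences for p + i, q - p
// and p < q, using only in-bounds pointers into one array.
bool flat_equivalences_hold() {
    int arr[8] = {};
    int* p = arr;
    int* q = arr + 5;
    assert(q - p == 5);  // in-bounds difference, well defined

    const std::uintptr_t up = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t uq = reinterpret_cast<std::uintptr_t>(q);

    const bool add_ok = uq == up + 5u * sizeof(int);
    const bool sub_ok =
        static_cast<std::uintptr_t>(q - p) == (uq - up) / sizeof(int);
    const bool cmp_ok = (p < q) == (up < uq);
    return add_ok && sub_ok && cmp_ok;
}
```

On commonplace flat-address platforms this is expected to return true; on a segmented target such as large-model 8086 it need not.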

The Standard does not recognize any category of implementations that always uphold those equivalences, as distinct from those that do not, in part because its authors did not wish to portray as "inferior" implementations for unusual platforms where upholding such equivalences would be impractical. Instead, it expected that such equivalences would be upheld on implementations where that would make sense, and that programmers would know when they were targeting such implementations. Someone writing memory-management code for the 68000, or for small-model 8086 (where such equivalences would naturally hold), could write code that would run interchangeably on other systems where those equivalences hold. Someone writing memory-management code for large-model 8086, however, would need to design it explicitly for that platform, because those equivalences do not hold there (pointers are 32 bits, but individual objects are limited to 65520 bytes and most pointer operations only act upon the bottom 16 bits of a pointer).

Unfortunately, even on platforms where such equivalences would normally hold, some kinds of optimizations may yield corner-case behaviors that differ from those otherwise implied by those equivalences. Commercial compilers generally uphold the Spirit of C principle "don't prevent the programmer from doing what needs to be done", and can be configured to uphold the equivalences even when most optimizations are enabled. The gcc and clang C compilers, however, don't allow such control over semantics. When all optimizations are disabled, they will uphold those equivalences on commonplace platforms, but there is no other optimization setting that will prevent them from making inferences that would be inconsistent with them.
