简体   繁体   中英

casting pointer to array into pointer

Consider the following C code:

int arr[2] = {0, 0};
int *ptr = (int*)&arr;
ptr[0] = 5;
printf("%d\n", arr[0]);

Now, it is clear that the code prints 5 on common compilers. However, can somebody find the relevant sections in the C standard that specifies that the code does in fact work? Or is the code undefined behaviour?

What I'm essentially asking is why &arr when casted into void * is the same as arr when casted into void * ? Because I believe the code is equivalent to:

int arr[2] = {0, 0};
int *ptr = (int*)(void*)&arr;
ptr[0] = 5;
printf("%d\n", arr[0]);

I invented the example while thinking about the question here: Pointer-to-array overlapping end of array ...but this is clearly a distinct question.

For unions and structures, cf. ISO 9899:2011§6.7.2.1/16f:

16 The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.

17 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

For array types, the situation is a bit more complex. First, observe what an array is, from ISO 9899:2011§6.2.5/20:

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type . The element type shall be complete whenever the array type is specified. Array types are characterized by their element type and by the number of elements in the array. An array type is said to be derived from its element type, and if its element type is T , the array type is sometimes called “array of T ”. The construction of an array type from an element type is called “array type derivation”.

The wording “contiguously allocated” implies that there is no padding between array members. This notion is affirmed by footnote 109:

Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. If prior invalid pointer operations (such as accesses outside array bounds) produced undefined behavior, subsequent comparisons also produce undefined behavior.

The use of the sizeof operator in §6.5.3.5, Example 2 expresses the intent that there is also no padding before or after arrays:

EXAMPLE 2

Another use of the sizeof operator is to compute the number of elements in an array:

 sizeof array / sizeof array[0] 

I therefore conclude that a pointer to an array, converted to a pointer to the element typo of that array, points to the first element in the array. Furthermore, observe what the definition of equality says about pointers (§6.5.9/6f.):

6 Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. 109)

7 For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

Since the first element of an array is “a subobject at its beginning,” a pointer to the first element of an array and a pointer to an array compare equal.

Here is a slightly refactored version of your code for easier reference:

int arr[2] = { 0, 0 };
int *p1 = &arr[0];
int *p2 = (int *)&arr;

with the question being: Is p1 == p2 true, or unspecified, or UB?


Firstly: I think that it is intended by the authors of C's abstract memory model that p1 == p2 is true; and if the Standard doesn't actually spell it out then it would be a defect in the Standard.

Moving on; the only relevant piece of text seems to be C11 6.3.2.3/7 (irrelevant text excised):

A pointer to an object type may be converted to a pointer to a different object type. [...] When converted back again, the result shall compare equal to the original pointer.

When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

It doesn't specifically say what the result of the first conversion is. Ideally it should say ...and the pointer points to the same address , but it doesn't.

However, I argue that it it is implied that the pointer must point to the same address after the conversion. Here is an illustrative example:

void *v1 = malloc( sizeof(int) );
int  *i1 = (int *)v1;

If we do not accept " and the pointer points to the same address " then i1 might not actually point into the malloc 'd space, which would be ridiculous.

My conclusion is that we should read 6.3.2.3/7 as saying that the pointer cast does not change the address being pointed to. The part about using pointers to character type seems to back this up.

Therefore, since p1 and p2 have the same type and point to the same address, they compare equal.

To answer directly:

  • 6.3.2.1 Lvalues, arrays, and function designators, paragraph 1
  • 6.3.2.3 Pointers, paragraphs 1,5 and 6
  • 6.5.3.2 Address and indirection operators, paragraph 3

The code posted is not undefined, but it "might" be compiler/implementation specific (per section 6.3.2.3 p5/6) 发布的代码未定义,但它“可能”是编译器/实现特定的(根据第6.3.2.3节第5/6页)

This would imply asking why int *ptr = (int*)(void*)&arr gives the same results as int *ptr = (int*)(void*)arr; , but per your code posted, you're actually asking why int *ptr = (int*)(void*)&arr gives the same as int *ptr = (int*)&arr .

Either way I'll expand on what your code is actually doing to help clarify:

Per 6.3.2.1p3:

.

and per 6.5.3.2p3:

So in your first declaration

int arr[2] = {0, 0};

arr is initialized to an array type containing 2 elements of type int both equal to 0. Then per 6.3.2.1p3 it is "decayed" into a pointer type pointing to the first element anywhere it is called in scope ( ). )。

So in your next line, you could simply just do the following:

int *ptr = arr; or int *ptr = &*arr; or int *ptr = &arr[0];

and ptr is now a pointer to an int type that points to the first element of the array arr (ie &arr[0] ).

Instead you declare it as such:

int *ptr = (int*)&arr;

Lets break this down into it's parts:

  1. &arr -> triggers the exception to 6.3.2.1p3 so instead of getting &arr[0] , you get the address to arr which is an int(*)[2] type (not an int* type), so you are not getting a pointer to an int , you are getting a pointer to an int array

  2. (int*)&arr , (ie the cast to int* ) -> per 6.5.3.2p3, &arr takes the address of the variable arr returning a pointer to the type of it, so simply saying int* ptr = &arr will give a warning of (since ptr is of type int* and &arr is of type int(*)[2] ) which is why you need to cast to an int* . 警告(因为ptr的类型为int*&arr的类型为int(*)[2] ),这就是你需要转换为int*

Further per 6.3.2.3p1: .

So, you're declaration of int* ptr = (int*)(void*)&arr; would produce the same results as int* ptr = (int*)&arr; because of the types you are using and converting to/from. Also as a note: ptr[0] = 5; is the same as *ptr = 5 , where ptr[1] = 5; would also be the same as *++ptr = 5;

Some of the references:

1. An lvalue is an expression (with an object type other than void) that potentially designates an object (*see note); if an lvalue does not designate an object when it is evaluated, the behavior is undefined. When an object is said to have a particular type, the type is specified by the lvalue used to designate the object. A modifiable lvalue is an lvalue that does not have array type, does not have an incomplete type, does not have a constqualified type, and if it is a structure or union, does not have any member (including, recursively, any member or element of all contained aggregates or unions) with a constqualified type.

*The name ''lvalue'' comes originally from the assignment expression E1 = E2, in which the left operand E1 is required to be a (modifiable) lvalue. It is perhaps better considered as representing an object ''locator value''. What is sometimes called ''rvalue'' is in this International Standard described as the ''value of an expression''. An obvious example of an lvalue is an identifier of an object. As a further example, if E is a unary expression that is a pointer to an object, *E is an lvalue that designates the object to which E points.

2. Except when it is the operand of the sizeof operator, the _Alignof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion. If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue; additionally, if the lvalue has atomic type, the value has the non-atomic version of the type of the lvalue; otherwise, the value has the type of the lvalue. If the lvalue has an incomplete type and does not have array type, the behavior is undefined. If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

3. Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ''array of type'' is converted to an expression with type ''pointer to type'' that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.

1. A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.

5. An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation (the mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment).

6. Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

1. The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.

3. The unary & operator yields the address of its operand. If the operand has type ''type'', the result has type ''pointer to type''. If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand.

4. The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ''pointer to type'', the result has type ''type''. If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined (*see note).

*Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2)). It is always true that if E is a function designator or an lvalue that is a valid operand of the unary & operator, *&E is a function designator or an lvalue equal to E. If *P is an lvalue and T is the name of an object pointer type, *(T)P is an lvalue that has a type compatible with that to which T points. Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.

5. Preceding an expression by a parenthesized type name converts the value of the expression to the named type. This construction is called a cast (a cast does not yield an lvalue; thus, a cast to a qualified type has the same effect as a cast to the unqualified version of the type). A cast that specifies no conversion has no effect on the type or value of an expression.

6. If the value of the expression is represented with greater range or precision than required by the type named by the cast (6.3.1.8), then the cast specifies a conversion even if the type of the expression is the same as the named type and removes any extra range and precision.

2. In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.

1. In addition to optional type qualifiers and the keyword static, the [ and ] may delimit an expression or *. If they delimit an expression (which specifies the size of an array), the expression shall have an integer type. If the expression is a constant expression, it shall have a value greater than zero. The element type shall not be an incomplete or function type. The optional type qualifiers and the keyword static shall appear only in a declaration of a function parameter with an array type, and then only in the outermost array type derivation.

3. If, in the declaration ''T D1'', D1 has one of the forms:

D[ type-qualifier-listopt assignment-expressionopt ]
D[ static type-qualifier-listopt assignment-expression ]
D[ type-qualifier-list static assignment-expression ]
D[ type-qualifier-listopt * ]


and the type specified for ident in the declaration ''T D'' is ''derived-declarator-type-list T'', then the type specified for ident is ''derived-declarator-type-list array of T''.142) (See 6.7.6.3 for the meaning of the optional type qualifiers and the keyword static.)

4. If the size is not present, the array type is an incomplete type. If the size is * instead of being an expression, the array type is a variable length array type of unspecified size, which can only be used in declarations or type names with function prototype scope;143) such arrays are nonetheless complete types. If the size is an integer constant expression and the element type has a known constant size, the array type is not a variable length array type; otherwise, the array type is a variable length array type. (Variable length arrays are a conditional feature that implementations need not support; see 6.10.8.3.)

5. If the size is an expression that is not an integer constant expression: if it occurs in a declaration at function prototype scope, it is treated as if it were replaced by *; otherwise, each time it is evaluated it shall have a value greater than zero. The size of each instance of a variable length array type does not change during its lifetime. Where a size expression is part of the operand of a sizeof operator and changing the value of the size expression would not affect the result of the operator, it is unspecified whether or not the size expression is evaluated.

6. For two array types to be compatible, both shall have compatible element types, and if both size specifiers are present, and are integer constant expressions, then both size specifiers shall have the same constant value. If the two array types are used in a context which requires them to be compatible, it is undefined behavior if the two size specifiers evaluate to unequal values.

PS As a side note, given the following code:

#include <stdio.h>

int main(int argc, char** argv)
{
    int arr[2] = {10, 20};
    X
    Y
    printf("%d,%d\n", arr[0],arr[1]);
    return 0;
}

where X was one of the following:

int *ptr = (int*)(void*)&arr;
int *ptr = (int*)&arr;
int *ptr = &arr[0];

and Y was one of the following:

ptr[0] = 15;
*ptr = 15;

When compiled on OpenBSD with gcc version 4.2.1 20070719 and providing the -S flag, the assembler output for all files was exactly the same.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM