
Why is the printf statement in the code below printing a value rather than a garbage value?

#include <stdio.h>

int main(){
    int array[] = {10,20,30,40,50};
    printf("%d\n",-2[array -2]);
    return 0;
}

Can anyone explain how -2[array-2] works and why [ ] is used here? This was a question in my assignment. The program prints -10, but I don't understand why.

Technically speaking, this invokes undefined behaviour. Quoting C11, §6.5.6:

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. [...]

So evaluating (array-2) is already undefined behavior.

However, in practice most compilers will simply let the +2 and -2 offsets cancel out. 2[a] is the same as a[2], which is the same as *(a+2); thus 2[a-2] is *((2)+(a-2)). Once the offsets cancel, the only expression left to evaluate is *(a), i.e. a[0].

Then check the operator precedence: [] binds more tightly than unary -.

-2[array -2] is therefore effectively the same as -(array[0]). So the result is the value of array[0], negated: -10.

This is an unfortunate example for instruction, because it implies it's okay to do some incorrect things that often work in practice.

The technically correct answer is that the program has Undefined Behavior, so any result is possible, including printing -10, printing a different number, printing something different or nothing at all, failing to run, crashing, and/or doing something entirely unrelated.

The undefined behavior arises from evaluating the subexpression array -2. array decays from its array type to a pointer to its first element. array -2 would point at the element two positions before that, but there is no such element (and this case is not covered by the special "one-past-the-end" rule), so evaluating it is a problem no matter what context it appears in.

C11 6.5.6/8 says:

When an expression that has integer type is added to or subtracted from a pointer, ... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.


Now the technically incorrect answer the instructor is probably looking for is what actually happens on most implementations:

Even though array -2 is outside the actual array, it evaluates to some address which is 2*sizeof(int) bytes before the address where the array's data starts. It's invalid to dereference that address since we don't know that there actually is any int there, but we're not going to.

Looking at the larger expression -2[array -2], the [] operator has higher precedence than the unary - operator, so it means -(2[array -2]) and not (-2)[array -2]. A[B] is defined to mean the same as *((A)+(B)). It's customary to have A be a pointer value and B be an integer value, but it's also legal to use them reversed like we're doing here. So these are equivalent:

-2[array -2]
-(2[array -2])
-(*(2 + (array - 2)))
-(*(array))

The last step acts like we would expect: adding two to array - 2 yields an address 2*sizeof(int) bytes after it, which gets us back to the address of the first array element. So *(array) dereferences that address, giving 10, and -(*(array)) negates that value, giving -10. The program prints -10.


You should never count on things like this, even if you observe that it "works" on your system and compiler. Since the language guarantees nothing about what will happen, the code might stop working after slight changes that seem unrelated, on a different system, with a different compiler, with a different version of the same compiler, or with the same system and compiler on a different day.

Here is how -2[array-2] is evaluated:

First, note that -2[array-2] is parsed as -(2[array-2]). The subscript operator, [...], has higher precedence than the unary - operator. We often think of constants like -2 as single numbers, but it is in fact a - operator applied to a 2.

In array-2 , array is automatically converted to a pointer to its first element, so it points to array[0] .

Then array-2 attempts to calculate a pointer to two elements before the first element of the array. The resulting behavior is not defined by the C standard, because C 2018 6.5.6 8 says that pointer arithmetic is defined only when the result points to an element of the array or just past its last element.

For illustration only, suppose we are using a C implementation that extends the C standard by defining pointers to use a flat address space and permit arbitrary pointer arithmetic. Then array-2 points two elements before the array.

Then 2[array-2] uses the fact that the C standard defines E1[E2] to be *((E1)+(E2)). That is, the subscript operator is implemented by adding the two operands and applying *. Thus, it does not matter which expression is E1 and which is E2, because E1+E2 is the same as E2+E1. So 2[array-2] is *(2 + (array-2)). Adding 2 moves the pointer from two elements before the array back to the start of the array. Then applying * produces the element at that location, which is 10.

Finally, applying - gives −10. (Recall that this conclusion is only achieved using our supposition that the C implementation supports a flat address space. You cannot use this in general C code.)

This code invokes undefined behavior and can print anything, including -10.

C17 6.5.2.1 Array subscripting states:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))

Meaning array[n] is equivalent to *((array) + (n)), and that's how the compiler evaluates subscripting. This allows silly obfuscation such as n[array], which is 100% equivalent to array[n], because *((n) + (array)) is equivalent to *((array) + (n)). As explained here:
With arrays, why is it the case that a[5] == 5[a]?

Looking at the expression -2[array -2] specifically:

  • [array -2] and [array - 2] are naturally equivalent. In this case the former is just sloppy style purposely used for the sake of obfuscating the code.
  • Operator precedence tells us to consider [] first.
  • Thus the expression is equivalent to -*( (2) + (array - 2) )
  • Note that the first - is not part of the integer constant 2. C does not have negative integer constants 1); the - is actually the unary minus operator.
  • Unary minus has lower precedence than [], so the 2 in -2[ "binds" to the [.
  • The sub-expression (array - 2) is evaluated individually and invokes undefined behavior, as per C17 6.5.6/8:

    When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. /--/ If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

  • Speculatively, one potential form of undefined behavior could be that a compiler decides to replace the whole expression (2) + (array - 2) with array , in which case the whole expression would end up as -*array and prints -10 .

    There are no guarantees of this, and therefore the code is bad. If you were given the assignment to explain why the code prints -10, your teacher is incompetent. Not only is it meaningless or harmful to study obfuscation as part of learning C, it is harmful to rely on undefined behavior or to expect it to give a certain result.


1) C does, however, support negative integer constant expressions. -2 is an integer constant expression, in which 2 is an integer constant of type int.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM