Undefined behavior from pointer math on a C++ array

Why is the output of this program 4?

#include <iostream>

int main()
{
    short A[] = {1, 2, 3, 4, 5, 6};
    std::cout << *(short*)((char*)A + 7) << std::endl;
    return 0;
}

From my understanding, on an x86 little-endian system, where char is 1 byte and short is 2 bytes, the output should be 0x0500, because the data in array A is as follows in hex:

01 00 02 00 03 00 04 00 05 00 06 00

We move 7 bytes forward from the beginning, and then read 2 bytes. What am I missing?
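
The byte layout assumed above can be checked directly, since reading any object's storage through unsigned char* is permitted. The following is a minimal sketch; hex_bytes is a hypothetical helper name and is not part of the original question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not from the original post): formats the bytes of any
// object as space-separated two-digit hex values.
std::string hex_bytes(const void* obj, std::size_t size)
{
    assert(obj != nullptr || size == 0);

    // Inspecting an object's storage through unsigned char* is allowed.
    const unsigned char* bytes = static_cast<const unsigned char*>(obj);
    std::string out;
    char buf[3]; // two hex digits plus the terminating NUL
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling hex_bytes(A, sizeof A) on the array from the question should reproduce the 12-byte sequence shown above, assuming a little-endian machine with a 2-byte short.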

You are violating strict aliasing rules here. You can't just read half-way into an object and pretend it's an object all on its own. You can't invent hypothetical objects using byte offsets like this. GCC is perfectly within its rights to do crazy sh!t like going back in time and murdering Elvis Presley, when you hand it your program.

What you are allowed to do is inspect and manipulate the bytes that make up an arbitrary object, using a char*. Using that privilege:

#include <iostream>
#include <algorithm>

int main()
{
    short A[] = {1, 2, 3, 4, 5, 6};

    short B;
    std::copy(
       (char*)A + 7,
       (char*)A + 7 + sizeof(short),
       (char*)&B
    );
    std::cout << std::showbase << std::hex << B << std::endl;
}

// Output: 0x500

But you can't just "make up" a non-existent object in the original collection.

Furthermore, even if you have a compiler that can be told to ignore this problem (e.g. with GCC's -fno-strict-aliasing switch), the made-up object is not correctly aligned for any current mainstream architecture. A short cannot legally live at that odd-numbered location in memory, so you doubly can't pretend there is one there. There's just no way to get around how undefined the original code's behaviour is; in fact, if you pass GCC the -fsanitize=undefined switch it will tell you as much.

I'm simplifying a little.
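
As a sketch of the byte-copy approach from this answer, the same idea can be wrapped in a small function built on std::memcpy instead of std::copy. The name read_short_at and its bounds check are assumptions made for illustration, not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the original answer): reads sizeof(short)
// bytes starting `offset` bytes into a buffer of `size` bytes.
short read_short_at(const void* base, std::size_t size, std::size_t offset)
{
    // The whole read must stay inside the buffer.
    assert(offset <= size && sizeof(short) <= size - offset);

    short value;
    // memcpy copies raw bytes, so the source address needs no particular
    // alignment, and the bytes end up in a real short object.
    std::memcpy(&value,
                static_cast<const unsigned char*>(base) + offset,
                sizeof value);
    return value;
}
```

With the array from the question, read_short_at(A, sizeof A, 7) should give 0x500 on a little-endian machine with a 2-byte short, matching the std::copy version above.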

The program has undefined behaviour due to casting an incorrectly aligned pointer to (short*). This breaks the rule in 6.3.2.3 p7 of C11, which has nothing to do with strict aliasing as claimed in other answers:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.

In [expr.static.cast] p13 C++ says that converting the unaligned char* to short* gives an unspecified pointer value, which might be an invalid pointer, which can't be dereferenced.

The correct way to inspect the bytes is through the char*, not by casting back to short* and pretending there is a short at an address where a short cannot live.
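
To illustrate the alignment requirement, here is a minimal sketch of a check that an address is suitably aligned for a given type. The helper is_aligned_for is hypothetical, and it assumes the usual flat mapping from pointers to integers, which the standard leaves implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reports whether address p is a multiple of alignof(T).
// Assumes converting a pointer to std::uintptr_t yields its numeric address,
// which holds on mainstream platforms but is implementation-defined.
template <typename T>
bool is_aligned_for(const void* p)
{
    assert(p != nullptr);
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

On a platform where alignof(short) is 2, is_aligned_for<short>(A) holds for the array in the question, while is_aligned_for<short>((char*)A + 7) does not, because an odd address can never be a multiple of 2.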

This is arguably a bug in GCC.

First, it is to be noted that your code is invoking undefined behavior, due to violation of the rules of strict aliasing.

With that said, here's why I consider it a bug:

  1. The same expression, when first assigned to an intermediate short or short*, produces the expected behavior. Only when the expression is passed directly as a function argument does the unexpected behavior appear.

  2. It occurs even when compiled with -O0 -fno-strict-aliasing.

I rewrote your code in C to eliminate the possibility of any C++ craziness. Your question was tagged c after all! I added the pshort function to ensure that the variadic nature of printf wasn't involved.

#include <stdio.h>

static void pshort(short val)
{
    printf("0x%hx ", val);
}

int main(void)
{
    short A[] = {1, 2, 3, 4, 5, 6};

#define EXP ((short*)((char*)A + 7))

    short *p = EXP;
    short q = *EXP;

    pshort(*p);
    pshort(q);
    pshort(*EXP);
    printf("\n");

    return 0;
}

After compiling with gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2):

gcc -O0 -fno-strict-aliasing -g -Wall -Werror  endian.c

Output:

0x500 0x500 0x4

It appears that GCC is actually generating different code when the expression is used directly as an argument, even though I'm clearly using the same expression (EXP).

Dumping with objdump -Mintel -S --no-show-raw-insn endian :

int main(void)
{
  40054d:   push   rbp
  40054e:   mov    rbp,rsp
  400551:   sub    rsp,0x20
    short A[] = {1, 2, 3, 4, 5, 6};
  400555:   mov    WORD PTR [rbp-0x16],0x1
  40055b:   mov    WORD PTR [rbp-0x14],0x2
  400561:   mov    WORD PTR [rbp-0x12],0x3
  400567:   mov    WORD PTR [rbp-0x10],0x4
  40056d:   mov    WORD PTR [rbp-0xe],0x5
  400573:   mov    WORD PTR [rbp-0xc],0x6

#define EXP ((short*)((char*)A + 7))

    short *p = EXP;
  400579:   lea    rax,[rbp-0x16]             ; [rbp-0x16] is A
  40057d:   add    rax,0x7
  400581:   mov    QWORD PTR [rbp-0x8],rax    ; [rbp-0x08] is p
    short q = *EXP;
  400585:   movzx  eax,WORD PTR [rbp-0xf]     ; [rbp-0xf] is A plus 7 bytes
  400589:   mov    WORD PTR [rbp-0xa],ax      ; [rbp-0xa] is q

    pshort(*p);
  40058d:   mov    rax,QWORD PTR [rbp-0x8]    ; [rbp-0x08] is p
  400591:   movzx  eax,WORD PTR [rax]         ; *p
  400594:   cwde   
  400595:   mov    edi,eax
  400597:   call   400527 <pshort>
    pshort(q);
  40059c:   movsx  eax,WORD PTR [rbp-0xa]      ; [rbp-0xa] is q
  4005a0:   mov    edi,eax
  4005a2:   call   400527 <pshort>
    pshort(*EXP);
  4005a7:   movzx  eax,WORD PTR [rbp-0x10]    ; [rbp-0x10] is A plus 6 bytes ********
  4005ab:   cwde   
  4005ac:   mov    edi,eax
  4005ae:   call   400527 <pshort>
    printf("\n");
  4005b3:   mov    edi,0xa
  4005b8:   call   400430 <putchar@plt>

    return 0;
  4005bd:   mov    eax,0x0
}
  4005c2:   leave  
  4005c3:   ret

  • I get the same result with GCC 4.9.4 and GCC 5.5.0 from Docker Hub.
