
gdb in Windows: different behaviour when debugging compiled C and C++ code

I've noticed a strange behaviour of GDB 7.5 on Windows. Consider the following C program:

int foo(void){
    int i = 5;
    return i;
}

int main(int argc, char** argv){
    foo();
    return 0;
}

When compiled as either Classic C or C++, the GDB disass foo command gives the same assembly code, as follows:

Dump of assembler code for function foo:
0x00401954 <+0>:     push   %ebp
0x00401955 <+1>:     mov    %esp,%ebp
0x00401957 <+3>:     sub    $0x10,%esp
0x0040195a <+6>:     movl   $0x5,-0x4(%ebp)
0x00401961 <+13>:    mov    -0x4(%ebp),%eax
0x00401964 <+16>:    leave
0x00401965 <+17>:    ret
End of assembler dump.

However, after setting a breakpoint at the "leave" instruction, like so: br *0x00401964 , running the code up to that point, and attempting to print the variable i, the executables produced by compiling it as C and as C++ behave differently. The C executable works as expected and prints out $i = 5 , while with the C++ executable GDB chokes and says "no symbol i in current context".

So just out of curiosity I'd like to know if this is a GDB bug or feature? Or is the compiler (GCC) doing something subtly different so that there's something happening between the lines? Thanks.

EDIT: Well, I don't think it's true the compiler removed the function completely, because breaking at the line before "leave" and printing the value of i does work.

This is neither a bug nor a feature, nor is it a side effect of compiler optimization. The disassembly is clearly the output of a non-optimized build ( i is written to the stack at foo+6 and reread from the stack one step later at foo+13 ).

While the assembly output of C and C++ is the same in this case, the debug symbol output is slightly different. The scope of i is more limited in C++. I can only speculate about the reasons. I would guess that this is related to the fact that scoping is more complex in C++ (think of constructors, destructors, exceptions), so the C++ part of gcc is stricter about scopes than the C part of gcc.

Details

(I checked everything with a 32-bit build on 64-bit Linux, using gcc 4.8 and gdb 7.6. While some details will differ on Windows, I expect the general mechanics to be the same.)

Note that addresses differ in my case.

(gdb) disas foo
Dump of assembler code for function foo:
   0x080483ed <+0>:     push   %ebp
   0x080483ee <+1>:     mov    %esp,%ebp
   0x080483f0 <+3>:     sub    $0x10,%esp
   0x080483f3 <+6>:     movl   $0x5,-0x4(%ebp)
   0x080483fa <+13>:    mov    -0x4(%ebp),%eax
   0x080483fd <+16>:    leave  
   0x080483fe <+17>:    ret    
End of assembler dump.

Technically, foo+0 and foo+1 are the function prologue, foo+3 to foo+13 are the function body, and foo+16 and foo+17 are the function epilogue. So only foo+3 to foo+13 represent the code between { and } . I would say that the C++ version is more correct in saying that i is out of scope before and after the function body.

To see that this is really a matter of debug symbols you can dump out gdb's internals of the debug structures with maintenance print symbols output_file_on_disk . For C it looks like:

block #000, object at 0x1847710, 1 syms/buckets in 0x80483ed..0x804840e
 int foo(); block object 0x18470d0, 0x80483ed..0x80483ff
 int main(int, char **); block object 0x18475d0, 0x80483ff..0x804840e section .text
  block #001, object at 0x18476a0 under 0x1847710, 1 syms/buckets in 0x80483ed..0x804840e
   typedef int int; 
   typedef char char; 
    block #002, object at 0x18470d0 under 0x18476a0, 1 syms/buckets in 0x80483ed..0x80483ff, function foo
     int i; computed at runtime
    block #003, object at 0x18475d0 under 0x18476a0, 2 syms/buckets in 0x80483ff..0x804840e, function main
     int argc; computed at runtime
     char **argv; computed at runtime

While for C++ it looks like:

block #000, object at 0x1a3c790, 1 syms/buckets in 0x80483ed..0x804840e
 int foo(); block object 0x1a3c0c0, 0x80483ed..0x80483ff
 int main(int, char**); block object 0x1a3c640, 0x80483ff..0x804840e section .text
  block #001, object at 0x1a3c720 under 0x1a3c790, 1 syms/buckets in 0x80483ed..0x804840e
   typedef int int; 
   typedef char char; 
    block #002, object at 0x1a3c0c0 under 0x1a3c720, 0 syms/buckets in 0x80483ed..0x80483ff, function foo()
      block #003, object at 0x1a3c050 under 0x1a3c0c0, 1 syms/buckets in 
       int i; computed at runtime
    block #004, object at 0x1a3c640 under 0x1a3c720, 2 syms/buckets in 0x80483ff..0x804840e, function main(int, char**)
     int argc; computed at runtime
     char **argv; computed at runtime

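So the debug symbols for the C++ code distinguish between the whole function (block #002) and the scope of the function body (block #003). The variable i belongs only to the inner block, so once execution reaches the leave instruction, which lies outside that block, gdb can no longer resolve i. This results in your observations.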

(And to see that this is really not gdb just handling something wrong you can even analyze the binary with objdump on Linux or dumpbin on Windows. I did it on Linux and indeed it's the DWARF debug symbols that are different :-) )

It's not really a bug or a feature. The compiler is permitted to substitute functionally-equivalent code and generally does so if it can find a better way to do things. The example code is equivalent to doing nothing at all, so the compiler is free to remove it. This leaves the debugger with nothing to debug, which is good since debugging code that does nothing would be a waste of time anyway.
