
What does *address (found in printf) mean in assembly?

Disassembling printf doesn't give much info:

(gdb) disas printf
Dump of assembler code for function printf:
0x00401b38 <printf+0>:  jmp    *0x405130
0x00401b3e <printf+6>:  nop
0x00401b3f <printf+7>:  nop
End of assembler dump.


(gdb) disas 0x405130
Dump of assembler code for function _imp__printf:
0x00405130 <_imp__printf+0>:    je     0x405184 <_imp__vfprintf+76>
0x00405132 <_imp__printf+2>:    add    %al,(%eax)

How is it implemented under the hood?

Why doesn't disassembling help?

What does * mean before 0x405130?

The * is AT&T assembler syntax for an indirect memory reference, i.e.

jmp *<addr>

means "jump to the address stored in <addr>".

It is equivalent to the following Intel syntax:

jmp [addr]

Branch addressing using registers or memory operands must be prefixed by a '*'

Source

Virtually all C compilers provide the source to their runtime libraries - not just open source compilers. Unfortunately, they're often written in rather difficult to follow form and they don't generally come with design rationale documents.

So, a very nice resource for dealing with that problem is PJ Plauger's "The Standard C Library", which provides not only the source for a library implementation but also has details on how it's designed and the special situations that such a library might have to consider.

At the prices that some of the 'used' versions of the book are being offered, it's a steal and should be on any serious C programmer's bookshelf.

Plauger has similar books targeting the C++ library that I think have similar value.

I'd say disassembling works just fine here, and that printf is implemented 'under the hood' here using vfprintf, which is pretty much what you'd expect. Note that assembly is typically much more verbose than C, and time-consuming to make sense of when you don't have the annotated source. Compiler output is not a great way of teaching yourself assembly either.

As for

What does * mean before 0x405130?

I'm not familiar with gdb's disassembler, but it looks like the jmp *0x405130 is an indirect jump through a pointer. Instead of disassembling what's at 0x405130 you should dump the 4 bytes of memory there. I'd be willing to bet that you'll find another address there, and if you disassemble that location you'll find printf()'s code (how readable that disassembly might be is another story).

In other words, _imp__printf is a pointer to printf(), not printf() itself.


Edit from after more information in the comments below:

A little poking around indicates that jmp *0x405130 is the GAS/AT&T assembly syntax for the instruction written jmp [0x405130] in Intel assembly syntax.

What makes this curious is that you say that the gdb command x/xw 0x405130 shows that that address contains 0x00005274 (which seems to match up with what you got when you disassembled 0x405130). However, that would mean that jmp [0x405130] would try to jump to address 0x00005274, which doesn't seem right (and gdb said as much when you tried to disassemble that address).

It's possible that the _imp__printf entry is using some sort of lazy binding technique where the first time execution jumps through 0x405130, it hits the 0x00005274 address, which causes the OS to field a trap and fix up the dynamic link. After the fixup, the OS will restart execution with the correct link address in 0x405130. But this is sheer guesswork on my part. I have no idea if the system you're using does anything like this (indeed, I don't even know what system you're running on), but it's technically possible. If something like this is going on, you won't see the correct address in 0x405130 until after the first call to printf() has been made.

I think you'll need to single step through a call to printf() at the assembly level to see what's really going on.


Updated information with a GDB session:

Here's the problem you're running into - you're looking at the process before the system has loaded DLLs and fixed up the linkages to the DLLs. Here's a debugging session of a simple "hello world" program compiled with MinGW debugged with GDB:

C:\temp>\mingw\bin\gdb test.exe
GNU gdb (GDB) 7.1
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "mingw32".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from C:\temp/test.exe...done.

(gdb) disas main
Dump of assembler code for function main:
   0x004012f0 <+0>:     push   %ebp
   0x004012f1 <+1>:     mov    %esp,%ebp
   0x004012f3 <+3>:     sub    $0x8,%esp
   0x004012f6 <+6>:     and    $0xfffffff0,%esp
   0x004012f9 <+9>:     mov    $0x0,%eax
   0x004012fe <+14>:    add    $0xf,%eax
   0x00401301 <+17>:    add    $0xf,%eax
   0x00401304 <+20>:    shr    $0x4,%eax
   0x00401307 <+23>:    shl    $0x4,%eax
   0x0040130a <+26>:    mov    %eax,-0x4(%ebp)
   0x0040130d <+29>:    mov    -0x4(%ebp),%eax
   0x00401310 <+32>:    call   0x401850 <_alloca>
   0x00401315 <+37>:    call   0x4013d0 <__main>
   0x0040131a <+42>:    movl   $0x403000,(%esp)
   0x00401321 <+49>:    call   0x4018b0 <printf>
   0x00401326 <+54>:    mov    $0x0,%eax
   0x0040132b <+59>:    leave
   0x0040132c <+60>:    ret
End of assembler dump.

Note that disassembling printf() leads to a similar indirect jump:

(gdb) disas printf
Dump of assembler code for function printf:
   0x004018b0 <+0>:     jmp    *0x4050f8     ; <<-- indirect jump
   0x004018b6 <+6>:     nop
   0x004018b7 <+7>:     nop
End of assembler dump.

And that the _imp__printf symbol makes no sense as code...

(gdb) disas 0x4050f8
Dump of assembler code for function _imp__printf:
   0x004050f8 <+0>:     clc                 ; <<-- how can this be printf()?
   0x004050f9 <+1>:     push   %ecx
   0x004050fa <+2>:     add    %al,(%eax)
End of assembler dump.

or as a pointer...

(gdb) x/xw 0x4050f8
0x4050f8 <_imp__printf>:        0x000051f8  ; <<-- 0x000051f8 is an invalid pointer

Now, let's set a breakpoint at main(), and run to it:

(gdb) break main
Breakpoint 1 at 0x40131a: file c:/temp/test.c, line 5.

(gdb) run
Starting program: C:\temp/test.exe
[New Thread 11204.0x2bc8]
Error while mapping shared library sections:
C:\WINDOWS\SysWOW64\ntdll32.dll: No such file or directory.

Breakpoint 1, main () at c:/temp/test.c:5
5           printf( "hello world\n");

printf() looks the same:

(gdb) disas printf
Dump of assembler code for function printf:
   0x004018b0 <+0>:     jmp    *0x4050f8
   0x004018b6 <+6>:     nop
   0x004018b7 <+7>:     nop
End of assembler dump.

but _imp__printf looks different - the dynamic link has now been fixed up:

(gdb) x/xw 0x4050f8
0x4050f8 <_imp__printf>:        0x77bd27c2

And if we disassemble what _imp__printf is now pointing to, it might not be very readable, but clearly it's code now. This is printf() as implemented in MSVCRT.DLL:

(gdb) disas _imp__printf
Dump of assembler code for function printf:
   0x77bd27c2 <+0>:     push   $0x10
   0x77bd27c4 <+2>:     push   $0x77ba4770
   0x77bd27c9 <+7>:     call   0x77bc84c4 <strerror+554>
   0x77bd27ce <+12>:    mov    $0x77bf1cc8,%esi
   0x77bd27d3 <+17>:    push   %esi
   0x77bd27d4 <+18>:    push   $0x1
   0x77bd27d6 <+20>:    call   0x77bcca49 <msvcrt!_lock+4816>
   0x77bd27db <+25>:    pop    %ecx
   0x77bd27dc <+26>:    pop    %ecx
   0x77bd27dd <+27>:    andl   $0x0,-0x4(%ebp)
   0x77bd27e1 <+31>:    push   %esi
   0x77bd27e2 <+32>:    call   0x77bd400d <wscanf+3544>
   0x77bd27e7 <+37>:    mov    %eax,-0x1c(%ebp)
   0x77bd27ea <+40>:    lea    0xc(%ebp),%eax
   0x77bd27ed <+43>:    push   %eax
   0x77bd27ee <+44>:    pushl  0x8(%ebp)
   0x77bd27f1 <+47>:    push   %esi
   0x77bd27f2 <+48>:    call   0x77bd3330 <wscanf+251>
   0x77bd27f7 <+53>:    mov    %eax,-0x20(%ebp)
   0x77bd27fa <+56>:    push   %esi
   0x77bd27fb <+57>:    pushl  -0x1c(%ebp)
   0x77bd27fe <+60>:    call   0x77bd4099 <wscanf+3684>
   0x77bd2803 <+65>:    add    $0x18,%esp
   0x77bd2806 <+68>:    orl    $0xffffffff,-0x4(%ebp)
   0x77bd280a <+72>:    call   0x77bd281d <printf+91>
   0x77bd280f <+77>:    mov    -0x20(%ebp),%eax
   0x77bd2812 <+80>:    call   0x77bc84ff <strerror+613>
   0x77bd2817 <+85>:    ret
   0x77bd2818 <+86>:    mov    $0x77bf1cc8,%esi
   0x77bd281d <+91>:    push   %esi
   0x77bd281e <+92>:    push   $0x1
   0x77bd2820 <+94>:    call   0x77bccab0 <msvcrt!_lock+4919>
   0x77bd2825 <+99>:    pop    %ecx
   0x77bd2826 <+100>:   pop    %ecx
   0x77bd2827 <+101>:   ret
   0x77bd2828 <+102>:   int3
   0x77bd2829 <+103>:   int3
   0x77bd282a <+104>:   int3
   0x77bd282b <+105>:   int3
   0x77bd282c <+106>:   int3
End of assembler dump.

It's probably harder to read than you might hope because I'm not sure if proper symbols are available for it (or whether GDB can properly read those symbols).

However, as I mentioned in another answer, you can typically get the source for C runtime routines with your compiler, whether open source or not. MinGW doesn't come with the source for MSVCRT.DLL since that's a Windows thing, but you can get the source for it (or something pretty close to it) in a Visual Studio distribution - I think that even the free VC++ Express comes with runtime source (but I might be wrong about that).

printf() is most likely located in a dynamic shared library. The dynamic linker fills a table with the addresses of the imported functions; that's why the call has to go through that indirect jump.

I don't really recall how this works; it's probable that optimizations complicate the process. But you get the idea.

