I'm trying to understand what a C program looks like at the assembly level, so I ran gdb and used disassemble on main and get_input. The program is short so that I can follow it more easily. There are two lines I don't understand. The first one, in main(), is:
0x00000000004005a3 <+4>: mov $0x0,%eax
We save the old value of rbp and copy the current value of rsp into rbp. What is the purpose of that instruction?
The other one, in get_input(), is:
0x0000000000400581 <+4>: sub $0x10,%rsp
Here too we start by saving the old value of rbp by pushing it onto the stack, then giving rbp the current value of rsp. Then 16 bytes are subtracted from rsp. I understand this is space being allocated, but why is it 16 bytes and not 8? I made the buffer only 8 bytes; what is the purpose of the other 8 bytes?
#include <stdio.h>

void get_input()
{
    char buffer[8];
    gets(buffer);
    puts(buffer);
}

int main()
{
    get_input();
    return 0;
}
Dump of assembler code for function main:
0x000000000040059f <+0>: push %rbp
0x00000000004005a0 <+1>: mov %rsp,%rbp
0x00000000004005a3 <+4>: mov $0x0,%eax
0x00000000004005a8 <+9>: callq 0x40057d <get_input>
0x00000000004005ad <+14>: mov $0x0,%eax
0x00000000004005b2 <+19>: pop %rbp
0x00000000004005b3 <+20>: retq
End of assembler dump.
Dump of assembler code for function get_input:
0x000000000040057d <+0>: push %rbp
0x000000000040057e <+1>: mov %rsp,%rbp
0x0000000000400581 <+4>: sub $0x10,%rsp
0x0000000000400585 <+8>: lea -0x10(%rbp),%rax
0x0000000000400589 <+12>: mov %rax,%rdi
0x000000000040058c <+15>: callq 0x400480 <gets@plt>
0x0000000000400591 <+20>: lea -0x10(%rbp),%rax
0x0000000000400595 <+24>: mov %rax,%rdi
0x0000000000400598 <+27>: callq 0x400450 <puts@plt>
0x000000000040059d <+32>: leaveq
0x000000000040059e <+33>: retq
For main()...
0x000000000040059f <+0>: push %rbp
Push %RBP's value onto the stack.
0x00000000004005a0 <+1>: mov %rsp,%rbp
Copy %RSP's value into %RBP (create a new stack frame).
0x00000000004005a3 <+4>: mov $0x0,%eax
Move the immediate value 0x0 into %EAX. That is, it zeroes %EAX. As you're in 64-bit mode, this also clears all of %RAX.
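To make the zero-extension rule concrete, here is a small C sketch that models %RAX as a plain uint64_t. The regs_t type and the write_eax()/write_ax() helpers are hypothetical and nothing here touches real registers; the point is only that a 32-bit write replaces all 64 bits, while a 16-bit write (like one to %AX) keeps the upper bits.

```c
#include <stdint.h>

/* Hypothetical model of %RAX; this is a simulation, not real register access. */
typedef struct {
    uint64_t rax;
} regs_t;

/* A 32-bit write such as MOV $imm, %EAX zero-extends into all 64 bits of %RAX. */
static void write_eax(regs_t *r, uint32_t value) {
    r->rax = (uint64_t)value;
}

/* A 16-bit write such as MOV $imm, %AX leaves bits 16..63 of %RAX untouched. */
static void write_ax(regs_t *r, uint16_t value) {
    r->rax = (r->rax & ~UINT64_C(0xFFFF)) | value;
}
```

So after modelling MOV $0x0, %EAX on a %RAX full of garbage, the whole register reads as zero, whereas the same write to %AX would only clear the low 16 bits.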
0x00000000004005a8 <+9>: callq 0x40057d <get_input>
Push the return address (the address of the next instruction; %RIP cannot be pushed directly), then jump to the function get_input().
0x00000000004005ad <+14>: mov $0x0,%eax
According to the AMD64 System V ABI, a function's integer or pointer return value is stored in %RAX (floating-point values and large structures are handled differently). It also says that there are two groups of registers: caller-saved and callee-saved. When you call a function, you can't expect caller-saved registers to keep their values; you must save them on the stack yourself if necessary. Likewise, a function that gets called must preserve callee-saved registers if it uses them. The caller-saved general-purpose registers are %RAX, %RDI, %RSI, %RDX, %RCX, %R8, %R9, %R10, and %R11. The callee-saved registers are %RBX, %RSP, %RBP, %R12, %R13, %R14, and %R15.
Now, since main() performs return 0, it must return that 0 in %RAX, right? However, two things should be taken into account. Firstly, in the AMD64 System V ABI, sizeof(int) == 4. %RAX is 8 bytes wide, but %EAX is 4 bytes wide, so %EAX is what gets used for int-sized values such as main()'s return value. Secondly, %EAX is part of %RAX, and %RAX is caller-saved, so we can't rely on its value after a call. That is why MOV $0x0, %EAX is performed after the call: it sets the function's return value to zero.
0x00000000004005b2 <+19>: pop %rbp
Restore the %RBP of main()'s caller, that is, destroy main()'s stack frame.
0x00000000004005b3 <+20>: retq
Return from main() with a return value of 0.
Then, we have get_input()...
0x000000000040057d <+0>: push %rbp
Push %RBP's value onto the stack.
0x000000000040057e <+1>: mov %rsp,%rbp
Copy %RSP's value into %RBP (create a new stack frame).
0x0000000000400581 <+4>: sub $0x10,%rsp
Subtract 16 from %RSP (reserve 16 bytes of temporary storage for the current frame).
0x0000000000400585 <+8>: lea -0x10(%rbp),%rax
Load the effective address -0x10(%RBP) into %RAX. That is, it loads into %RAX the result of subtracting 16 from %RBP's value. This means that %RAX now points to the first byte of the local temporary storage, which is where buffer lives.
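The prologue arithmetic can be traced with a small C sketch that treats addresses as plain integers. The get_input_frame struct, model_get_input(), and the starting address are hypothetical; the sketch only replays callq, push %rbp, mov %rsp,%rbp, sub $0x10,%rsp and lea -0x10(%rbp),%rax to show where buffer ends up and where the 8 spare bytes sit.

```c
#include <stdint.h>

/* Hypothetical, simplified model: addresses are plain integers, no memory is touched. */
typedef struct {
    uint64_t return_address_slot; /* written by callq in main()             */
    uint64_t saved_rbp_slot;      /* written by push %rbp                   */
    uint64_t rbp;                 /* value after mov %rsp,%rbp              */
    uint64_t rsp;                 /* value after sub $0x10,%rsp             */
    uint64_t buffer;              /* lea -0x10(%rbp): first byte of buffer  */
    uint64_t padding;             /* first of the 8 unused bytes            */
} get_input_frame;

static get_input_frame model_get_input(uint64_t rsp_at_call) {
    get_input_frame f;
    uint64_t rsp = rsp_at_call;
    rsp -= 8;                     /* callq pushes the 8-byte return address */
    f.return_address_slot = rsp;
    rsp -= 8;                     /* push %rbp */
    f.saved_rbp_slot = rsp;
    f.rbp = rsp;                  /* mov %rsp,%rbp */
    rsp -= 0x10;                  /* sub $0x10,%rsp */
    f.rsp = rsp;
    f.buffer = f.rbp - 0x10;      /* lea -0x10(%rbp),%rax */
    f.padding = f.buffer + 8;     /* buffer[8] ends here; the rest up to %RBP is unused */
    return f;
}
```

For example, model_get_input(0x7fffffffe000) puts buffer at 0x7fffffffdfe0 and the saved %RBP at 0x7fffffffdff0. The same layout explains why writing more than 16 bytes starting at buffer (which gets() happily does) begins overwriting the saved %RBP and then the return address.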
0x0000000000400589 <+12>: mov %rax,%rdi
According to the ABI, a function's first integer or pointer argument is passed in %RDI, the second in %RSI, etc. In this case, %RAX's value (the address of buffer) is passed as the first argument to the function about to be called.
0x000000000040058c <+15>: callq 0x400480 <gets@plt>
Call the function gets().
0x0000000000400591 <+20>: lea -0x10(%rbp),%rax
The same as above.
0x0000000000400595 <+24>: mov %rax,%rdi
Pass %RAX as the first argument.
0x0000000000400598 <+27>: callq 0x400450 <puts@plt>
Call the function puts().
0x000000000040059d <+32>: leaveq
Equivalent to MOV %RBP, %RSP followed by POP %RBP, that is, it destroys the stack frame.
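As a rough model of why LEAVE undoes the prologue, here is a toy stack in C. The toy_cpu type and the push(), pop(), leave() and ret() helpers are hypothetical, written only for illustration; they mimic the instructions on a small array of 8-byte slots rather than real memory.

```c
#include <stdint.h>

/* Toy machine: a downward-growing stack of 8-byte slots, addressed in bytes. */
#define STACK_SLOTS 64

typedef struct {
    uint64_t mem[STACK_SLOTS]; /* slot i holds the 8 bytes at address i * 8 */
    uint64_t rsp;              /* stack pointer (byte address)              */
    uint64_t rbp;              /* frame pointer (byte address)              */
} toy_cpu;

/* PUSH: move %RSP down by 8, then store the value at the new top. */
static void push(toy_cpu *c, uint64_t v) {
    c->rsp -= 8;
    c->mem[c->rsp / 8] = v;
}

/* POP: load the value at the top, then move %RSP up by 8. */
static uint64_t pop(toy_cpu *c) {
    uint64_t v = c->mem[c->rsp / 8];
    c->rsp += 8;
    return v;
}

/* LEAVE == mov %rbp,%rsp ; pop %rbp (AT&T operand order). */
static void leave(toy_cpu *c) {
    c->rsp = c->rbp;
    c->rbp = pop(c);
}

/* RET pops the return address; a real CPU would then jump to it. */
static uint64_t ret(toy_cpu *c) {
    return pop(c);
}
```

Replaying get_input() on this model (push the return address 0x4005ad as callq would, push %RBP, copy %RSP into %RBP, subtract 0x10, then leave() and ret()) hands back 0x4005ad and leaves %RSP and %RBP exactly as they were before the call. Note that the sub $0x10 never needs an explicit add: leave() discards it by reloading %RSP from %RBP.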
0x000000000040059e <+33>: retq
Return from get_input() without a return value (it is declared void).
Now...
MOV $0x0, %EAX
What is the purpose of that instruction?
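The second instance of that instruction is quite important, as it sets the return value of main(). The first one (at <+4>, before the call) has nothing to do with the return value. It is most likely there because get_input() is declared with an empty parameter list, which in C before C23 does not provide a prototype. For calls to unprototyped or variadic functions, the AMD64 System V ABI uses %AL as a hidden argument holding an upper bound on the number of vector registers used to pass arguments, so the compiler sets %EAX to zero just in case. Declaring the function as void get_input(void) should make that instruction disappear. You also probably have optimizations disabled on your compiler.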
Then 16 bytes are subtracted from rsp. I understand this is space being allocated, but why is it 16 bytes and not 8? I made the buffer only 8 bytes; what is the purpose of the other 8 bytes?
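The ABI requires %RSP to be aligned on a 16-byte boundary at every function call (that is, right before the CALL instruction executes). On entry to get_input(), the CALL has just pushed an 8-byte return address and the prologue then pushes %RBP, so %RSP is 16-byte aligned again. Subtracting only 8 would leave it misaligned for the calls to gets() and puts(), so the 8-byte buffer is rounded up to 16 bytes; the other 8 bytes are just padding. BTW, you should get away from statically-sized buffers and from gets(), which cannot limit how much it writes and was removed from the C standard in C11.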
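If you want to keep the same structure without gets(), a minimal sketch using fgets() could look like the following. get_input_safe() and its FILE * parameters are hypothetical, added only so the function is easy to test; in your program you would pass stdin and stdout.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical replacement for get_input(): reads at most one line into a
   fixed 8-byte buffer without overflowing it, then echoes it like puts(). */
static int get_input_safe(FILE *in, FILE *out) {
    char buffer[8];
    if (fgets(buffer, sizeof buffer, in) == NULL)
        return -1;                            /* EOF or read error */
    buffer[strcspn(buffer, "\n")] = '\0';     /* drop the newline fgets keeps, if any */
    /* puts() appends a newline, so do the same here. */
    return (fputs(buffer, out) == EOF || fputc('\n', out) == EOF) ? -1 : 0;
}
```

With the 8-byte buffer, fgets() stores at most 7 characters plus the terminating NUL, so longer input is truncated rather than overflowing the stack, and the rest of the line stays in the stream for the next read.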
The first instruction, mov $0x0, %eax, moves a zero into EAX; in its occurrence after the call (at <+14>), that sets main()'s return code.
The second instruction, sub $0x10,%rsp, is allocating memory and aligning the stack for function calls. The calling convention requires 16-byte alignment, not 8.
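The 16-versus-8 arithmetic can be checked with a small C sketch that tracks %RSP through a CALL, PUSH %RBP and SUB. round_up_16() and aligned_at_inner_call() are hypothetical helpers, and the model ignores the extra space (spills, saved registers) that a real compiler may also reserve.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round the size of the local variables up to a multiple of 16 bytes.
   Simplified: real compilers may reserve more for spills, saved registers, etc. */
static uint64_t round_up_16(uint64_t bytes) {
    return (bytes + 15) & ~UINT64_C(15);
}

/* Trace %RSP from a 16-byte-aligned CALL into a function whose prologue is
   push %rbp ; mov %rsp,%rbp ; sub $locals,%rsp, and report whether %RSP is
   16-byte aligned at a CALL made from inside that function. */
static bool aligned_at_inner_call(uint64_t locals) {
    uint64_t rsp = 0x7fffffffe000; /* any 16-byte-aligned address works */
    rsp -= 8;                      /* callq pushes the return address   */
    rsp -= 8;                      /* push %rbp                         */
    rsp -= locals;                 /* sub $locals,%rsp                  */
    return rsp % 16 == 0;
}
```

Here aligned_at_inner_call(round_up_16(8)) (that is, 16) is true while aligned_at_inner_call(8) is false, matching the sub $0x10,%rsp in get_input(). main(), which reserves nothing, is also aligned at its call to get_input().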