I was given a function in assembly that converts uppercase letters to lowercase. Here is part of the assembly:
Q1:
pushq %rbp
movq %rsp, %rbp
subq $24, %rsp
movq %rdi, -24(%rbp)
movl $0, -4(%rbp)
movl $0, -8(%rbp)
jmp .L2
.L2:
movl -4(%rbp), %edx
movq -24(%rbp), %rax
addq %rdx, %rax
movzbl (%rax), %eax
testb %al, %al
jne .L4
...
Much of the rest is repetitive, but .L2 is what really confuses me. This is my reasoning so far: we store param1 at -24(%rbp), create local1 and local2, set both to 0, and then jump to .L2. There, local1 is moved into %edx and param1 into %rax. This is where I get confused: I was told that the following line, addq, results in local1 being a pointer to param1. I just reasoned that it adds local1 and param1 and stores the sum in %rax. How is that possible?
Next is movzbl. From my understanding, we dereference %rax, so we get something like eax = (int) rax.
I was also told to think of it as converting a char to an int. Which one is true, and how do I know that I'm typecasting? What if %rax didn't have parentheses around it? Is it an int because it's 4 bytes and %eax is a 32-bit register? Thank you in advance for your help; I'm kind of lost here.
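local1 is not a pointer; it's an index (a counter). The addq adds that index to the base pointer param1, so %rax ends up holding the address of the character at that index. That code is doing something like: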
void to_lower(char* text) /* Q1 in the assembly */
{
    int i = 0;  /* at rbp-4 */
    int j = 0;  /* unused, at rbp-8 */
    int ch;     /* in eax */
    while((ch = *(text + i)) != 0) /* stop at the terminating 0 byte */
    {
        ...
    }
}
Note that in C pointer arithmetic, *(text + i) is of course equivalent to text[i].
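To make the address calculation concrete, here is a rough C sketch that follows the four instructions at .L2 one at a time. The function name load_byte_zero_extended and the variables rax and rdx are made up for illustration, and the pointer-to-integer casts assume a flat address space such as x86-64; it is not what the compiler was given, only a way to read the assembly.

```c
#include <stdint.h>

/* Hypothetical helper: mirrors the instructions at .L2 step by step. */
int load_byte_zero_extended(const char *text, unsigned int i)
{
    uintptr_t rax = (uintptr_t)text;  /* movq -24(%rbp), %rax : base address */
    uintptr_t rdx = i;                /* movl -4(%rbp), %edx  : index; writing %edx
                                         clears the upper half of %rdx */

    rax += rdx;                       /* addq %rdx, %rax : address of text[i] */

    /* movzbl (%rax), %eax : read one byte at that address, zero-extend it */
    return *(const unsigned char *)rax;
}
```

So the sum is the pointer, not local1: load_byte_zero_extended("Hi", 1) reads the 'i', and with an index of 2 it reads the terminating 0 byte, which is the case the testb %al, %al is checking for.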
Yes, movzbl is converting an unsigned char to an int; you can see that from the instruction name itself: MOVe Zero-extended Byte to Long.
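The parentheses denote pointer dereferencing: (%rax) is the memory at the address held in %rax, while a bare %rax would be the register's own value. The result is an int because "long" in AT&T syntax means a 32-bit operand, which is the size of %eax.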
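As a small illustration of what "zero extended" means in C terms, the sketch below contrasts it with sign extension (which would be movsbl in the same syntax). The function names are made up for this example; the casts are the C-level equivalent of choosing one instruction or the other.

```c
/* Hypothetical helpers: the same one-byte load, widened two different ways. */

/* Like movzbl (%rax), %eax: the upper 24 bits of the result are zero. */
unsigned int zero_extend_byte(const void *p)
{
    return *(const unsigned char *)p;
}

/* Like movsbl (%rax), %eax: the upper 24 bits copy the byte's sign bit. */
int sign_extend_byte(const void *p)
{
    return *(const signed char *)p;
}
```

For a byte below 0x80, such as 'A', both give the same value. For a byte like 0xE9 they differ: zero extension gives 233, sign extension gives -23. In both cases the *p corresponds to the parentheses in the assembly; without the dereference you would be working with the address itself rather than the byte stored there.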