
Understanding assembly code when converting from unsigned char to long long

Consider the following code in c.c:

void f(unsigned char *a, long long *b)
{
    *b = (long long)*a;
}

Compile it with

$ gcc -Og -S c.c

where

$ gcc --version
gcc (MinGW-W64 x86_64-posix-seh, built by Brecht Sanders) 10.2.0

and my machine runs 64-bit Windows 10.

Among other lines, the generated assembly contains:

01 movzbl  (%rcx), %eax
02 movq    %rax, (%rdx)

My question is: Why isn't the first line written in this way

01 movzbq  (%rcx), %rax

What if the upper 32 bits of %rax originally contained some non-zero bits and were not cleared by movzbl (%rcx), %eax? Wouldn't those non-zero bits (if any) be copied to (%rdx) by movq %rax, (%rdx)?

A follow-up question: even if the above concern is unfounded, why isn't the first line written this way

01 movzbq  (%rcx), %rax

That is, which rule governs translating the C code to assembly in this way?

(I have some knowledge of C but am new to assembly.)

Update: I would like to clarify a few things after reading the comments (I appreciate all of them). One comment says the function is unnecessary and that I could just do the assignment directly. That is right. As another comment rightly puts it, this is a pared-down example. What I want to understand is simply why the C-to-assembly translation happens this way when casting an unsigned char to long long.

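movzbl does both jobs: 1) the instruction itself zero-extends the byte to 32 bits (the 'z' in the mnemonic), and 2) on x86-64, writing a 32-bit register such as %eax implicitly zeroes the upper 32 bits of the full 64-bit register, so %rax ends up holding the byte zero-extended to 64 bits. Any stale bits in the upper half of %rax are therefore cleared before movq %rax, (%rdx) stores the value.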
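
To make the register behaviour concrete, here is a minimal sketch in plain C that models it. It does not execute the real instructions; write_reg32, write_reg8 and f_model are hypothetical helper names invented for illustration, and stale_rax stands in for whatever %rax happened to hold beforehand.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a write to a 32-bit register such as %eax on x86-64:
   the upper 32 bits of the full 64-bit register are cleared,
   so nothing of the old contents survives. */
uint64_t write_reg32(uint64_t old_reg64, uint32_t value)
{
    (void)old_reg64;            /* old contents are discarded entirely */
    return (uint64_t)value;
}

/* For contrast, a model of a write to an 8-bit register such as %al:
   bits 8..63 of the full register are preserved. */
uint64_t write_reg8(uint64_t old_reg64, uint8_t value)
{
    return (old_reg64 & ~UINT64_C(0xFF)) | value;
}

/* Model of "movzbl (%rcx), %eax" followed by "movq %rax, (%rdx)". */
void f_model(const unsigned char *a, long long *b, uint64_t stale_rax)
{
    uint32_t widened = *a;                          /* byte zero-extended to 32 bits */
    uint64_t rax = write_reg32(stale_rax, widened); /* upper half cleared */
    *b = (long long)rax;                            /* full 64-bit store */
}
```

Calling f_model with a byte of 0xAB and a stale_rax whose upper half is all ones still stores 0xAB in *b, whereas the write_reg8 model keeps the stale upper bits. That difference is why a plain byte load into %al would not be enough here, while movzbl into %eax is.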

The encoding of the 32-bit form movzbl is also shorter than that of the 64-bit form movzbq, which needs an extra REX.W prefix byte, so the compiler prefers it when both give the same result.
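
As a rough illustration of the size difference, the sketch below writes out the machine-code bytes I would expect for the two forms, based on the x86-64 encoding rules (opcode 0F B6, ModRM byte 01 for (%rcx) with %eax/%rax, and REX.W = 48). The array and function names are made up for this example, and the bytes are worth confirming on your own toolchain, for instance with objdump -d.

```c
#include <assert.h>
#include <stddef.h>

/* movzbl (%rcx), %eax : two-byte opcode 0F B6, then ModRM 01 */
const unsigned char movzbl_rcx_eax[] = { 0x0F, 0xB6, 0x01 };

/* movzbq (%rcx), %rax : REX.W prefix 48, then the same opcode and ModRM */
const unsigned char movzbq_rcx_rax[] = { 0x48, 0x0F, 0xB6, 0x01 };

/* Number of extra bytes the 64-bit form costs over the 32-bit form. */
size_t encoding_size_difference(void)
{
    /* Sanity check so the subtraction below cannot wrap around. */
    assert(sizeof movzbq_rcx_rax > sizeof movzbl_rcx_eax);
    return sizeof movzbq_rcx_rax - sizeof movzbl_rcx_eax;
}
```

The two encodings differ only in the leading prefix byte, so choosing movzbl saves one byte per instruction without changing the value that ends up in %rax.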
