Understanding assembly code when converting from unsigned char to long long

Question

Consider the following code, c.c:

void f(unsigned char *a, long long *b)
{
    *b = (long long)*a;
}

Compile it with

$ gcc -Og -S c.c

where

$ gcc --version
gcc (MinGW-W64 x86_64-posix-seh, built by Brecht Sanders) 10.2.0

and my machine runs 64-bit Windows 10.

Among other lines, I get the following assembly code:

01 movzbl  (%rcx), %eax
02 movq    %rax, (%rdx)

My question is: why isn't the first line written this way instead?

01 movzbq  (%rcx), %rax

What if the upper 32 bits of %rax originally held some non-zero bits and were not cleared by movzbl (%rcx), %eax? Wouldn't those non-zero bits (if any) get copied to (%rdx) by movq %rax, (%rdx)?

A follow-up question: even if the above concern is unfounded, why isn't the first line written this way anyway?

01 movzbq  (%rcx), %rax

That is, which rule governs translating the C code to assembly in the given way?

(I have some knowledge of C but am new to assembly code.)

Update: I would like to clarify a few things after reading the comments (I appreciate all of them). One comment says the function is unnecessary and that I could just do the assignment directly. That is right. As another comment rightly puts it, this is a pared-down example. What I want to understand is simply why the C-to-assembly translation happens this way when casting an unsigned char to long long.

Answer 1

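movzbl (%rcx), %eax zero-extends twice: 1) the 'z' in the mnemonic zero-extends the byte to 32 bits, and 2) on x86-64, any instruction that writes a 32-bit register such as %eax implicitly zeroes the upper 32 bits of the corresponding 64-bit register (%rax). So no stale bits from %rax can reach (%rdx), and the result is the same as with movzbq (%rcx), %rax.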

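The encoding of the 32-bit instruction movzbl is also shorter than that of the 64-bit instruction movzbq, which needs an extra REX.W prefix byte. Since both produce the same value in %rax, the compiler picks the shorter one.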
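
The two zero-extension steps described above can be modelled in plain C. This is only a rough sketch, not what the hardware or GCC literally does: model_movzbl and model_movzbq are hypothetical helper names introduced here for illustration, and old_rax stands in for whatever the register held before the instruction ran.

```c
#include <stdint.h>

/* The function from the question, unchanged. */
void f(unsigned char *a, long long *b)
{
    *b = (long long)*a;
}

/* Hypothetical model of "movzbl (mem), %eax": the byte is zero-extended
   to 32 bits, and writing the 32-bit register clears bits 63..32 of the
   full 64-bit register, whatever old_rax held. */
uint64_t model_movzbl(uint64_t old_rax, uint8_t byte)
{
    (void)old_rax;          /* the old contents do not survive the write */
    uint32_t eax = byte;    /* explicit zero-extension, 8 -> 32 bits */
    return (uint64_t)eax;   /* implicit zero-extension, 32 -> 64 bits */
}

/* Hypothetical model of "movzbq (mem), %rax": a single 8 -> 64 bit step. */
uint64_t model_movzbq(uint64_t old_rax, uint8_t byte)
{
    (void)old_rax;          /* the full register is overwritten */
    return (uint64_t)byte;
}
```

Under this model, calling either helper with old_rax set to all ones and the byte 0xAB yields 0xAB, which matches what f stores into *b even if *b held -1 beforehand.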

solution1
3 ACCEPTED 2021-02-08 14:18:51
