Why will GCC copy word into the return register but not byte?

Question

Is there a logical reason GCC (4.4.7) is not moving the byte from a structure into %eax directly, or is it just an optimization oversight?

Consider the following program:

struct foo { unsigned char x; };
struct bar { unsigned int x; };

int foo (const struct foo *x, int y) { return x->x * y; }
int bar (const struct bar *x, int y) { return x->x * y; }

When compiling with GCC, foo() and bar() differ more substantially than I expected:

foo:
.LFB0:
        .cfi_startproc
        movzbl  (%rdi), %edx
        movl    %esi, %eax
        imull   %edx, %eax
        ret
        .cfi_endproc

bar:
.LFB1:
        .cfi_startproc
        movl    (%rdi), %eax
        imull   %esi, %eax
        ret
        .cfi_endproc

I expected foo() to be just like bar(), except for using a different move instruction.

I will note that under clang-500.2.79, the compiler generates the code I expect for foo(), and foo() and bar() have the same number of instructions (as I had also expected from GCC, wrongly).

Answer 1

Since foo multiplies an unsigned char (x->x) by an int (y), the compiler must first promote the unsigned char to int, which is exactly what the movzbl instruction does: it loads the byte and zero-extends it to 32 bits.

See the explanation of movz instructions here.
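The promotion described above can be checked from C itself. The following is a minimal sketch, not anything taken from the compiler: foo_mul mirrors foo() from the question, while foo_mul_signext is a hypothetical contrast showing what a sign-extending load (movsbl) would compute instead. It assumes a C11 compiler and the GCC/Clang behaviour for the out-of-range int8_t conversion.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

struct foo { unsigned char x; };

/* Integer promotion: unsigned char * int is computed as int. */
static_assert(sizeof((unsigned char)1 * 1) == sizeof(int),
              "unsigned char operand is promoted to int");

/* Same body as foo() in the question; the byte is zero-extended
   (movzbl on x86) before the 32-bit multiply. */
static int foo_mul(const struct foo *x, int y) { return x->x * y; }

/* Hypothetical contrast: what a sign-extending load (movsbl) would compute.
   Converting a value above 127 to int8_t is implementation-defined;
   GCC and Clang wrap modulo 256. */
static int foo_mul_signext(const struct foo *x, int y)
{
    return (int8_t)x->x * y;
}
```

With x->x equal to 200 and y equal to 3, the zero-extended version gives 600, whereas the sign-extended contrast would treat the byte as -56. The two only agree for byte values up to 127.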

I then recompiled your code with gcc 4.6.1 and -O3, and got the following assembly:

foo:
.LFB34:
    .cfi_startproc
    movzbl  (%rdi), %eax
    imull   %esi, %eax
    ret 
    .cfi_endproc

bar:
.LFB35:
    .cfi_startproc
    movl    (%rdi), %eax
    imull   %esi, %eax
    ret 
    .cfi_endproc

It doesn't use %edx any more.

Answer 2

The short answer

Why will GCC copy word into the return register but not byte?

Because you asked it to return a word, not a byte. The compilers did what your code asked of them. You asked for a size promotion in one case and an unsigned-to-signed conversion in both cases. There was more than one way to do that, and clang/llvm and gcc happened to differ.
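To make those implicit requests visible, here is a rough sketch that writes the conversions out by hand. The names foo_explicit and bar_explicit are hypothetical, and the sketch assumes a 32-bit int and the GCC/Clang wrap-around behaviour when an out-of-range unsigned value is converted back to int.

```c
#include <assert.h>   /* static_assert (C11) */

struct foo { unsigned char x; };
struct bar { unsigned int x; };

static_assert(sizeof(int) == 4, "sketch assumes 32-bit int, as on x86-64");

/* foo(): size promotion (unsigned char -> int), multiply done in int. */
static int foo_explicit(const struct foo *x, int y)
{
    int widened = (int)x->x;          /* 8 -> 32 bits, value preserved */
    return widened * y;
}

/* bar(): no widening needed; y is converted to unsigned int, the multiply
   is done in unsigned int, and the result is converted back to int. */
static int bar_explicit(const struct bar *x, int y)
{
    unsigned int product = x->x * (unsigned int)y;
    return (int)product;              /* implementation-defined above INT_MAX;
                                         GCC and Clang wrap modulo 2^32 */
}
```

Only foo_explicit has a widening step, which is why only foo() needs a different load instruction.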

Is there a logical reason GCC (4.4.7) is not moving the byte from a structure into %eax directly, or is it just an optimization oversight?

Based on what we see in current compilers, I think it was an oversight. See the generated code below (-O2 used in each case).

Interesting experiments related to this question.

clang

0000000000000000 <foo>:
   0:   0f b6 07                movzbl (%rdi),%eax
   3:   0f af c6                imul   %esi,%eax
   6:   c3                      retq   

0000000000000010 <bar>:
  10:   0f af 37                imul   (%rdi),%esi
  13:   89 f0                   mov    %esi,%eax
  15:   c3                      retq

gcc

0000000000000000 <foo>:
   0:   0f b6 07                movzbl (%rdi),%eax
   3:   0f af c6                imul   %esi,%eax
   6:   c3                      retq   

0000000000000010 <bar>:
  10:   8b 07                   mov    (%rdi),%eax
  12:   0f af c6                imul   %esi,%eax
  15:   c3                      retq

They both generated proper code. The tiny difference in the number of bytes of instruction could really have gone either way with these small functions on this instruction set.

For some reason, your compiler at the time must not have seen that optimization.

mips

00000000 <foo>:
   0:   90820000    lbu v0,0(a0)
   4:   00000000    nop
   8:   00450018    mult    v0,a1
   c:   00001012    mflo    v0
  10:   03e00008    jr  ra
  14:   00000000    nop

00000018 <bar>:
  18:   8c820000    lw  v0,0(a0)
  1c:   00000000    nop
  20:   00a20018    mult    a1,v0
  24:   00001012    mflo    v0
  28:   03e00008    jr  ra
  2c:   00000000    nop

arm

00000000 <foo>:
   0:   e5d00000    ldrb    r0, [r0]
   4:   e0000091    mul r0, r1, r0
   8:   e12fff1e    bx  lr

0000000c <bar>:
   c:   e5900000    ldr r0, [r0]
  10:   e0000091    mul r0, r1, r0
  14:   e12fff1e    bx  lr

No big surprise there. As on x86, the difference is in the load and in how it deals with the other 24 bits; then, as the code says, it promotes the unsigned char or unsigned int to a signed integer, multiplies, and returns a signed integer.

Another equally interesting example to complement your question:

struct foo { unsigned char x; };
struct bar { unsigned int x; };

char foo (const struct foo *x, char y) { return x->x * y; }
char bar (const struct bar *x, char y) { return x->x * y; }

clang

0000000000000000 <foo>:
   0:   8a 07                   mov    (%rdi),%al
   2:   40 f6 e6                mul    %sil
   5:   0f be c0                movsbl %al,%eax
   8:   c3                      retq   

0000000000000010 <bar>:
  10:   0f af 37                imul   (%rdi),%esi
  13:   40 0f be c6             movsbl %sil,%eax
  17:   c3                      retq

gcc

0000000000000000 <foo>:
   0:   89 f0                   mov    %esi,%eax
   2:   f6 27                   mulb   (%rdi)
   4:   c3                      retq   

0000000000000010 <bar>:
  10:   89 f0                   mov    %esi,%eax
  12:   f6 27                   mulb   (%rdi)
  14:   c3                      retq
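One way to read the identical gcc output for foo and bar here: the return type is char, so only the low 8 bits of the product survive, and those depend only on the low 8 bits of each operand. On little-endian x86 the byte at (%rdi) is the low byte of the unsigned int, so an 8-bit mulb is enough in both cases. The sketch below checks that arithmetic identity; the helper names are hypothetical and it makes no claim about how gcc actually arrives at this code.

```c
#include <assert.h>
#include <stdint.h>

/* Low byte of a full-width (32-bit) product. */
static uint8_t low8_of_wide_product(uint32_t a, uint32_t b)
{
    return (uint8_t)(a * b);
}

/* Low byte of a product computed from the low bytes only,
   which is what an 8-bit multiply such as mulb works with. */
static uint8_t low8_of_narrow_product(uint32_t a, uint32_t b)
{
    uint8_t lo_a = (uint8_t)a, lo_b = (uint8_t)b;
    return (uint8_t)(lo_a * lo_b);   /* operands promote to int; cannot overflow */
}

/* Check the identity for every pair of low bytes, with arbitrary
   upper bits set in both operands. */
static void check_low8_identity(void)
{
    for (uint32_t a = 0; a < 256; a++)
        for (uint32_t b = 0; b < 256; b++)
            assert(low8_of_wide_product(a | 0xABCD00u, b | 0x1200u)
                   == low8_of_narrow_product(a | 0xABCD00u, b | 0x1200u));
}
```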

gcc arm

00000000 <foo>:
   0:   e5d00000    ldrb    r0, [r0]
   4:   e0010190    mul r1, r0, r1
   8:   e20100ff    and r0, r1, #255    ; 0xff
   c:   e12fff1e    bx  lr

00000010 <bar>:
  10:   e5900000    ldr r0, [r0]
  14:   e0010190    mul r1, r0, r1
  18:   e20100ff    and r0, r1, #255    ; 0xff
  1c:   e12fff1e    bx  lr

mips

00000000 <foo>:
   0:   90820000    lbu v0,0(a0)
   4:   00052e00    sll a1,a1,0x18
   8:   00052e03    sra a1,a1,0x18
   c:   00a20018    mult    a1,v0
  10:   00001012    mflo    v0
  14:   00021600    sll v0,v0,0x18
  18:   03e00008    jr  ra
  1c:   00021603    sra v0,v0,0x18

00000020 <bar>:
  20:   8c820000    lw  v0,0(a0)
  24:   00052e00    sll a1,a1,0x18
  28:   00052e03    sra a1,a1,0x18
  2c:   00a20018    mult    a1,v0
  30:   00001012    mflo    v0
  34:   00021600    sll v0,v0,0x18
  38:   03e00008    jr  ra
  3c:   00021603    sra v0,v0,0x18

That code in particular punished mips.
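The extra cost on mips comes from the sll/sra pairs by 0x18, which sign-extend the low 8 bits of a 32-bit register, once for the incoming char argument and once for the char result. A minimal C model of that sequence might look like this; sext8_shift and foo_char_model are hypothetical names, and the sketch relies on GCC/Clang behaviour for arithmetic right shift and for unsigned-to-signed conversion.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

static_assert((-1 >> 1) == -1, "sketch assumes arithmetic right shift");

/* Sign-extend the low 8 bits of a 32-bit register value, as the
   sll/sra pair by 0x18 (24) does. The unsigned-to-signed conversion is
   implementation-defined; GCC and Clang wrap modulo 2^32. */
static int32_t sext8_shift(uint32_t reg)
{
    return (int32_t)(reg << 24) >> 24;
}

/* Rough model of the mips code for:
   char foo(const struct foo *x, char y) { return x->x * y; } */
static int32_t foo_char_model(uint8_t loaded, uint32_t a1)
{
    int32_t y = sext8_shift(a1);              /* sll/sra on the argument  */
    int32_t product = (int32_t)loaded * y;    /* lbu result times y       */
    return sext8_shift((uint32_t)product);    /* sll/sra on the result    */
}
```

For example, a loaded byte of 2 and y of 100 gives a product of 200, which is returned as -56 once the low byte is sign-extended.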

And lastly:

struct foo { unsigned char x; };
struct bar { unsigned int x; };

unsigned char foo (const struct foo *x, unsigned char y) { return x->x * y; }
unsigned char bar (const struct bar *x, unsigned char y) { return x->x * y; }

gcc and clang for x86 produce the same code as above with the non-specified chars, but for the other targets:

arm

00000000 <foo>:
   0:   e5d00000    ldrb    r0, [r0]
   4:   e0010190    mul r1, r0, r1
   8:   e20100ff    and r0, r1, #255    ; 0xff
   c:   e12fff1e    bx  lr

00000010 <bar>:
  10:   e5900000    ldr r0, [r0]
  14:   e0010190    mul r1, r0, r1
  18:   e20100ff    and r0, r1, #255    ; 0xff
  1c:   e12fff1e    bx  lr

mips

00000000 <foo>:
   0:   90820000    lbu v0,0(a0)
   4:   30a500ff    andi    a1,a1,0xff
   8:   00a20018    mult    a1,v0
   c:   00001012    mflo    v0
  10:   03e00008    jr  ra
  14:   304200ff    andi    v0,v0,0xff

00000018 <bar>:
  18:   8c820000    lw  v0,0(a0)
  1c:   30a500ff    andi    a1,a1,0xff
  20:   00a20018    mult    a1,v0
  24:   00001012    mflo    v0
  28:   03e00008    jr  ra
  2c:   304200ff    andi    v0,v0,0xff

Masking is needed because of a combination of calling convention and instruction set. It is a punishment on both of these instruction sets... You will see this often when using variables whose size does not match the register size on instruction sets like these. x86 has a much wider array of instruction choices; the cost for x86 is the power (watts) that the additional logic consumes.
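The masking in the listings above corresponds to the truncation that C performs when an int product is returned as unsigned char. A small sketch, assuming 8-bit bytes, with foo_u8 mirroring the last foo() and foo_u8_masked as a hypothetical hand-written equivalent:

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>

struct foo { unsigned char x; };

static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

/* Same body as the unsigned char foo() above: the product is
   computed in int and truncated to 8 bits on return. */
static unsigned char foo_u8(const struct foo *x, unsigned char y)
{
    return x->x * y;
}

/* The same truncation written as explicit masks on full registers. */
static unsigned int foo_u8_masked(const struct foo *x, unsigned int y_reg)
{
    unsigned int y = y_reg & 0xffu;   /* mips: andi a1,a1,0xff on the argument */
    return (x->x * y) & 0xffu;        /* arm: and #255, mips: andi v0,v0,0xff  */
}
```

In the listings, mips masks both the incoming argument and the result, while the arm output masks only the result.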

For grins: even if you go way, way back, the register-sized choice is slightly cheaper.

00000000 <_foo>:
   0:   1166            mov r5, -(sp)
   2:   1185            mov sp, r5
   4:   9f40 0004       movb    *4(r5), r0
   8:   45c0 ff00       bic $-400, r0
   c:   1001            mov r0, r1
   e:   7075 0006       mul 6(r5), r1
  12:   1040            mov r1, r0
  14:   1585            mov (sp)+, r5
  16:   0087            rts pc

00000018 <_bar>:
  18:   1166            mov r5, -(sp)
  1a:   1185            mov sp, r5
  1c:   1d41 0006       mov 6(r5), r1
  20:   707d 0004       mul *4(r5), r1
  24:   1040            mov r1, r0
  26:   1585            mov (sp)+, r5
  28:   0087            rts pc

compiler versions

gcc --version
gcc (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

clang --version
clang version 3.4 (branches/release_34 201060)
Target: x86_64-unknown-linux-gnu
Thread model: posix

arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

mips-elf-gcc --version
mips-elf-gcc (GCC) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

And that last instruction set is an exercise for the reader; there is a bit of a clue in the disassembly...

Answer 1 (score 2, accepted): 2014-03-04 03:23:54

Answer 2 (score 2): 2014-03-04 04:55:41