
Why will GCC copy word into the return register but not byte?

Is there a logical reason GCC (4.4.7) is not moving the byte from a structure into %eax directly, or is it just an optimization oversight?

Consider the following program:

struct foo { unsigned char x; };
struct bar { unsigned int x; };

int foo (const struct foo *x, int y) { return x->x * y; }
int bar (const struct bar *x, int y) { return x->x * y; }

When compiling with GCC, foo() and bar() differ more substantially than I expected:

foo:
.LFB0:
        .cfi_startproc
        movzbl  (%rdi), %edx
        movl    %esi, %eax
        imull   %edx, %eax
        ret
        .cfi_endproc

bar:
.LFB1:
        .cfi_startproc
        movl    (%rdi), %eax
        imull   %esi, %eax
        ret
        .cfi_endproc

I expected foo() to be just like bar(), except for using a different move instruction.

I will note that under clang-500.2.79, the compiler generates the code I expect for foo(), and foo() and bar() have the same number of instructions (as I had expected for GCC as well, but I was wrong).

Since foo() multiplies an unsigned char (x->x) by an int (y), the compiler must first promote the unsigned char to int, which is exactly what the movzbl instruction (move with zero-extension, byte to long) does.

See the explanation of the movz instructions here.

I then recompiled your code with gcc 4.6.1 and -O3 and got the following assembly:

foo:
.LFB34:
    .cfi_startproc
    movzbl  (%rdi), %eax
    imull   %esi, %eax
    ret 
    .cfi_endproc

bar:
.LFB35:
    .cfi_startproc
    movl    (%rdi), %eax
    imull   %esi, %eax
    ret 
    .cfi_endproc

It no longer uses %edx: the zero-extended byte is loaded straight into %eax.


The short answer

Why will GCC copy word into the return register but not byte?

Because you asked it to return a word, not a byte. The compilers did what your code asked of them: a size promotion in one case, and a conversion from unsigned to signed in both cases. There was more than one way to do that, and clang/llvm and gcc happened to differ.

Is there a logical reason GCC (4.4.7) is not moving the byte from a structure into %eax directly, or is it just an optimization oversight?

Based on what current compilers generate, I think it was an oversight. See the generated code below (-O2 in each case).


Some interesting experiments related to this question:

clang

0000000000000000 <foo>:
   0:   0f b6 07                movzbl (%rdi),%eax
   3:   0f af c6                imul   %esi,%eax
   6:   c3                      retq   

0000000000000010 <bar>:
  10:   0f af 37                imul   (%rdi),%esi
  13:   89 f0                   mov    %esi,%eax
  15:   c3                      retq   

gcc

0000000000000000 <foo>:
   0:   0f b6 07                movzbl (%rdi),%eax
   3:   0f af c6                imul   %esi,%eax
   6:   c3                      retq   

0000000000000010 <bar>:
  10:   8b 07                   mov    (%rdi),%eax
  12:   0f af c6                imul   %esi,%eax
  15:   c3                      retq   

They both generated correct code. The tiny difference in instruction bytes (movzbl encodes in 3 bytes, mov in 2) could really have gone either way with functions this small on this instruction set.

For some reason, your compiler at the time must have missed that optimization.

mips:

00000000 <foo>:
   0:   90820000    lbu v0,0(a0)
   4:   00000000    nop
   8:   00450018    mult    v0,a1
   c:   00001012    mflo    v0
  10:   03e00008    jr  ra
  14:   00000000    nop

00000018 <bar>:
  18:   8c820000    lw  v0,0(a0)
  1c:   00000000    nop
  20:   00a20018    mult    a1,v0
  24:   00001012    mflo    v0
  28:   03e00008    jr  ra
  2c:   00000000    nop

arm

00000000 <foo>:
   0:   e5d00000    ldrb    r0, [r0]
   4:   e0000091    mul r0, r1, r0
   8:   e12fff1e    bx  lr

0000000c <bar>:
   c:   e5900000    ldr r0, [r0]
  10:   e0000091    mul r0, r1, r0
  14:   e12fff1e    bx  lr

No big surprise there. As on x86, the difference is in the load and how it deals with the other 24 bits. After that, the code does what the C source says: in foo the unsigned char is promoted to a signed int, in bar the multiply is done on the unsigned int, and either way the result is returned as a signed int.

Another equally interesting example to complement your question:

struct foo { unsigned char x; };
struct bar { unsigned int x; };

char foo (const struct foo *x, char y) { return x->x * y; }
char bar (const struct bar *x, char y) { return x->x * y; }

clang

0000000000000000 <foo>:
   0:   8a 07                   mov    (%rdi),%al
   2:   40 f6 e6                mul    %sil
   5:   0f be c0                movsbl %al,%eax
   8:   c3                      retq   

0000000000000010 <bar>:
  10:   0f af 37                imul   (%rdi),%esi
  13:   40 0f be c6             movsbl %sil,%eax
  17:   c3                      retq   

gcc

0000000000000000 <foo>:
   0:   89 f0                   mov    %esi,%eax
   2:   f6 27                   mulb   (%rdi)
   4:   c3                      retq   

0000000000000010 <bar>:
  10:   89 f0                   mov    %esi,%eax
  12:   f6 27                   mulb   (%rdi)
  14:   c3                      retq   

gcc arm

00000000 <foo>:
   0:   e5d00000    ldrb    r0, [r0]
   4:   e0010190    mul r1, r0, r1
   8:   e20100ff    and r0, r1, #255    ; 0xff
   c:   e12fff1e    bx  lr

00000010 <bar>:
  10:   e5900000    ldr r0, [r0]
  14:   e0010190    mul r1, r0, r1
  18:   e20100ff    and r0, r1, #255    ; 0xff
  1c:   e12fff1e    bx  lr

mips

00000000 <foo>:
   0:   90820000    lbu v0,0(a0)
   4:   00052e00    sll a1,a1,0x18
   8:   00052e03    sra a1,a1,0x18
   c:   00a20018    mult    a1,v0
  10:   00001012    mflo    v0
  14:   00021600    sll v0,v0,0x18
  18:   03e00008    jr  ra
  1c:   00021603    sra v0,v0,0x18

00000020 <bar>:
  20:   8c820000    lw  v0,0(a0)
  24:   00052e00    sll a1,a1,0x18
  28:   00052e03    sra a1,a1,0x18
  2c:   00a20018    mult    a1,v0
  30:   00001012    mflo    v0
  34:   00021600    sll v0,v0,0x18
  38:   03e00008    jr  ra
  3c:   00021603    sra v0,v0,0x18

That code in particular punished mips.

And lastly:

struct foo { unsigned char x; };
struct bar { unsigned int x; };

unsigned char foo (const struct foo *x, unsigned char y) { return x->x * y; }
unsigned char bar (const struct bar *x, unsigned char y) { return x->x * y; }

For x86, gcc and clang produce the same code as for the plain char version above, but for the other targets:

arm

00000000 <foo>:
   0:   e5d00000    ldrb    r0, [r0]
   4:   e0010190    mul r1, r0, r1
   8:   e20100ff    and r0, r1, #255    ; 0xff
   c:   e12fff1e    bx  lr

00000010 <bar>:
  10:   e5900000    ldr r0, [r0]
  14:   e0010190    mul r1, r0, r1
  18:   e20100ff    and r0, r1, #255    ; 0xff
  1c:   e12fff1e    bx  lr

mips

00000000 <foo>:
   0:   90820000    lbu v0,0(a0)
   4:   30a500ff    andi    a1,a1,0xff
   8:   00a20018    mult    a1,v0
   c:   00001012    mflo    v0
  10:   03e00008    jr  ra
  14:   304200ff    andi    v0,v0,0xff

00000018 <bar>:
  18:   8c820000    lw  v0,0(a0)
  1c:   30a500ff    andi    a1,a1,0xff
  20:   00a20018    mult    a1,v0
  24:   00001012    mflo    v0
  28:   03e00008    jr  ra
  2c:   304200ff    andi    v0,v0,0xff

The masking is needed because of the combination of calling convention and instruction set. It is a penalty on both of these instruction sets. You will see this often when using variables whose size does not match the register size on instruction sets like these. x86 has a much wider array of instructions to choose from; the cost for x86 is the power (watts) consumed by that additional logic.

For grins: even if you go way, way back, the register-sized choice is slightly cheaper.

00000000 <_foo>:
   0:   1166            mov r5, -(sp)
   2:   1185            mov sp, r5
   4:   9f40 0004       movb    *4(r5), r0
   8:   45c0 ff00       bic $-400, r0
   c:   1001            mov r0, r1
   e:   7075 0006       mul 6(r5), r1
  12:   1040            mov r1, r0
  14:   1585            mov (sp)+, r5
  16:   0087            rts pc

00000018 <_bar>:
  18:   1166            mov r5, -(sp)
  1a:   1185            mov sp, r5
  1c:   1d41 0006       mov 6(r5), r1
  20:   707d 0004       mul *4(r5), r1
  24:   1040            mov r1, r0
  26:   1585            mov (sp)+, r5
  28:   0087            rts pc

compiler versions

gcc --version
gcc (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

clang --version
clang version 3.4 (branches/release_34 201060)
Target: x86_64-unknown-linux-gnu
Thread model: posix

arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

mips-elf-gcc --version
mips-elf-gcc (GCC) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

And that last instruction set is an exercise for the reader; there is a bit of a clue in the disassembly...

