Why will GCC copy word into the return register but not byte?
Is there a logical reason GCC (4.4.7) is not moving the byte from a structure into %eax directly, or is it just an optimization oversight?
Consider the following program:
struct foo { unsigned char x; };
struct bar { unsigned int x; };
int foo (const struct foo *x, int y) { return x->x * y; }
int bar (const struct bar *x, int y) { return x->x * y; }
When compiling with GCC, foo() and bar() differ more substantially than I expected:
foo:
.LFB0:
.cfi_startproc
movzbl (%rdi), %edx
movl %esi, %eax
imull %edx, %eax
ret
.cfi_endproc
bar:
.LFB1:
.cfi_startproc
movl (%rdi), %eax
imull %esi, %eax
ret
.cfi_endproc
I expected foo() to be just like bar(), except using a different move instruction.
I will note that under clang-500.2.79, the compiler generates the code I expect for foo(), and foo() and bar() have the same number of instructions (as I had expected for GCC as well, but I was wrong).
Since foo multiplies an unsigned char x by an int y, the compiler must first promote x to int, which is exactly what the movzbl instruction does: it loads the byte and zero-extends it to 32 bits.
See the explanation of movz instructions here.
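The promotion described above can be written out by hand. The following is only a sketch: foo_explicit is a hypothetical name that is not in the question, and the compile-time check assumes a C11 compiler (for _Generic and static_assert).

```c
#include <assert.h>

struct foo { unsigned char x; };

/* Same computation as the question's foo(), with the implicit
 * integer promotion spelled out as a separate step. */
int foo_explicit(const struct foo *p, int y)
{
    int widened = (int)p->x;   /* zero-extension of the byte; movzbl on x86 */
    return widened * y;        /* 32-bit multiply; imull on x86 */
}

/* C11 compile-time check: (unsigned char) * (int) has type int. */
static_assert(_Generic((unsigned char)1 * 1, int: 1, default: 0),
              "unsigned char operand is promoted to int");
```

The multiply never happens at byte width, so the load has to produce a full 32-bit value no matter which register the compiler picks.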
I then recompiled your code with gcc 4.6.1 and -O3, and got the following assembly:
foo:
.LFB34:
.cfi_startproc
movzbl (%rdi), %eax
imull %esi, %eax
ret
.cfi_endproc
bar:
.LFB35:
.cfi_startproc
movl (%rdi), %eax
imull %esi, %eax
ret
.cfi_endproc
It doesn't use %edx any more.
The short answer
Why will GCC copy word into the return register but not byte?
Because you asked it to return a word, not a byte. The compilers did what they were asked based on your code. You asked for a size promotion in one case and an unsigned-to-signed conversion in both cases. There was more than one way to do that, and clang/llvm and gcc happened to vary.
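For the bar() case, the conversions the source asks for can be made explicit as below. This is a sketch rather than anything a compiler emits: bar_explicit is a hypothetical name, the check needs C11, and converting an out-of-range unsigned value back to int is implementation-defined (gcc and clang wrap modulo 2^N).

```c
#include <assert.h>

struct bar { unsigned int x; };

/* Same computation as the question's bar(), with each implicit
 * conversion written as a cast. */
int bar_explicit(const struct bar *p, int y)
{
    /* Usual arithmetic conversions: the int operand becomes unsigned int. */
    unsigned int product = p->x * (unsigned int)y;

    /* The return type is int, so the unsigned product is converted back.
     * Implementation-defined when the value does not fit in int. */
    return (int)product;
}

/* C11 compile-time check: (unsigned int) * (int) has type unsigned int. */
static_assert(_Generic(1u * 1, unsigned int: 1, default: 0),
              "int operand is converted to unsigned int");
```

Neither conversion changes the bit pattern on a two's complement machine, which is why bar() needs no extra instruction and only foo() has a widening step.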
Is there a logical reason GCC (4.4.7) is not moving the byte from a structure into %eax directly, or is it just an optimization oversight?
I think, based on what we see in the current compilers, it was an oversight. See the generated code below (-O2 used in each case).
Interesting experiments related to this question.
clang
0000000000000000 <foo>:
0: 0f b6 07 movzbl (%rdi),%eax
3: 0f af c6 imul %esi,%eax
6: c3 retq
0000000000000010 <bar>:
10: 0f af 37 imul (%rdi),%esi
13: 89 f0 mov %esi,%eax
15: c3 retq
gcc
0000000000000000 <foo>:
0: 0f b6 07 movzbl (%rdi),%eax
3: 0f af c6 imul %esi,%eax
6: c3 retq
0000000000000010 <bar>:
10: 8b 07 mov (%rdi),%eax
12: 0f af c6 imul %esi,%eax
15: c3 retq
They both generated proper code. The tiny difference in the number of instruction bytes could really have gone either way with these small functions on this instruction set. Your compiler at the time must not have seen that optimization for some reason.
mips
00000000 <foo>:
0: 90820000 lbu v0,0(a0)
4: 00000000 nop
8: 00450018 mult v0,a1
c: 00001012 mflo v0
10: 03e00008 jr ra
14: 00000000 nop
00000018 <bar>:
18: 8c820000 lw v0,0(a0)
1c: 00000000 nop
20: 00a20018 mult a1,v0
24: 00001012 mflo v0
28: 03e00008 jr ra
2c: 00000000 nop
arm
00000000 <foo>:
0: e5d00000 ldrb r0, [r0]
4: e0000091 mul r0, r1, r0
8: e12fff1e bx lr
0000000c <bar>:
c: e5900000 ldr r0, [r0]
10: e0000091 mul r0, r1, r0
14: e12fff1e bx lr
No big surprise there: as on x86, the difference is in the load and how it deals with the other 24 bits. After that, as the code says, it promotes the unsigned char or int to a signed integer, multiplies, and returns a signed integer.
Another equally interesting example to complement your question.
struct foo { unsigned char x; };
struct bar { unsigned int x; };
char foo (const struct foo *x, char y) { return x->x * y; }
char bar (const struct bar *x, char y) { return x->x * y; }
clang
0000000000000000 <foo>:
0: 8a 07 mov (%rdi),%al
2: 40 f6 e6 mul %sil
5: 0f be c0 movsbl %al,%eax
8: c3 retq
0000000000000010 <bar>:
10: 0f af 37 imul (%rdi),%esi
13: 40 0f be c6 movsbl %sil,%eax
17: c3 retq
gcc
0000000000000000 <foo>:
0: 89 f0 mov %esi,%eax
2: f6 27 mulb (%rdi)
4: c3 retq
0000000000000010 <bar>:
10: 89 f0 mov %esi,%eax
12: f6 27 mulb (%rdi)
14: c3 retq
gcc arm
00000000 <foo>:
0: e5d00000 ldrb r0, [r0]
4: e0010190 mul r1, r0, r1
8: e20100ff and r0, r1, #255 ; 0xff
c: e12fff1e bx lr
00000010 <bar>:
10: e5900000 ldr r0, [r0]
14: e0010190 mul r1, r0, r1
18: e20100ff and r0, r1, #255 ; 0xff
1c: e12fff1e bx lr
mips
00000000 <foo>:
0: 90820000 lbu v0,0(a0)
4: 00052e00 sll a1,a1,0x18
8: 00052e03 sra a1,a1,0x18
c: 00a20018 mult a1,v0
10: 00001012 mflo v0
14: 00021600 sll v0,v0,0x18
18: 03e00008 jr ra
1c: 00021603 sra v0,v0,0x18
00000020 <bar>:
20: 8c820000 lw v0,0(a0)
24: 00052e00 sll a1,a1,0x18
28: 00052e03 sra a1,a1,0x18
2c: 00a20018 mult a1,v0
30: 00001012 mflo v0
34: 00021600 sll v0,v0,0x18
38: 03e00008 jr ra
3c: 00021603 sra v0,v0,0x18
That code in particular punished mips.
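The sll/sra pairs by 0x18 in the mips listing sign-extend the low byte of a 32-bit register, once for the char argument and once for the char result. In C terms that step looks roughly like the sketch below. sign_extend8 and foo_signed are hypothetical names, not from the original post, and the sketch assumes 8-bit bytes and a two's complement int. The arm listing uses and #255 instead, presumably because plain char is unsigned by default on that target.

```c
#include <assert.h>
#include <limits.h>

static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

/* Keep the low 8 bits of v and sign-extend them to int width.
 * On mips this is the sll 24 / sra 24 pair seen in the listing. */
int sign_extend8(int v)
{
    return ((v & 0xff) ^ 0x80) - 0x80;
}

/* A signed-char flavoured version of foo(): multiply at int width,
 * then narrow the result to 8 bits before handing it back in a
 * 32-bit return register. */
int foo_signed(const unsigned char *x, signed char y)
{
    int arg = sign_extend8(y);   /* re-normalize the incoming argument */
    return sign_extend8(*x * arg);
}
```

With int parameters and an int return value none of this is needed, which is why the first mips listing is two instructions shorter.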
and lastly
struct foo { unsigned char x; };
struct bar { unsigned int x; };
unsigned char foo (const struct foo *x, unsigned char y) { return x->x * y; }
unsigned char bar (const struct bar *x, unsigned char y) { return x->x * y; }
gcc and clang for x86 produce the same code as they did above for the chars with no signedness specified, but
arm
00000000 <foo>:
0: e5d00000 ldrb r0, [r0]
4: e0010190 mul r1, r0, r1
8: e20100ff and r0, r1, #255 ; 0xff
c: e12fff1e bx lr
00000010 <bar>:
10: e5900000 ldr r0, [r0]
14: e0010190 mul r1, r0, r1
18: e20100ff and r0, r1, #255 ; 0xff
1c: e12fff1e bx lr
mips
00000000 <foo>:
0: 90820000 lbu v0,0(a0)
4: 30a500ff andi a1,a1,0xff
8: 00a20018 mult a1,v0
c: 00001012 mflo v0
10: 03e00008 jr ra
14: 304200ff andi v0,v0,0xff
00000018 <bar>:
18: 8c820000 lw v0,0(a0)
1c: 30a500ff andi a1,a1,0xff
20: 00a20018 mult a1,v0
24: 00001012 mflo v0
28: 03e00008 jr ra
2c: 304200ff andi v0,v0,0xff
Masking is needed because of a combination of calling convention and instruction set. It is a punishment on both of these instruction sets... You will see this often when using variables whose size does not match the register size on instruction sets like these. x86 has a much wider array of instruction choices; the cost for x86 is the power (watts) that the additional logic consumes.
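The masking in the arm and mips listings for the unsigned char version corresponds roughly to the following C. This is a sketch with a hypothetical function name (foo_masked), assuming 8-bit bytes. It only makes visible the work that the unsigned char parameter and return type imply on a machine whose registers and multiply are 32 bits wide.

```c
#include <assert.h>
#include <limits.h>

static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

struct foo { unsigned char x; };

unsigned char foo_masked(const struct foo *p, unsigned char y)
{
    unsigned int arg = y & 0xffu;             /* mips: andi a1,a1,0xff */
    unsigned int product = p->x * arg;        /* full-width multiply */
    return (unsigned char)(product & 0xffu);  /* mips: andi v0,v0,0xff; arm: and r0,r1,#255 */
}
```

For example, 200 * 3 is 600 (0x258) at register width, and the mask leaves 0x58, which is 88.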
For grins, even if you go way, way back, the register-sized choice is slightly cheaper.
00000000 <_foo>:
0: 1166 mov r5, -(sp)
2: 1185 mov sp, r5
4: 9f40 0004 movb *4(r5), r0
8: 45c0 ff00 bic $-400, r0
c: 1001 mov r0, r1
e: 7075 0006 mul 6(r5), r1
12: 1040 mov r1, r0
14: 1585 mov (sp)+, r5
16: 0087 rts pc
00000018 <_bar>:
18: 1166 mov r5, -(sp)
1a: 1185 mov sp, r5
1c: 1d41 0006 mov 6(r5), r1
20: 707d 0004 mul *4(r5), r1
24: 1040 mov r1, r0
26: 1585 mov (sp)+, r5
28: 0087 rts pc
compiler versions
gcc --version
gcc (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
clang --version
clang version 3.4 (branches/release_34 201060)
Target: x86_64-unknown-linux-gnu
Thread model: posix
arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
mips-elf-gcc --version
mips-elf-gcc (GCC) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
And that last instruction set is left as an exercise for the reader; there is a bit of a clue in the disassembly...