Conversion from size_t to int
Following this thread ...
For this piece of code:
#include <stdio.h>

int main(void)
{
    int i;
    size_t u;

    for (i = 0; i < 10; i++) {
        u = (size_t)i;
        printf("i = %d, u = %zu\n", i, u);
    }
    return 0;
}
The output in assembly is:
EDIT: Compiled with -O2
.file "demo.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "i = %d, u = %zu\n"
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB3:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
xorl %ebx, %ebx
.p2align 4,,10
.p2align 3
.L2:
movq %rbx, %rdx
movl %ebx, %esi
xorl %eax, %eax
movl $.LC0, %edi
addq $1, %rbx
call printf
cmpq $10, %rbx
jne .L2
xorl %eax, %eax
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE3:
.size main, .-main
.ident "GCC: (Debian 4.7.2-5) 4.7.2"
.section .note.GNU-stack,"",@progbits
Is the conversion u = (size_t)i; consuming extra cycles?
Yes, as the code is posted, certainly. Your conversion is here:
movl -4(%rbp), %eax
cltq
movq %rax, -16(%rbp)
Of course, this code is unoptimized, so it's not a very fair comparison. If you compile it with optimization, the compiler may realize that the values are never negative and just do a single move from whatever register holds i to %rdx, which holds the third argument.
Edit:
As suspected, there is essentially no overhead in the optimized code. In this case, the compiler has converted the loop to count up u and derive i from u instead of the other way around, so %rbx is used for the loop, and the value of i is just %ebx, which is the lower 32 bits of %rbx, so there is no overhead in this example. I emphasise this, since there may well be other cases where converting from int to size_t will have a penalty. It completely depends on the circumstances.
Yes, it does, as it changes the internal representation from 32 bits to 64 bits. Specifically,
.L3:
movl -4(%rbp), %eax
cltq
movq %rax, -16(%rbp)
movq -16(%rbp), %rdx
reads i, performs sign-extension and copies the result to %rdx. I'm unsure why this value has to pass through the stack; as Mats pointed out, this looks like code from a non-optimizing compiler run.
EDIT
In the optimized assembly code, the loop counter is maintained as the wider data type. As far as I recall, movs between registers don't differ in run-time cycles with respect to quad or dword (indeed they don't: see table C-16 in Intel's pertinent doc, referenced by this SO post).
I'm not sure, but I believe this is the assignment that's consuming cycles.
For example, look at this t1.c:
#include <stdio.h>

int main(void)
{
    int i;
    size_t u;   /* never assigned: baseline with no conversion */

    for (i = 0; i < 10; i++) {
        printf("i = %d, u = %zu\n", i, u);
    }
    return 0;
}
and the assembly for t1.c:
.file "t1.c"
.section .rodata
.LC0:
.string "i = %d, u = %zu\n"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $32, %esp
movl $0, 24(%esp)
jmp .L2
.L3:
movl $.LC0, %eax
movl 28(%esp), %edx
movl %edx, 8(%esp)
movl 24(%esp), %edx
movl %edx, 4(%esp)
movl %eax, (%esp)
call printf
addl $1, 24(%esp)
.L2:
cmpl $9, 24(%esp)
jle .L3
movl $0, %eax
leave
ret
.size main, .-main
.ident "GCC: (GNU) 4.4.6 20110731 (Red Hat 4.4.6-3)"
.section .note.GNU-stack,"",@progbits
In the above case there is no assignment at all, so it serves as the baseline.
Second case, t2.c:
#include <stdio.h>

int main(void)
{
    int i;
    size_t u;

    for (i = 0; i < 10; i++) {
        i = (size_t) u;   /* size_t value assigned to an int */
        printf("i = %d, u = %zu\n", i, u);
    }
    return 0;
}
and the subsequent assembly:
.file "t2.c"
.section .rodata
.LC0:
.string "i = %d, u = %zu\n"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $32, %esp
movl $0, 24(%esp)
jmp .L2
.L3:
movl 28(%esp), %eax
movl %eax, 24(%esp)
movl $.LC0, %eax
movl 28(%esp), %edx
movl %edx, 8(%esp)
movl 24(%esp), %edx
movl %edx, 4(%esp)
movl %eax, (%esp)
call printf
addl $1, 24(%esp)
.L2:
cmpl $9, 24(%esp)
jle .L3
movl $0, %eax
leave
ret
.size main, .-main
.ident "GCC: (GNU) 4.4.6 20110731 (Red Hat 4.4.6-3)"
.section .note.GNU-stack,"",@progbits
Note these statements in the listing above:
movl 28(%esp), %eax
movl %eax, 24(%esp)
Now for the last example, t3.c:
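#include <stdio.h>

int main(void)
{
    int i;
    int u;

    for (i = 0; i < 10; i++) {
        i = u;   /* plain int-to-int assignment, no conversion */
        /* %zu with an int argument is a mismatch; kept as compiled for the listing below */
        printf("i = %d, u = %zu\n", i, u);
    }
    return 0;
}
and the subsequent assembly: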
.file "t3.c"
.section .rodata
.LC0:
.string "i = %d, u = %zu\n"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $32, %esp
movl $0, 24(%esp)
jmp .L2
.L3:
movl 28(%esp), %eax
movl %eax, 24(%esp)
movl $.LC0, %eax
movl 28(%esp), %edx
movl %edx, 8(%esp)
movl 24(%esp), %edx
movl %edx, 4(%esp)
movl %eax, (%esp)
call printf
addl $1, 24(%esp)
.L2:
cmpl $9, 24(%esp)
jle .L3
movl $0, %eax
leave
ret
.size main, .-main
.ident "GCC: (GNU) 4.4.6 20110731 (Red Hat 4.4.6-3)"
.section .note.GNU-stack,"",@progbits
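Now compare t2 and t3. On this 32-bit x86 build, where int and size_t are both 32 bits wide, the two listings are identical, so the conversion costs nothing beyond the plain assignment. This really varies from architecture to architecture, though.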