How to generate an SSE4.2 popcnt machine instruction
Using the C program:
int main(int argc, char** argv)
{
    return __builtin_popcountll(0xf0f0f0f0f0f0f0f0);
}
and the compiler line (gcc 4.4, Intel Xeon L3426):
gcc -msse4.2 poptest.c -o poptest
I do NOT get the builtin popcnt instruction; instead the compiler generates a lookup table and computes the popcount that way. The resulting binary is over 8000 bytes. (Yuk!)
Thanks so much for any assistance.
You have to tell GCC to generate code for an architecture that supports the popcnt instruction:
gcc -march=corei7 popcnt.c
Or just enable support for popcnt:
gcc -mpopcnt popcnt.c
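One way to confirm that either flag took effect is to look at the `__POPCNT__` macro, which GCC and Clang predefine when popcnt code generation is enabled. The sketch below is only an illustration; the function names `popcnt_enabled_at_compile_time` and `count_bits` are made up for this example.

```c
#include <stdint.h>

/* Returns 1 if this file was compiled with popcnt enabled
   (for example via -mpopcnt or -march=corei7), 0 otherwise. */
int popcnt_enabled_at_compile_time(void)
{
#ifdef __POPCNT__
    return 1;
#else
    return 0;
#endif
}

/* Counts the set bits in x. With popcnt enabled the compiler can
   emit the instruction here; otherwise it falls back to software. */
int count_bits(uint64_t x)
{
    return __builtin_popcountll(x);
}
```

The result of `count_bits` is the same either way; only the generated code differs, which is why the disassembly is the thing to check.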
In your example program the parameter to __builtin_popcountll is a constant, so the compiler will probably do the calculation at compile time and never emit the popcnt instruction. GCC does this even if not asked to optimize the program.
So try passing it something that it can't know at compile time:
int main (int argc, char** argv)
{
    return __builtin_popcountll ((long long) argv);
}
$ gcc -march=corei7 -O popcnt.c && objdump -d a.out | grep '<main>' -A 2
0000000000400454 <main>:
400454: f3 48 0f b8 c6 popcnt %rsi,%rax
400459: c3 retq
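If casting argv feels too hacky, another way to hide the value from the optimizer is to read it through a volatile object. This is a minimal sketch under that assumption; `popcount_runtime` is a hypothetical name, not something from GCC.

```c
#include <stdint.h>

/* Counts set bits in value while preventing constant folding:
   the volatile read forces the compiler to treat the operand as
   unknown, even when the caller passes a literal and the call
   gets inlined. */
int popcount_runtime(uint64_t value)
{
    volatile uint64_t opaque = value;
    return __builtin_popcountll(opaque);
}
```

Calling `popcount_runtime(0xf0f0f0f0f0f0f0f0ULL)` still returns 32, but the count is computed at run time, so the popcnt instruction (or its software fallback) should be visible in the disassembly.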
You need to do it like this:
#include <stdio.h>
#include <smmintrin.h>
int main(void)
{
    int pop = _mm_popcnt_u64(0xf0f0f0f0f0f0f0f0ULL);
    printf("pop = %d\n", pop);
    return 0;
}
$ gcc -Wall -m64 -msse4.2 popcnt.c -o popcnt
$ ./popcnt
pop = 32
$
EDIT
Oops - I just checked the disassembly output with gcc 4.2 and ICC 11.1 - while ICC 11.1 correctly generates popcntl or popcntq, for some reason gcc does not - it calls ___popcountdi2 instead. Weird. I will try a newer version of gcc when I get a chance and see if it's fixed. I guess the only workaround otherwise is to use ICC instead of gcc.
For __builtin_popcountll in GCC, all you need to do is add -mpopcnt
#include <stdlib.h>
int main(int argc, char **argv) {
    return __builtin_popcountll(atoi(argv[1]));
}
With -mpopcnt:
$ otool -tvV a.out
a.out:
(__TEXT,__text) section
_main:
0000000100000f66 pushq %rbp
0000000100000f67 movq %rsp, %rbp
0000000100000f6a subq $0x10, %rsp
0000000100000f6e movq %rdi, -0x8(%rbp)
0000000100000f72 movq -0x8(%rbp), %rax
0000000100000f76 addq $0x8, %rax
0000000100000f7a movq (%rax), %rax
0000000100000f7d movq %rax, %rdi
0000000100000f80 callq 0x100000f8e ## symbol stub for: _atoi
0000000100000f85 cltq
0000000100000f87 popcntq %rax, %rax
0000000100000f8c leave
0000000100000f8d retq
Without -mpopcnt:
a.out:
(__TEXT,__text) section
_main:
0000000100000f55 pushq %rbp
0000000100000f56 movq %rsp, %rbp
0000000100000f59 subq $0x10, %rsp
0000000100000f5d movq %rdi, -0x8(%rbp)
0000000100000f61 movq -0x8(%rbp), %rax
0000000100000f65 addq $0x8, %rax
0000000100000f69 movq (%rax), %rax
0000000100000f6c movq %rax, %rdi
0000000100000f6f callq 0x100000f86 ## symbol stub for: _atoi
0000000100000f74 cltq
0000000100000f76 movq %rax, %rdi
0000000100000f79 callq 0x100000f80 ## symbol stub for: ___popcountdi2
0000000100000f7e leave
0000000100000f7f retq
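The ___popcountdi2 call in the second listing is a software routine from the compiler's runtime library. For a feel of what that kind of fallback costs compared with a single popcntq, here is a rough sketch of a table-driven popcount. It is an illustration only, not the actual runtime library source, and `popcount_table` is a made-up name.

```c
#include <stdint.h>

/* Table-driven popcount: look up the bit count of each of the
   eight bytes of x and add them together. */
int popcount_table(uint64_t x)
{
    static unsigned char table[256];  /* table[0] starts at 0 */
    static int ready = 0;

    if (!ready) {
        /* Bits in i = lowest bit of i + bits in i with that bit shifted out. */
        for (int i = 1; i < 256; i++)
            table[i] = (unsigned char)((i & 1) + table[i >> 1]);
        ready = 1;
    }

    int count = 0;
    for (int byte = 0; byte < 8; byte++)
        count += table[(x >> (8 * byte)) & 0xff];
    return count;
}
```

That is eight loads and adds plus a 256-byte table, versus one instruction with -mpopcnt.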
Be sure to check the POPCNT feature bit (bit 23 of ECX from CPUID leaf 1) before using POPCNTQ.
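A minimal sketch of such a check, assuming GCC or Clang on x86: the POPCNT flag is reported in bit 23 of ECX from CPUID leaf 1, and `<cpuid.h>` provides `__get_cpuid` to read it. The names `cpu_has_popcnt`, `popcount_hw`, `popcount_sw` and `popcount_checked` are hypothetical helpers for this example, and the per-function `target("popcnt")` attribute is used so the file builds without -mpopcnt.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang header providing __get_cpuid */

/* Returns 1 if CPUID leaf 1 reports POPCNT (ECX bit 23), else 0. */
int cpu_has_popcnt(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 23) & 1u);
}

/* Compiled with popcnt enabled for this one function only. */
__attribute__((target("popcnt")))
static int popcount_hw(uint64_t x)
{
    return __builtin_popcountll(x);
}
#else
/* Not x86: report no popcnt so the software path is always taken. */
int cpu_has_popcnt(void) { return 0; }
static int popcount_hw(uint64_t x) { (void)x; return 0; }
#endif

/* Software fallback: clear the lowest set bit until none remain. */
static int popcount_sw(uint64_t x)
{
    int count = 0;
    while (x) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* Uses the hardware instruction only when the CPU advertises it. */
int popcount_checked(uint64_t x)
{
    return cpu_has_popcnt() ? popcount_hw(x) : popcount_sw(x);
}
```

In real code you would probably cache the result of `cpu_has_popcnt()` instead of running CPUID on every call.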