
How to generate an SSE4.2 popcnt machine instruction

Using the C program:

int main(int argc , char** argv)
{

  return  __builtin_popcountll(0xf0f0f0f0f0f0f0f0);

}

and the compiler line (gcc 4.4 - Intel Xeon L3426):

gcc -msse4.2 poptest.c -o poptest

I do NOT get the builtin popcnt instruction; rather, the compiler generates a lookup table and computes the popcount that way. The resulting binary is over 8000 bytes. (Yuk!)

Thanks so much for any assistance.

You have to tell GCC to generate code for an architecture that supports the popcnt instruction:

gcc -march=corei7 popcnt.c

Or just enable support for popcnt:

gcc -mpopcnt popcnt.c

In your example program the parameter to __builtin_popcountll is a constant, so the compiler will probably do the calculation at compile time and never emit the popcnt instruction. GCC does this even when it is not asked to optimize the program.

So try passing it something that it can't know at compile time:

int main (int argc, char** argv)
{
    return  __builtin_popcountll ((long long) argv);
}

$ gcc -march=corei7 -O popcnt.c && objdump -d a.out | grep '<main>' -A 2
0000000000400454 <main>:
  400454:       f3 48 0f b8 c6          popcnt %rsi,%rax
  400459:       c3                      retq
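If you would rather not depend on main's arguments, another way to look at the generated code is to put the builtin in a small helper that the compiler is asked not to inline. This is only a sketch: the name count_bits is made up for illustration, and noinline is a GCC/Clang attribute, so other compilers may need something different.

```c
/* Hypothetical helper, not part of the original answer.
 * External linkage plus noinline means a standalone body for the
 * function should be emitted, whatever its callers pass in, so the
 * popcount code can be inspected with objdump. */
__attribute__((noinline))
int count_bits(unsigned long long x)
{
    /* With -mpopcnt (or a suitable -march) GCC is expected to emit a
     * popcnt instruction here; without it, GCC uses a software
     * fallback instead. */
    return __builtin_popcountll(x);
}
```

Compile with something like gcc -O -mpopcnt -c and disassemble the object file to see what count_bits contains.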

You need to do it like this:

#include <stdio.h>
#include <smmintrin.h>

int main(void)
{
    int pop = _mm_popcnt_u64(0xf0f0f0f0f0f0f0f0ULL);
    printf("pop = %d\n", pop);
    return 0;
}

$ gcc -Wall -m64 -msse4.2 popcnt.c -o popcnt
$ ./popcnt 
pop = 32
$ 

EDIT

Oops - I just checked the disassembly output with gcc 4.2 and ICC 11.1 - while ICC 11.1 correctly generates popcntl or popcntq, for some reason gcc does not - it calls ___popcountdi2 instead. Weird. I will try a newer version of gcc when I get a chance and see if it's fixed. I guess the only workaround otherwise is to use ICC instead of gcc.

For __builtin_popcountll in GCC, all you need to do is add -mpopcnt:

#include <stdlib.h>
int main(int argc, char **argv) {
    return __builtin_popcountll(atoi(argv[1]));
}

with -mpopcnt

$ otool -tvV a.out
a.out:
(__TEXT,__text) section
_main:
0000000100000f66    pushq   %rbp
0000000100000f67    movq    %rsp, %rbp
0000000100000f6a    subq    $0x10, %rsp
0000000100000f6e    movq    %rdi, -0x8(%rbp)
0000000100000f72    movq    -0x8(%rbp), %rax
0000000100000f76    addq    $0x8, %rax
0000000100000f7a    movq    (%rax), %rax
0000000100000f7d    movq    %rax, %rdi
0000000100000f80    callq   0x100000f8e ## symbol stub for: _atoi
0000000100000f85    cltq
0000000100000f87    popcntq %rax, %rax
0000000100000f8c    leave
0000000100000f8d    retq

without -mpopcnt

a.out:
(__TEXT,__text) section
_main:
0000000100000f55    pushq   %rbp
0000000100000f56    movq    %rsp, %rbp
0000000100000f59    subq    $0x10, %rsp
0000000100000f5d    movq    %rdi, -0x8(%rbp)
0000000100000f61    movq    -0x8(%rbp), %rax
0000000100000f65    addq    $0x8, %rax
0000000100000f69    movq    (%rax), %rax
0000000100000f6c    movq    %rax, %rdi
0000000100000f6f    callq   0x100000f86 ## symbol stub for: _atoi
0000000100000f74    cltq
0000000100000f76    movq    %rax, %rdi
0000000100000f79    callq   0x100000f80 ## symbol stub for: ___popcountdi2
0000000100000f7e    leave
0000000100000f7f    retq
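The ___popcountdi2 call in the second listing goes to a software routine in the compiler's runtime support library rather than to a hardware instruction. The question mentions a lookup table, so here is a rough sketch of that general idea. It is an illustration only, not the library's actual code, and the names nibble_bits and popcount_table are invented for this example.

```c
#include <stdint.h>

/* Number of set bits in each possible 4-bit value (0..15). */
static const unsigned char nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Table-driven software popcount: look up the low nibble, shift it
 * out, and repeat until no set bits remain. */
static int popcount_table(uint64_t x)
{
    int n = 0;
    while (x) {
        n += nibble_bits[x & 0xF];
        x >>= 4;
    }
    return n;
}
```

A loop like this takes many instructions per call, which is why a single popcntq is preferable when the target CPU has it.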

Notes

Be sure to check the POPCNT feature bit (bit 23 of ECX returned by CPUID leaf 1) before using POPCNTQ.
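A runtime check along those lines might look like the sketch below. POPCNT support is reported in bit 23 of ECX for CPUID leaf 1. The code assumes GCC or Clang (for the cpuid.h helper __get_cpuid and the target attribute), and the function names cpu_has_popcnt, popcount_hw, popcount_soft and popcount_checked are made up for this example.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang header that provides __get_cpuid */

/* Built with popcnt enabled for this one function only, so the rest
 * of the file does not need -mpopcnt. */
static __attribute__((target("popcnt"))) int popcount_hw(uint64_t x)
{
    return __builtin_popcountll(x);
}
#endif

/* Returns 1 if CPUID leaf 1 reports POPCNT (ECX bit 23), else 0. */
static int cpu_has_popcnt(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                 /* leaf 1 is not available */
    return (int)((ecx >> 23) & 1u);
#else
    return 0;                     /* not an x86 target */
#endif
}

/* Portable fallback: clear the lowest set bit until none remain. */
static int popcount_soft(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Use the hardware path only when the CPU reports support for it. */
static int popcount_checked(uint64_t x)
{
#if defined(__x86_64__) || defined(__i386__)
    if (cpu_has_popcnt())
        return popcount_hw(x);
#endif
    return popcount_soft(x);
}
```

In real code you would probably cache the result of cpu_has_popcnt() instead of running CPUID on every call.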
