What is the difference between __builtin_popcountll and _mm_popcnt_u64?

Question

I was trying to count how many 1 bits there are in 512 MB of memory, and I found two possible methods: _mm_popcnt_u64() and __builtin_popcountll() from the GCC builtins.

_mm_popcnt_u64() is said to use the CPU instruction introduced with SSE4.2, which seems to be the fastest option, and __builtin_popcountll() is expected to use a table lookup.
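For readers unfamiliar with the term, a table-lookup popcount can be sketched as below. This is only an illustration of the general technique, assuming a 256-entry byte table; the names bit_table, init_bit_table and popcount_table are made up for this example, and the code is not taken from GCC's runtime.

```c
#include <stdint.h>

/* Hypothetical helper, not GCC's actual fallback: one entry per byte
   value, holding the number of set bits in that value. Entry 0 is
   zero-initialized because the array has static storage. */
static unsigned char bit_table[256];

/* Fill the table once, using bits(i) = (i & 1) + bits(i / 2). */
static void init_bit_table(void)
{
    for (int i = 1; i < 256; i++)
        bit_table[i] = (unsigned char)((i & 1) + bit_table[i / 2]);
}

/* Count set bits in a 64-bit word with eight table lookups. */
static int popcount_table(uint64_t x)
{
    int count = 0;
    for (int byte = 0; byte < 8; byte++) {
        count += bit_table[x & 0xFF];  /* look up the lowest byte */
        x >>= 8;                       /* move on to the next byte */
    }
    return count;
}
```

init_bit_table() has to be called once before the first popcount_table() call.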

So I think __builtin_popcountll() should be a little slower than _mm_popcnt_u64().

However, I got a result like this:

The two methods took almost the same time. I strongly suspect that they work the same way.
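A comparison of this kind might be set up roughly as follows. This is a sketch, not the code used in the question: the function names count_bits_builtin and count_bits_intrinsic are invented here, and the intrinsic version is only compiled when the compiler defines __POPCNT__ (for example with -mpopcnt).

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__POPCNT__) && defined(__x86_64__)
#include <nmmintrin.h>   /* declares _mm_popcnt_u64 */
#endif

/* Count set bits in a buffer of 64-bit words using the GCC builtin. */
static uint64_t count_bits_builtin(const uint64_t *words, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)__builtin_popcountll(words[i]);
    return total;
}

#if defined(__POPCNT__) && defined(__x86_64__)
/* Same loop using the Intel intrinsic; only built when POPCNT is enabled. */
static uint64_t count_bits_intrinsic(const uint64_t *words, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)_mm_popcnt_u64(words[i]);
    return total;
}
#endif
```

Timing each function over the same 512 MB buffer and checking that the totals match would show whether the two paths really differ on a given build.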

I also found this in popcntintrin.h:

/* Calculate a number of bits set to 1. */
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_popcnt_u32 (unsigned int __X)
{
  return __builtin_popcount (__X);
}

#ifdef __x86_64__
extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_popcnt_u64 (unsigned long long __X)
{
  return __builtin_popcountll (__X);
}
#endif

So I'm confused about how on earth __builtin_popcountll() works.

Answer 1

_mm_popcnt_u64 is part of <nmmintrin.h>, a header devised by Intel for utility functions for accessing SSE 4.2 instructions.

__builtin_popcountll is a GCC extension.

_mm_popcnt_u64 is portable to non-GNU compilers, and __builtin_popcountll is portable to non-SSE-4.2 CPUs. But on systems where both are available, both should compile to the exact same code.
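One way to act on this portability trade-off is a small wrapper that picks whichever form is available at compile time. The sketch below assumes the compiler defines __POPCNT__ when the instruction is enabled (GCC and Clang do this with -mpopcnt); popcount64 is a hypothetical name, not part of any library.

```c
#include <stdint.h>

#if defined(__POPCNT__) && defined(__x86_64__)
#include <nmmintrin.h>   /* declares _mm_popcnt_u64 */
#endif

/* popcount64 is a hypothetical wrapper name used for illustration. */
static inline int popcount64(uint64_t x)
{
#if defined(__POPCNT__) && defined(__x86_64__)
    /* POPCNT enabled at compile time: use the Intel intrinsic. */
    return (int)_mm_popcnt_u64(x);
#elif defined(__GNUC__)
    /* GCC-compatible compiler, any target: use the builtin. */
    return __builtin_popcountll(x);
#else
    /* Portable fallback: clear the lowest set bit until none remain. */
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
#endif
}
```

Callers use popcount64() everywhere and leave the choice of implementation to the build flags.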

Answer 2

If you compile without an -march flag, that is, with the x86_64 default, the builtin should be slower because it needs to dispatch to a function that selects between different architectures. This prevents inlining and adds an extra condition.
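To make the idea of dispatching between architectures concrete, here is a hand-written sketch of run-time selection. It is not a description of what GCC emits for the builtin; it only shows the extra branch and call that such a scheme implies. The names popcount_hw, popcount_generic and popcount_dispatch are invented for this example, and the hardware path assumes GCC or Clang on x86-64.

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__x86_64__)
/* POPCNT is enabled for this one function only, regardless of -march. */
static __attribute__((target("popcnt"))) int popcount_hw(uint64_t x)
{
    return __builtin_popcountll(x);
}
#endif

/* Built with the default target options of the translation unit. */
static int popcount_generic(uint64_t x)
{
    return __builtin_popcountll(x);
}

/* Hypothetical dispatcher: checks the CPU at run time on every call. */
static int popcount_dispatch(uint64_t x)
{
#if defined(__GNUC__) && defined(__x86_64__)
    if (__builtin_cpu_supports("popcnt"))
        return popcount_hw(x);
#endif
    return popcount_generic(x);
}
```

Compiling with -mpopcnt or a suitable -march makes such a check unnecessary, because the instruction can then be used directly and inlined.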

Answer 1: 15

Answer 2: 1, 2019-11-18 20:22:23
