How to disable vectorization in clang++?
Consider the following small search function:
#include <cstdint>

// Counts how many of the N elements starting at base are less than needle.
template <uint32_t N>
int32_t countsearch(const uint32_t *base, uint32_t needle) {
    uint32_t count = 0;
    #pragma clang loop vectorize(disable)
    for (const uint32_t *probe = base; probe < base + N; probe++) {
        if (*probe < needle)
            count++;
    }
    return count;
}
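For context, here is a minimal sketch of how the function might be instantiated for 10 elements, which is the case shown in the assembly. The helper `demo_countsearch` and the sample data are made up for illustration and are not part of my real code; the pragma is left out so the sketch builds without warnings on other compilers.

```cpp
#include <cassert>
#include <cstdint>

// Same function as above, repeated so this sketch compiles on its own.
template <uint32_t N>
int32_t countsearch(const uint32_t *base, uint32_t needle) {
    uint32_t count = 0;
    for (const uint32_t *probe = base; probe < base + N; probe++) {
        if (*probe < needle)
            count++;
    }
    return count;
}

// Hypothetical helper (illustration only): runs the 10-element
// instantiation over a fixed array of odd numbers 1..19.
int32_t demo_countsearch(uint32_t needle) {
    static const uint32_t data[10] = {1, 3, 5, 7, 9, 11, 13, 15, 17, 19};
    return countsearch<10>(data, needle);
}
```

With this data, a needle of 10 counts the five elements 1, 3, 5, 7 and 9. Since N is a compile-time constant, the compiler knows the exact trip count of the loop.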
At -O2 or higher, for example, clang vectorizes this search. The result is code like this (for 10 elements):
int countsearch<10u>(unsigned int const*, unsigned int): # @int countsearch<10u>(unsigned int const*, unsigned int)
vmovd xmm0, esi
vpbroadcastd ymm0, xmm0
vpbroadcastd ymm1, dword ptr [rip + .LCPI0_0] # ymm1 = [2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648]
vpxor ymm2, ymm1, ymmword ptr [rdi]
vpxor ymm0, ymm0, ymm1
vpcmpgtd ymm0, ymm0, ymm2
cmp dword ptr [rdi + 32], esi
vpsrld ymm1, ymm0, 31
vextracti128 xmm1, ymm1, 1
vpsubd ymm0, ymm1, ymm0
vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
vpaddd ymm0, ymm0, ymm1
vphaddd ymm0, ymm0, ymm0
vmovd eax, xmm0
adc eax, 0
cmp dword ptr [rdi + 36], esi
adc eax, 0
vzeroupper
ret
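How can I disable this vectorization, either on the command line or with a #pragma in the code?
I tried the following command-line arguments, but none of them prevented the vectorization: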
-disable-loop-vectorization
-disable-vectorization
-fno-vectorize
-fno-tree-vectorize
I also tried putting #pragma clang loop vectorize(disable) above the loop, as you can see in the code above, with no luck.