
Check for zeros horizontally across __m128i vector?

I have several __m128i vectors containing 32-bit unsigned integers and I would like to check whether any of the 4 integers is zero.

I understand how I can "aggregate" the multiple __m128i vectors, but eventually I will still end up with a single __m128i vector, which I will then need to check horizontally.

How do I perform the final horizontal check for zero across the last vector?

EDIT: I am using Intel intrinsics, not inline assembly.

Don't do it. Avoid horizontal operations whenever possible; they are death to the performance of vector code.

Instead, compare the vector to a vector of zeros, then use PMOVMSKB to get a mask in a GPR. If that mask is non-zero, at least one of the lanes of your vector was zero:

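#include <emmintrin.h>  // SSE2

__m128i yourVector;  // placeholder: the vector you want to test
__m128i zeroVector = _mm_setzero_si128();

// Lanes equal to zero compare to all-ones; movemask collects the byte sign bits.
if (_mm_movemask_epi8(_mm_cmpeq_epi32(yourVector, zeroVector))) {
    // at least one lane of your vector is zero.
}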
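As a rough sketch of how the compare-and-movemask check might be wrapped up, including one possible way to do the "aggregate" step from the question by ORing the comparison results and reading the mask once at the end. The function names `any_lane_zero` and `any_zero_u32` are hypothetical, and the array layout (4 values per vector, unaligned loads) is an assumption for illustration:

```c
#include <emmintrin.h>  /* SSE2 */
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if any 32-bit lane of v is zero, otherwise 0. */
static int any_lane_zero(__m128i v) {
    __m128i eq = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    return _mm_movemask_epi8(eq) != 0;
}

/* Returns 1 if any of the 4 * n_vectors values starting at p is zero.
   The compare results are ORed together so that only one movemask
   is needed after the loop. */
static int any_zero_u32(const uint32_t *p, size_t n_vectors) {
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n_vectors; ++i) {
        __m128i v = _mm_loadu_si128((const __m128i *)(p + 4 * i));
        acc = _mm_or_si128(acc, _mm_cmpeq_epi32(v, _mm_setzero_si128()));
    }
    return _mm_movemask_epi8(acc) != 0;
}
```

Each zero lane contributes four set bits to the PMOVMSKB mask (one per byte), so the mask can also tell you which lane was zero if that matters.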

You can also use PTEST if you want to assume SSE4.1.
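A minimal sketch of what the PTEST variant could look like with intrinsics. The function name `any_lane_zero_sse41` is hypothetical. PTEST on its own tests whether the whole vector is zero, so this sketch still compares against zero first and then uses `_mm_testz_si128` in place of the movemask. The `target` attribute is a GCC/Clang convenience that I am assuming here so the function builds without `-msse4.1`:

```c
#include <smmintrin.h>  /* SSE4.1 */

/* Lets GCC/Clang compile this one function for SSE4.1 without a global
   -msse4.1 flag; the CPU running it must still support SSE4.1. */
#if defined(__GNUC__)
__attribute__((target("sse4.1")))
#endif
static int any_lane_zero_sse41(__m128i v) {
    __m128i eq = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    /* _mm_testz_si128(a, b) returns 1 when (a & b) is all zeros,
       i.e. when no lane compared equal to zero. */
    return !_mm_testz_si128(eq, eq);
}
```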


Taking the question at face value, if you really did need to do a horizontal AND for some reason, it would be movhlps + andps + shufps + andps. But don't do that.
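For completeness, the movhlps + andps + shufps + andps sequence might be written with intrinsics roughly as follows. The function name `horizontal_and_u32` is hypothetical, and the casts are only there because these instructions are exposed through the float intrinsics. Note that this yields the bitwise AND of the four lanes, which is not the same thing as the "any lane is zero" test above:

```c
#include <xmmintrin.h>  /* SSE: _mm_movehl_ps, _mm_and_ps, _mm_shuffle_ps */
#include <emmintrin.h>  /* SSE2: casts and _mm_cvtsi128_si32 */
#include <stdint.h>

/* Returns v0 & v1 & v2 & v3 for the four 32-bit lanes of v. */
static uint32_t horizontal_and_u32(__m128i v) {
    __m128 f  = _mm_castsi128_ps(v);
    __m128 hi = _mm_movehl_ps(f, f);   /* movhlps: lanes 2,3 copied into 0,1 */
    __m128 a  = _mm_and_ps(f, hi);     /* andps: lane0 = v0&v2, lane1 = v1&v3 */
    __m128 sw = _mm_shuffle_ps(a, a, _MM_SHUFFLE(1, 1, 1, 1)); /* shufps: broadcast lane 1 */
    __m128 r  = _mm_and_ps(a, sw);     /* andps: lane0 = v0&v1&v2&v3 */
    return (uint32_t)_mm_cvtsi128_si32(_mm_castps_si128(r));
}
```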
