Check for zeros horizontally across __m128i vector?
I have several __m128i vectors containing 32-bit unsigned integers and I would like to check whether any of the 4 integers is a zero.
I understand how I can "aggregate" the multiple __m128i vectors, but eventually I will still end up with a single __m128i vector, which I will then need to check horizontally.
How do I perform the final horizontal check for zero across the last vector?
EDIT: I am using Intel intrinsics, not inline assembly.
Don't do it. Avoid horizontal operations whenever possible; they are death to the performance of vector code.
Instead, compare the vector to a vector of zeros, then use PMOVMSKB (_mm_movemask_epi8) to get a mask in a general-purpose register. If that mask is non-zero, at least one of the lanes of your vector was zero:
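#include <emmintrin.h>  // SSE2 intrinsics

__m128i yourVector;  // the vector to test, filled in elsewhere
__m128i zeroVector = _mm_setzero_si128();
// The compare sets all bits of every lane that equals zero; movemask gathers the byte sign bits.
if (_mm_movemask_epi8(_mm_cmpeq_epi32(yourVector, zeroVector))) {
    // at least one lane of your vector is zero.
}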
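As a self-contained sketch of the same compare-plus-movemask idea, the check can be wrapped in small helpers. The function names (any_lane_zero, zero_lane_mask, any_zero_in_vectors) are made up for illustration. The last helper is only one possible way to do the "aggregation" mentioned in the question: OR the compare results together and issue a single movemask at the end.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stddef.h>

/* Returns 1 if any of the four 32-bit lanes of v is zero, else 0. */
static int any_lane_zero(__m128i v)
{
    __m128i eq = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    return _mm_movemask_epi8(eq) != 0;
}

/* Returns a 4-bit mask; bit i is set when lane i of v is zero. */
static int zero_lane_mask(__m128i v)
{
    __m128i eq = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    /* MOVMSKPS takes one sign bit per 32-bit lane. */
    return _mm_movemask_ps(_mm_castsi128_ps(eq));
}

/* Hypothetical aggregation: returns 1 if any lane of any vector is zero. */
static int any_zero_in_vectors(const __m128i *vecs, size_t n)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = zero;
    for (size_t i = 0; i < n; ++i)
        acc = _mm_or_si128(acc, _mm_cmpeq_epi32(vecs[i], zero));
    return _mm_movemask_epi8(acc) != 0;  /* one mask extraction in total */
}
```

Only the final movemask crosses from the vector unit to a general-purpose register; everything before it stays vertical.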
You can also use PTEST if you are willing to assume SSE4.1.
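A minimal sketch of the PTEST variant, assuming SSE4.1 is available on the target CPU. The helper name any_lane_zero_sse41 is hypothetical, and the target attribute is a GCC/Clang extension used here so the snippet builds without -msse4.1; with other compilers, drop it and enable SSE4.1 through the compiler options instead. PTEST only replaces the mask-extraction step: the compare against zero is still needed.

```c
#include <smmintrin.h>  /* SSE4.1 */

/* Returns 1 if any of the four 32-bit lanes of v is zero, else 0. */
__attribute__((target("sse4.1")))
static int any_lane_zero_sse41(__m128i v)
{
    __m128i eq = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    /* _mm_testz_si128(a, b) returns 1 when (a & b) is all zeros,
       i.e. when no lane compared equal to zero. */
    return !_mm_testz_si128(eq, eq);
}
```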
Taking the question at face value, if you really did need to do a horizontal AND for some reason, it would be movhlps + andps + shufps + andps. But don't do that.
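For completeness, that instruction sequence roughly corresponds to the intrinsics below. This is only a sketch: the helper name horizontal_and_epi32 is hypothetical, and the compiler is free to emit different instructions than the ones named above.

```c
#include <emmintrin.h>  /* SSE2 (also pulls in the SSE float intrinsics) */

/* Returns lane0 & lane1 & lane2 & lane3 of v as one 32-bit value. */
static unsigned horizontal_and_epi32(__m128i v)
{
    __m128 f  = _mm_castsi128_ps(v);          /* reinterpret only, no conversion */
    __m128 hi = _mm_movehl_ps(f, f);          /* movhlps: lanes 2,3 -> lanes 0,1 */
    __m128 a  = _mm_and_ps(f, hi);            /* lane0 = v0&v2, lane1 = v1&v3 */
    __m128 s  = _mm_shuffle_ps(a, a, _MM_SHUFFLE(1, 1, 1, 1)); /* lane1 -> lane0 */
    __m128 r  = _mm_and_ps(a, s);             /* lane0 = v0&v1&v2&v3 */
    return (unsigned)_mm_cvtsi128_si32(_mm_castps_si128(r));
}
```

Keep in mind that this reduces the lanes with a bitwise AND, so a zero result does not by itself mean that some lane was zero (1 & 2 is 0). For the zero-lane check, the compare-and-movemask approach is the one to use.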