
Test if any byte in an xmm register is 0

I am currently teaching myself SIMD and am writing a rather simple string-processing subroutine. However, I am restricted to SSE2, which means I cannot use ptest to find the null terminator.

The way I am currently trying to find the null terminator gives my SIMD loop more than 16 instructions, which rather defeats the purpose of using SIMD, or at least makes it less worthwhile than it could be.

//Check for null byte
pxor xmm4, xmm4
pcmpeqb xmm4, [rdi]                                   //Generate bitmask
movq rax, xmm4
test rax, 0xffffffffffffffff                          //Test low qword
jnz .Lepilogue
movhlps xmm4, xmm4                                    //Move high into low qword
movq rax, xmm4
test rax, 0xffffffffffffffff                          //Test high qword
jz .LsimdLoop                                         //No terminal was found, keep looping

I was wondering whether there is a faster way to do this without ptest, or whether this is as good as it is going to get and I will just have to optimize the rest of the loop some more.

Note: I ensure that the string address at which the SIMD loop is entered is 16-byte aligned, to allow for aligned instructions.

You can use _mm_movemask_epi8 (the pmovmskb instruction) to obtain a bit mask from the result of the comparison (the resulting mask contains the most significant bit of each byte in the vector). Then, testing whether any of the bytes are zero means testing whether any of the 16 bits in the mask are non-zero.

pxor xmm4, xmm4
pcmpeqb xmm4, [rdi]
pmovmskb eax, xmm4
test eax, eax          ; ZF=0 if there are any set bits = any matches
jnz .found_a_zero
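
For readers working in C, the same pcmpeqb / pmovmskb sequence can be sketched with SSE2 intrinsics. This is only an illustration, assuming an x86 compiler that provides `<emmintrin.h>`; the helper name `zero_byte_mask` is made up for this example and is not part of any library.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: returns a 16-bit mask in which bit i is set
 * when byte i of the 16-byte block at p is zero.
 * Assumption: p is 16-byte aligned, as in the question. */
static int zero_byte_mask(const void *p)
{
    assert(((uintptr_t)p & 15) == 0);            /* aligned load requires this */
    __m128i v    = _mm_load_si128((const __m128i *)p);
    __m128i zero = _mm_setzero_si128();          /* pxor xmm4, xmm4 */
    __m128i eq   = _mm_cmpeq_epi8(v, zero);      /* pcmpeqb: 0xFF where byte == 0 */
    return _mm_movemask_epi8(eq);                /* pmovmskb: top bit of each byte */
}
```

A non-zero return value means the block contains at least one zero byte, which corresponds to the `test eax, eax` / `jnz` pair in the assembly version.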

After finding a vector with any matches, you can find the position of the first match with bsf eax, eax, which gives the bit index in the bitmask; that is also the byte index in the 16-byte vector.
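
Putting the mask test and the bit scan together, a string-length loop might look roughly like the following C sketch. It assumes GCC or Clang (for `__builtin_ctz`, which typically compiles to bsf or tzcnt), a 16-byte aligned string, and that reading the rest of the terminator's 16-byte block is acceptable; `strlen_sse2_aligned` is a hypothetical name chosen for this example.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: length of a NUL-terminated string whose
 * address is 16-byte aligned. Each iteration loads a whole aligned
 * 16-byte block, so it may read bytes past the terminator that lie
 * in the same block (the caller's buffer must make that safe). */
static size_t strlen_sse2_aligned(const char *s)
{
    assert(((uintptr_t)s & 15) == 0);
    const __m128i zero = _mm_setzero_si128();
    for (size_t off = 0;; off += 16) {
        __m128i v = _mm_load_si128((const __m128i *)(s + off));
        int mask  = _mm_movemask_epi8(_mm_cmpeq_epi8(v, zero));
        if (mask != 0) {
            /* Lowest set bit = index of the first zero byte in the block. */
            return off + (size_t)__builtin_ctz((unsigned)mask);
        }
    }
}
```

The bit scan is only executed once a non-zero mask has been seen, since bsf leaves its destination undefined for a zero input and `__builtin_ctz(0)` is likewise undefined.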

Alternatively, you can check for all bytes matching (e.g. as you would in memcmp / strcmp) with pcmpeqb / pmovmskb / cmp eax, 0xffff, which checks that all 16 mask bits are set instead of checking for at least one set bit.
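
The all-bytes-match variant can be sketched in C in the same way. This example uses unaligned loads so that it works on arbitrary pointers, which is a choice made for the illustration and not something the assembly above requires; `blocks_equal16` is a hypothetical helper name.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: returns 1 if the 16 bytes at a and the
 * 16 bytes at b are all equal, 0 otherwise. */
static int blocks_equal16(const void *a, const void *b)
{
    assert(a != NULL && b != NULL);
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* unaligned load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi8(va, vb);               /* pcmpeqb */
    /* All 16 mask bits set means every byte compared equal,
     * mirroring cmp eax, 0xffff. */
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

A memcmp-style loop would call this per 16-byte block and stop at the first block that returns 0.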
