How to efficiently interleave bits from 8 __int16 numbers?
I am building a Morton number for spatial indexing: I have 8 unsigned 16-bit numbers that will be interleaved into a single __int128 number. Efficiency is crucial, so the naive solution (a loop over every bit) or building 8 separate 128-bit numbers is too expensive.
I am using GCC; the target machine is 64-bit but does not support BMI2.
How can I speed up the computation?
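For concreteness, the naive loop might look like the sketch below. The function name `morton8x16_naive` and the bit layout are assumptions, since the question does not fix them: here bit `i` of input `x[j]` lands at output bit `8*i + j`, so `x[0]` supplies the least significant bit of each 8-bit group.

```c
#include <stdint.h>

/* Hypothetical baseline: interleave 8 x 16-bit values one bit at a time.
 * Assumed layout: output bit (8*i + j) = bit i of x[j]. */
unsigned __int128 morton8x16_naive(const uint16_t x[8])
{
    unsigned __int128 r = 0;
    for (int i = 0; i < 16; ++i) {          /* bit position within each input */
        for (int j = 0; j < 8; ++j) {       /* which input the bit comes from */
            unsigned __int128 bit = (x[j] >> i) & 1u;
            r |= bit << (8 * i + j);
        }
    }
    return r;
}
```

This performs 128 shift-and-OR steps on a 128-bit accumulator per call, which is the cost a faster version has to beat.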
If your machine is x86 and supports SSE2, there is a clever answer using the movmsk instructions. Google "SSE2 bit matrix transpose" for full code.
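A minimal sketch of the movmsk idea follows. It is not the code the search turns up, only an illustration under stated assumptions: the function name `morton8x16_sse2` is made up, and the layout is assumed to be output bit `8*i + j` = bit `i` of `x[j]`. The 8 inputs are treated as a 16x8 bit matrix; `_mm_movemask_epi8` collects the top bit of all 16 bytes in one instruction, and doubling each byte moves the next bit into the top position.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical sketch: interleave 8 x 16-bit values with SSE2 movemask.
 * Assumed layout: output bit (8*i + j) = bit i of x[j]. */
unsigned __int128 morton8x16_sse2(const uint16_t x[8])
{
    __m128i v  = _mm_loadu_si128((const __m128i *)x);
    /* Split each 16-bit lane into its low and high byte. */
    __m128i lo = _mm_and_si128(v, _mm_set1_epi16(0x00FF));
    __m128i hi = _mm_srli_epi16(v, 8);
    /* Bytes 0..7 = low bytes of x[0..7], bytes 8..15 = high bytes.
     * All lanes are in 0..255, so the saturating pack is lossless. */
    __m128i b  = _mm_packus_epi16(lo, hi);

    uint64_t low = 0, high = 0;
    for (int k = 7; k >= 0; --k) {
        /* Bit j of m     = bit k     of x[j]   (from bytes 0..7)
         * Bit 8 + j of m = bit 8 + k of x[j]   (from bytes 8..15) */
        unsigned m = (unsigned)_mm_movemask_epi8(b);
        low  |= (uint64_t)(m & 0xFFu) << (8 * k);
        high |= (uint64_t)(m >> 8)    << (8 * k);
        b = _mm_add_epi8(b, b);  /* shift every byte left by one bit */
    }
    return ((unsigned __int128)high << 64) | low;
}
```

The loop runs only 8 times and each iteration yields 16 output bits, so GCC can unroll it fully. SSE2 is part of the x86-64 baseline, so no BMI2 or extra `-m` flags should be needed on a 64-bit target.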