There must be a really fast way to calculate this bitwise expression?

Let v and w be two bitstrings. In the current application they consist of 8 bits. I am looking for the fastest way to calculate the following expression:

x = (v[1] & w[0]) ^ (v[2] & w[1]) ^ (v[2] & w[0]) ^ (v[3] & w[2]) ^ (v[3] & w[1]) ^ (v[3] & w[0]) ^ ...
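As a baseline, here is a deliberately slow C sketch that evaluates x straight from the definition. It assumes the pattern continues up to v[7] (as the rewritten form below indicates), so that x is the XOR of v[i] & w[j] over all pairs 0 <= j < i <= 7, and that v and w sit in the low 8 bits of an unsigned int. The names x_ref and check_x_ref are illustrative. With only 65,536 input pairs, any faster variant can be checked against it exhaustively.

```c
#include <assert.h>

/* Direct evaluation of x: XOR of v[i] & w[j] over all 0 <= j < i <= 7.
   Bit k of v is v[k]; inputs are assumed to be 8-bit bitstrings. */
unsigned x_ref(unsigned v, unsigned w)
{
    unsigned x = 0;
    for (int i = 1; i < 8; i++)
        for (int j = 0; j < i; j++)
            x ^= ((v >> i) & 1u) & ((w >> j) & 1u);
    return x;
}

/* A few hand-checked values. */
void check_x_ref(void)
{
    assert(x_ref(0x02, 0x01) == 1);  /* only the term v[1] & w[0] */
    assert(x_ref(0x02, 0x20) == 0);  /* v[1] pairs only with w[0] */
    assert(x_ref(0x80, 0x80) == 0);  /* v[7] never pairs with w[7] */
    assert(x_ref(0x01, 0xFF) == 0);  /* v[0] never appears */
    assert(x_ref(0xFE, 0x01) == 1);  /* seven terms v[i] & w[0] */
}
```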

Some ideas on the subject: one thing I noticed is that this expression can also be written as follows. Let

P(w[k]) = w[k] ^ w[k-1] ^ ... ^ w[0]

denote the parity of the lowest k + 1 bits of w. Then

x = (v[1] & P(w[0])) ^ (v[2] & P(w[1])) ^ (v[3] & P(w[2])) ^ ... ^ (v[7] & P(w[6]))

Now if Pw is a bitstring in which each bit holds the parity of the bits below it, i.e. for which Pw[i] = P(w[i-1]) (so Pw[0] = 0), then x could be written as follows, where P applied to a whole bitstring means its overall parity:

x = P(v & Pw)
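A direct C sketch of this reformulation, with Pw and the final parity computed by plain loops; the function names are illustrative and 8-bit inputs are assumed. The check function walks through one hand-computed example: for w = 0x59 the running parities give Pw = 0x6E, v = 0xB6 keeps bits 1, 2 and 5 of it (0x26), and three set bits give x = 1.

```c
#include <assert.h>

/* Pw[i] = P(w[i-1]) = parity of w[0..i-1]; Pw[0] is always 0. */
unsigned prefix_parity_naive(unsigned w)
{
    unsigned pw = 0, p = 0;
    for (int i = 0; i < 8; i++) {
        pw |= p << i;          /* bit i gets the parity of the bits below it */
        p ^= (w >> i) & 1u;    /* include w[i] for the next position */
    }
    return pw;
}

/* Overall parity of the low 8 bits of u. */
unsigned parity8_naive(unsigned u)
{
    unsigned p = 0;
    for (int i = 0; i < 8; i++)
        p ^= (u >> i) & 1u;
    return p;
}

/* x = P(v & Pw) */
unsigned x_via_pw(unsigned v, unsigned w)
{
    return parity8_naive(v & prefix_parity_naive(w));
}

/* Worked example: v = 0xB6, w = 0x59. */
void check_worked_example(void)
{
    assert(prefix_parity_naive(0x59) == 0x6E);
    assert((0xB6 & 0x6E) == 0x26);     /* bits 1, 2 and 5 survive */
    assert(x_via_pw(0xB6, 0x59) == 1); /* three set bits: odd parity */
}
```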

Now, on http://graphics.stanford.edu/~seander/bithacks.html I found a quick way to calculate the parity of a bitstring, but in order to build a fast algorithm based on this, I would also need a fast way to calculate the bitstring Pw described above.
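One loop-free way to get Pw, sketched here under the same 8-bit assumption, is a parallel prefix XOR: XORing w with copies of itself shifted left by 1, 2 and 4 leaves the parity of w[0..i] in bit i, and one more left shift turns that into Pw. For the final parity the sketch uses the nibble-lookup trick from the bithacks page (the constant 0x6996, whose bit n is the parity of n), so everything stays in registers and no memory table is involved. Function names are illustrative; whether this beats a lookup table on a given CPU is something to measure rather than assume.

```c
#include <assert.h>

/* Pw via parallel prefix XOR: after the three steps, bit i (0 <= i <= 7)
   holds w[0] ^ ... ^ w[i]; shifting left by one gives Pw[i] = P(w[i-1]). */
static unsigned prefix_parity(unsigned w)
{
    unsigned p = w & 0xFFu;
    p ^= p << 1;
    p ^= p << 2;
    p ^= p << 4;
    return (p << 1) & 0xFFu;
}

/* Parity of an 8-bit value: fold to a nibble, then index the 16-bit
   constant 0x6996, whose bit n is the parity of n. */
static unsigned parity8(unsigned u)
{
    u ^= u >> 4;
    u &= 0xFu;
    return (0x6996u >> u) & 1u;
}

/* x = P(v & Pw), computed entirely in registers. */
unsigned x_fast(unsigned v, unsigned w)
{
    return parity8(v & prefix_parity(w));
}

/* Exhaustive comparison with the definition of x. */
void check_x_fast(void)
{
    for (unsigned v = 0; v < 256; v++)
        for (unsigned w = 0; w < 256; w++) {
            unsigned ref = 0;
            for (int i = 1; i < 8; i++)
                for (int j = 0; j < i; j++)
                    ref ^= (v >> i) & (w >> j) & 1u;
            assert(x_fast(v, w) == ref);
        }
}
```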

Or maybe I'm going about this the wrong way entirely; there are an awful lot of parity calculations to be done this way. If this is indeed the way to go, I was wondering whether it would be possible (assuming the program will run on x86) to use the parity flag in assembly to speed up the calculation.
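Portable C cannot read the parity flag directly. A hedged alternative, assuming GCC or Clang, is the __builtin_parity builtin: on x86 the compiler may lower it to code that uses the parity flag (e.g. SETNP) or to POPCNT, depending on compiler version and target flags, so the generated assembly is the place to confirm what actually happens. The sketch computes Pw with a shift-and-XOR prefix; x_builtin and check_x_builtin are illustrative names, and inputs are assumed to be 8-bit.

```c
#include <assert.h>

/* Requires GCC or Clang: __builtin_parity(unsigned) returns the parity
   (0 or 1) of its argument. */
unsigned x_builtin(unsigned v, unsigned w)
{
    unsigned p = w & 0xFFu;
    p ^= p << 1;                     /* bit i: w[i] ^ w[i-1] */
    p ^= p << 2;                     /* bit i: w[i] ^ ... ^ w[i-3] */
    p ^= p << 4;                     /* bit i: w[i] ^ ... ^ w[0] for i <= 7 */
    unsigned pw = (p << 1) & 0xFFu;  /* Pw[i] = P(w[i-1]) */
    return (unsigned)__builtin_parity(v & pw);
}

void check_x_builtin(void)
{
    assert(x_builtin(0xB6, 0x59) == 1);
    assert(x_builtin(0x80, 0x80) == 0);
}
```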

Finally, this calculation will be needed a LOT in the application I am developing, so speed is really of the essence. I was wondering whether it would be possible to do the entire calculation within a register, and whether this could be faster than using a lookup table in memory.

If v and w are truly 8 bits, then you could just precalculate all 256^2 combinations and store the results in a table of 64 KiB (65,536 bytes). That fits easily in a typical L2 cache, although it is larger than most L1 data caches. Your computation then becomes:

  precomputed[(v << 8) | w]

which costs a few clock cycles plus a lookup in a hot cache line. That might be hard to beat.
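A sketch of the table approach, assuming a one-off initialisation at startup and a static byte array; init_precomputed, x_table and check_table are illustrative names, and the table is filled directly from the definition of x. The parentheses around v << 8 matter: in C, + binds tighter than <<, so v<<8+w would parse as v << (8 + w).

```c
#include <assert.h>
#include <stdint.h>

/* One byte per (v, w) pair: 256 * 256 = 65,536 bytes. */
static uint8_t precomputed[1u << 16];

/* Fill the table once, straight from the definition of x. */
void init_precomputed(void)
{
    for (unsigned v = 0; v < 256; v++)
        for (unsigned w = 0; w < 256; w++) {
            unsigned x = 0;
            for (int i = 1; i < 8; i++)
                for (int j = 0; j < i; j++)
                    x ^= (v >> i) & (w >> j) & 1u;
            precomputed[(v << 8) | w] = (uint8_t)x;
        }
}

/* Single lookup, indexed by v in the high byte and w in the low byte. */
static inline unsigned x_table(uint8_t v, uint8_t w)
{
    return precomputed[((unsigned)v << 8) | w];
}

/* Spot checks; assumes init_precomputed() has already run. */
void check_table(void)
{
    assert(x_table(0x80, 0x40) == 1);
    assert(x_table(0xFE, 0x01) == 1);
}
```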

On x86, the parity flag (PF) is set automatically by most arithmetic and logic instructions, based on the low 8 bits of the result. Basically, the required operations reduce to:

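 Pw = Lookup_256[w];
 v &= Pw;                 // on x86 this sets PF (PF = 1 for an even number of 1 bits, so x = !PF), or

 v  = Lookup_256[v >> 1] >> 7; // reuse the table: bit 0 of v & Pw is always 0, so bit 7
                               // of Lookup_256[v >> 1] is the parity of all of v & Pw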
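A sketch of building Lookup_256 (the answer's table of Pw values, assumed here to be a 256-entry uint8_t array filled at startup) and evaluating x with two lookups; init_lookup_256, x_two_lookups and check_two_lookups are illustrative names. The shift by one in the second lookup is what makes bit 7 of the entry cover all eight bits of v & Pw: without it, v = 0x80, w = 0x40 would wrongly give 0.

```c
#include <assert.h>
#include <stdint.h>

/* Lookup_256[w] = Pw, i.e. bit i holds the parity of w[0..i-1]. */
static uint8_t Lookup_256[256];

void init_lookup_256(void)
{
    for (unsigned w = 0; w < 256; w++) {
        unsigned pw = 0, p = 0;
        for (int i = 0; i < 8; i++) {
            pw |= p << i;        /* parity of the bits below bit i */
            p ^= (w >> i) & 1u;
        }
        Lookup_256[w] = (uint8_t)pw;
    }
}

/* Two lookups, no branches. Bit 0 of v & Pw is always 0, so shifting it
   out loses nothing, and bit 7 of Lookup_256[u] is the parity of u[0..6]. */
unsigned x_two_lookups(unsigned v, unsigned w)
{
    unsigned u = v & Lookup_256[w & 0xFFu];
    return Lookup_256[u >> 1] >> 7;
}

/* Assumes init_lookup_256() has already run. */
void check_two_lookups(void)
{
    assert(Lookup_256[0x59] == 0x6E);
    assert(x_two_lookups(0xB6, 0x59) == 1);
    assert(x_two_lookups(0x80, 0x40) == 1);
}
```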

EDIT

Higher-level optimizations and parallel implementations become possible once you recognize that the terms (v[i] & w[j]) are the partial products of a multiplication, and that combining them with ^ instead of addition makes the overall operation carry-less (i.e. a polynomial multiplication over GF(2)).

The overall operation would be Parity((rev(v) Px w) & 0x7f), where Px denotes polynomial (carry-less) multiplication and rev(v) is v with its 8 bits in reverse order: the partial product v[i] & w[j] then lands in bit (7 - i) + j, which is below bit 7 exactly when j < i. Polynomial multiplication is supported e.g. in NEON and, on Intel architectures, by the PCLMULQDQ instruction. The Intel instruction unfortunately operates on 64-bit words, so multiplying several independent v, w pairs simultaneously is probably possible but difficult.
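A portable sketch of the carry-less formulation Parity((rev(v) Px w) & 0x7f), with the multiplication done in software by shifts and XORs; clmul8, rev8, x_clmul and check_x_clmul are illustrative helpers, not library functions, and inputs are assumed to be 8-bit. On real hardware, _mm_clmulepi64_si128 from <wmmintrin.h> exposes PCLMULQDQ, and NEON's vmull_p8 multiplies eight 8-bit polynomial pairs at once; neither is used here, so the code runs anywhere.

```c
#include <assert.h>

/* Software carry-less (GF(2) polynomial) product of two 8-bit values;
   the result has at most 15 bits. */
static unsigned clmul8(unsigned a, unsigned b)
{
    unsigned r = 0;
    for (int i = 0; i < 8; i++)
        if ((a >> i) & 1u)
            r ^= b << i;       /* XOR instead of add: no carries */
    return r;
}

/* Reverse the order of the 8 low bits. */
static unsigned rev8(unsigned b)
{
    b = ((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4);
    b = ((b & 0xCCu) >> 2) | ((b & 0x33u) << 2);
    b = ((b & 0xAAu) >> 1) | ((b & 0x55u) << 1);
    return b;
}

/* Term v[i] & w[j] lands in bit (7 - i) + j, which is below 7 iff j < i. */
unsigned x_clmul(unsigned v, unsigned w)
{
    unsigned c = clmul8(rev8(v & 0xFFu), w & 0xFFu) & 0x7Fu;
    c ^= c >> 4;               /* fold the parity of the kept bits into bit 0 */
    c ^= c >> 2;
    c ^= c >> 1;
    return c & 1u;
}

/* Exhaustive comparison with the definition of x. */
void check_x_clmul(void)
{
    for (unsigned v = 0; v < 256; v++)
        for (unsigned w = 0; w < 256; w++) {
            unsigned ref = 0;
            for (int i = 1; i < 8; i++)
                for (int j = 0; j < i; j++)
                    ref ^= (v >> i) & (w >> j) & 1u;
            assert(x_clmul(v, w) == ref);
        }
}
```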

Something like this, perhaps? 也许是这样的事情?

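unsigned v, w, parity = 0;
/* ... */
v >>= 1;                  /* v[0] never appears in x, so discard it */
while (v) {
  parity ^= v & w;        /* bit 0 accumulates v[i] & P(w[i-1]) */
  w = (w & 1) ^ (w >> 1); /* bit 0 of w becomes the next running parity */
  v >>= 1;
}
parity &= 1;              /* only bit 0 is meaningful */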
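Wrapped in a function (x_loop and check_x_loop are illustrative names, 8-bit inputs assumed), the running-parity loop can be checked exhaustively against the definition. The accumulation has to use &, since each term is v[i] AND the running parity held in bit 0 of w; the loop stops as soon as no set bits of v remain.

```c
#include <assert.h>

/* Bit 0 of w tracks P(w[k]) while v is consumed one bit at a time. */
unsigned x_loop(unsigned v, unsigned w)
{
    unsigned parity = 0;
    v = (v & 0xFFu) >> 1;          /* v[0] never appears in x */
    w &= 0xFFu;
    while (v) {
        parity ^= v & w;           /* bit 0: v[i] & P(w[i-1]) */
        w = (w & 1u) ^ (w >> 1);   /* bit 0 becomes the next running parity */
        v >>= 1;
    }
    return parity & 1u;
}

/* Exhaustive comparison with the definition of x. */
void check_x_loop(void)
{
    for (unsigned v = 0; v < 256; v++)
        for (unsigned w = 0; w < 256; w++) {
            unsigned ref = 0;
            for (int i = 1; i < 8; i++)
                for (int j = 0; j < i; j++)
                    ref ^= (v >> i) & (w >> j) & 1u;
            assert(x_loop(v, w) == ref);
        }
}
```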
