
Bitwise operations for comparing numbers?

I've spent too many brain cycles on this over the last day.

I'm trying to come up with a set of bitwise operations that re-implements the following condition:

uint8_t a, b;
uint8_t c, d;
uint8_t e, f;
...

bool result = (a == 0xff || a == b) && (c == 0xff || c == d) && (e == 0xff || e == f);

The code I'm looking at has four of these expressions, short-circuit &&ed together (as above).

I know this is an esoteric question, but this code runs in a tight loop, and its short-circuit nature makes the lack of predictable timing a royal pain. Quite frankly, it seems to really suck on architectures where branch prediction isn't available or isn't well implemented.

Is there such a beast that would be concise?

So, if you really want to do bit-twiddling to make this "fast" (which you should only do after profiling your code to make sure this is a bottleneck), what you want to do is vectorize it: pack all the values together into a wider word so you can do all the comparisons at once (one instruction), and then extract the answer from a few bits.

There are a few tricks to this. To compare two values for equality, you can xor (^) them and test whether the result is zero. To test whether a field of a wider word is zero, you can 'pack' it with a 1 bit above it, then subtract one and see if the extra bit you added is still 1 -- if it is now 0, the value of the field was zero.
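As a minimal sketch of those two tricks on a single 9-bit field (the helper name `field_differs` is hypothetical, not from the answer), the xor produces zero exactly when the bytes match, and the subtraction borrows out of the guard bit only in that case:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: true when the 8-bit values x and y differ,
 * computed with xor, or, subtract and mask only. */
static bool field_differs(uint8_t x, uint8_t y)
{
    uint32_t diff  = (uint32_t)(x ^ y);   /* zero exactly when x == y */
    uint32_t guard = 0x100u;              /* the extra 1 bit above the 8-bit field */
    uint32_t t     = (diff | guard) - 1u; /* borrows from the guard bit only if diff == 0 */
    return (t & guard) != 0;              /* guard bit survived => diff was non-zero */
}
```

Note the sense of the result: the guard bit ends up 1 for "not equal" and 0 for "equal".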

Putting all this together, you want to do 6 8-bit compares at once. You can pack these values into 9-bit fields in a 64-bit word (9 bits to get the extra guard bit you're going to test after the subtraction). You can fit up to 7 such 9-bit fields in a 64-bit int, so no problem.

// pack 6 9-bit values into a word
#define VEC6x9(A,B,C,D,E,F)  (((uint64_t)(A) << 45) | ((uint64_t)(B) << 36) | ((uint64_t)(C) << 27) | ((uint64_t)(D) << 18) | ((uint64_t)(E) << 9) | (uint64_t)(F))

// the two values to compare
uint64_t v1 = VEC6x9(a, a, c, c, e, e);
uint64_t v2 = VEC6x9(b, 0xff, d, 0xff, f, 0xff);
uint64_t guard_bits = VEC6x9(0x100, 0x100, 0x100, 0x100, 0x100, 0x100);
uint64_t ones = VEC6x9(1, 1, 1, 1, 1, 1);
uint64_t alt_guard_bits = VEC6x9(0, 0x100, 0, 0x100, 0, 0x100);

// do the comparisons in parallel
uint64_t res_vec = ((v1 ^ v2) | guard_bits) - ones;

// keep only the guard bits, masking off the bits we'll ignore (optional for clarity, not needed for correctness)
res_vec &= guard_bits;

// do the 3 OR ops in parallel
res_vec &= res_vec >> 9;

// get the result
bool result = (res_vec & alt_guard_bits) == 0;

The ORs and ANDs at the end are 'backwards' because the result bit for each comparison is 0 if the comparison was true (values were equal) and 1 if it was false (values were not equal).
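To check the whole sequence against the original condition, here is a sketch that wraps the steps in a function. The names `match_ref` and `match_packed` are hypothetical, and this version keeps only the guard bits before combining them:

```c
#include <stdint.h>
#include <stdbool.h>

/* pack 6 9-bit values into a word (same layout as the macro above) */
#define VEC6x9(A,B,C,D,E,F) \
    (((uint64_t)(A) << 45) | ((uint64_t)(B) << 36) | ((uint64_t)(C) << 27) | \
     ((uint64_t)(D) << 18) | ((uint64_t)(E) << 9)  |  (uint64_t)(F))

/* Reference: the short-circuit condition from the question. */
static bool match_ref(uint8_t a, uint8_t b, uint8_t c,
                      uint8_t d, uint8_t e, uint8_t f)
{
    return (a == 0xff || a == b) && (c == 0xff || c == d) && (e == 0xff || e == f);
}

/* Branch-free version: six comparisons in one 64-bit word. */
static bool match_packed(uint8_t a, uint8_t b, uint8_t c,
                         uint8_t d, uint8_t e, uint8_t f)
{
    const uint64_t guard_bits     = VEC6x9(0x100, 0x100, 0x100, 0x100, 0x100, 0x100);
    const uint64_t ones           = VEC6x9(1, 1, 1, 1, 1, 1);
    const uint64_t alt_guard_bits = VEC6x9(0, 0x100, 0, 0x100, 0, 0x100);

    uint64_t v1 = VEC6x9(a, a, c, c, e, e);
    uint64_t v2 = VEC6x9(b, 0xff, d, 0xff, f, 0xff);

    /* guard bit of each field is 1 where the pair differs, 0 where it is equal */
    uint64_t ne = (((v1 ^ v2) | guard_bits) - ones) & guard_bits;

    /* "x == 0xff || x == y" fails only if BOTH comparisons failed */
    ne &= ne >> 9;

    /* true when none of the three pairs failed */
    return (ne & alt_guard_bits) == 0;
}
```

Comparing `match_packed` against `match_ref` over a range of inputs is a cheap way to convince yourself that the 'backwards' logic is right before timing anything.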

All of the above is mostly of interest if you are writing a compiler -- it's how you end up implementing a vector comparison -- and it may well be the case that a vectorizing compiler will do it all for you automatically.

This can be much more efficient if you can arrange to have your initial values pre-packed into vectors. This may in turn influence your choice of data structures and allowable values -- if you arrange for your values to be 7-bit or 15-bit (instead of 8-bit) they may pack more nicely when you add the guard bits...
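As a rough illustration of the 7-bit idea, the sketch below assumes each value fits in 7 bits, the values are already stored eight to a `uint64_t` (one per byte lane, top bit of every lane clear), and the wildcard is 0x7f instead of 0xff. All of these, along with the names `match7x8` and `LANE_*`, are assumptions made for the example and not part of the answer:

```c
#include <stdint.h>
#include <stdbool.h>

#define LANE_GUARD 0x8080808080808080ull  /* bit 7 of every byte is the guard bit */
#define LANE_ONES  0x0101010101010101ull  /* a 1 in every byte lane */
#define LANE_WILD  0x7f7f7f7f7f7f7f7full  /* assumed wildcard value in every lane */

/* True when every lane of pattern is either the wildcard or equal to the
 * matching lane of value. Both arguments must have bit 7 of each byte clear. */
static bool match7x8(uint64_t pattern, uint64_t value)
{
    /* guard bit is 1 in each lane where pattern differs from value */
    uint64_t ne_value = ((pattern ^ value) | LANE_GUARD) - LANE_ONES;
    /* guard bit is 1 in each lane where pattern is not the wildcard */
    uint64_t ne_wild  = ((pattern ^ LANE_WILD) | LANE_GUARD) - LANE_ONES;
    /* a lane fails only when both tests failed */
    return (ne_value & ne_wild & LANE_GUARD) == 0;
}
```

With this layout there is no per-call packing or shifting at all, which is where the pre-packed form would be expected to pay off.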

You could modify how you store and interpret the data:

When a is 0xFF, do you need the value of b? If not, then make b equal to 0xFF and simplify the expression by removing the part that tests for 0xFF.

Also, you might combine a, b and c in a single variable.

uint32_t abc;
uint32_t def;

bool result = abc == def;

Other operations might be slower, but that loop should be much faster (a single comparison instead of up to 6 comparisons).

You might want to use a union to be able to access the bytes individually or as a group. In that case, make sure that the fourth byte is always 0.
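A minimal sketch of this approach, using shifts rather than a union so the byte order does not depend on the platform. It assumes the question's pairs a/b, c/d, e/f, with a, c, e packed into one word and b, d, f into the other; the helpers `pack3`, `normalize` and `match_prepacked` are hypothetical names:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: three bytes in the low 24 bits, fourth byte always 0. */
static uint32_t pack3(uint8_t x, uint8_t y, uint8_t z)
{
    return ((uint32_t)x << 16) | ((uint32_t)y << 8) | (uint32_t)z;
}

/* Hypothetical helper, applied once when the data is stored (outside the hot
 * loop): where the first byte of a pair is the 0xff wildcard, force the
 * second byte to 0xff as well so a plain equality test succeeds. */
static uint8_t normalize(uint8_t first, uint8_t second)
{
    return first == 0xff ? 0xff : second;
}

/* The hot-loop test: a single 32-bit comparison. */
static bool match_prepacked(uint32_t ace, uint32_t bdf)
{
    return ace == bdf;
}
```

This only works if the second byte of a pair is never needed once the first byte is the wildcard, which is the condition the answer asks about.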

To remove the timing variations that come with &&, ||, use &, |. @molbdnilo Possibly faster, maybe not. Certainly easier to parallelize.

// bool result = (a == 0xff || a == b) && (c == 0xff || c == d) 
//     && (e == 0xff || e == f);
bool result = ((a == 0xff) | (a == b)) & ((c == 0xff) | (c == d))
    & ((e == 0xff) | (e == f));
