

Removing branches via bitwise select

I was told that the branches in the code

int value = some_number;  // some number (placeholder)
if(value > some_other_value)
   value *= 23;
else
   value -= 5; 

can be eliminated via bitwise masking (in order to enable SIMD optimization for the code):

const int Mask = (some_other_value-value)>>31;
value = ((value * 23) & Mask) | ((value - 5) & ~Mask);
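To see the two forms side by side, here is a sketch that wraps the question's branchy and masked code into functions (the function names are hypothetical). It assumes a 32-bit two's-complement int, that >> on a negative int is an arithmetic (sign-filling) shift, which C leaves implementation-defined, and inputs for which neither some_other_value - value nor value * 23 overflows.

```c
/* Branchy version from the question. */
int update_branchy(int value, int some_other_value)
{
    if (value > some_other_value)
        value *= 23;
    else
        value -= 5;
    return value;
}

/* Masked version from the question.
 * ASSUMPTIONS: 32-bit int, arithmetic >> on negative values
 * (implementation-defined in C), no overflow in the subtraction. */
int update_masked(int value, int some_other_value)
{
    /* Negative difference (value > some_other_value): the shift smears the
     * sign bit into every bit, giving -1 (all ones). Otherwise: 0. */
    const int mask = (some_other_value - value) >> 31;

    /* all ones keeps value * 23 and clears value - 5; zero does the reverse */
    return ((value * 23) & mask) | ((value - 5) & ~mask);
}
```

Both functions compute both candidate results only in the masked version; the mask then keeps exactly one of them, bit for bit.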

However, I do not understand how this works (even though I understand what operations are being used here and how the results will look in binary). Furthermore, how generally applicable is this? What if the original code were instead something like

if ((value & 1) == 1)
   value *= 23;
else
   value -= 5;

Would the branch-removed code still be the same? Otherwise, what is the purpose of the mask and how should I go about creating it? What is happening here?

This works:

const int Mask = (some_other_value-value)>>31;
value = ((value * 23) & Mask) | ((value - 5) & ~Mask);

Mask becomes the sign bit of some_other_value - value, copied into every bit position by the right shift by 31 (assuming a 32-bit int and an arithmetic shift), which is similar to:

if (value > some_other_value) mask = -1; else mask = 0; 
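A sketch of a more portable way to build the same mask, without shifting a negative number: a comparison in C yields 0 or 1, and negating that gives 0 or -1, which is all ones in two's complement. The helper names mask_from_bool, select_int and update_portable are illustrative, not taken from the answer.

```c
/* Turn a truth value into an all-zeros (false) or all-ones (true) mask.
 * ASSUMPTION: two's-complement int, so -1 has every bit set. */
static inline int mask_from_bool(int condition)
{
    return -(condition != 0);
}

/* Bitwise select: bits of a where mask is 1, bits of b where mask is 0. */
static inline int select_int(int mask, int a, int b)
{
    return (a & mask) | (b & ~mask);
}

/* Same result as the branchy code, with no shift of a negative value
 * and no subtraction that could overflow. */
int update_portable(int value, int some_other_value)
{
    const int mask = mask_from_bool(value > some_other_value);
    return select_int(mask, value * 23, value - 5);
}
```

Any condition that can be written as a comparison can feed the same select, which is what makes the trick general.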

You could achieve the same thing with your second example, using:

mask = -(value & 1);

So -0 = 0, and -1 = all ones (in two's complement).
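Applied to the second example from the question, a sketch looks like this (the function name is hypothetical). It relies on value & 1 being 1 for every odd value, including negative odd values in two's complement, and 0 for even ones.

```c
/* Branch-free form of:
 *   if ((value & 1) == 1) value *= 23; else value -= 5;
 * ASSUMPTION: two's-complement int. */
int update_if_odd(int value)
{
    const int mask = -(value & 1);  /* odd: -1 (all ones), even: 0 */
    return ((value * 23) & mask) | ((value - 5) & ~mask);
}
```

Only the line that builds the mask changes; the select expression is identical to the first example.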

Edit: I would also bear in mind that if the calculation gets too complicated, you are not gaining anything over the branching version, and especially not if the branches are reasonably predictable.

This is a premature optimization in the best case, and an anti-optimization in the worst case.

If the code can be vectorized, it will use conditional moves (masked selects or blends) anyway, since SIMD has no other way to make a per-element choice.

But even for scalar code, modern compilers usually generate conditional moves, so there is no branch (unless the compiler decides that evaluating both expressions is expensive enough that branching is more efficient).
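For comparison, a sketch of the readable form applied over an array (the names are hypothetical). Written this way, GCC and Clang commonly compile the ternary to a conditional move in scalar code and may auto-vectorize the loop into SIMD compare-and-select instructions at higher optimization levels; whether that actually happens depends on the compiler, flags and target, so inspecting the generated assembly (e.g. with `gcc -O2 -S` or Compiler Explorer) is the way to confirm it.

```c
#include <stddef.h>

/* Plain, readable version: the condition stays visible to the compiler,
 * which can choose a branch, a conditional move or a vector select. */
void update_all(int *values, size_t n, int some_other_value)
{
    for (size_t i = 0; i < n; ++i) {
        const int v = values[i];
        values[i] = (v > some_other_value) ? v * 23 : v - 5;
    }
}
```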

Conditional moves have been a standard feature on RISC processors (e.g. ARM) pretty much forever, and have been supported even on x86 (CMOV, introduced with the Pentium Pro) for about 17 years. On a modern processor, a conditional move takes either exactly the same number of cycles as a normal move, or at most 2-3 extra cycles.
This obviously assumes that the condition is evaluated early enough (though it does not matter if nothing depends on the value, since out-of-order execution will hide the latency), but the same is true of any cryptic optimization hack you apply. You just can't use a result that isn't there yet.

If you can help it, always write code that is comprehensible at first glance, instead of obfuscated

value = (((foo<<31)&bar, ++baz) -= (foo & 7121)) + PHASE_OF_MOON;

kind of stuff, which will not only be no faster (and likely slower), but will also confuse anyone reviewing your code (including yourself, 6-10 months from now!), is highly non-portable, and may well produce incorrect results in situations you don't anticipate.
