
How can I make branchless code?

Related to this answer: https://stackoverflow.com/a/11227902/4714970

In the answer above, it's mentioned how you can avoid branch prediction failures by avoiding branches.

The user demonstrates this by replacing:

if (data[c] >= 128)
{
    sum += data[c];
}

With:

int t = (data[c] - 128) >> 31;
sum += ~t & data[c];

How are these two equivalent (for the specific data set, not strictly equivalent)?

What are some general ways I can do similar things in similar situations? Would it always be by using >> and ~ ?

int t = (data[c] - 128) >> 31;

The trick here is that if data[c] >= 128 , then data[c] - 128 is nonnegative, otherwise it is negative. The highest bit in an int , the sign bit, is 1 if and only if that number is negative. >> is a shift that extends the sign bit, so shifting right by 31 makes the whole result 0 if it used to be nonnegative, and all 1 bits (which represent -1) if it used to be negative. So t is 0 if data[c] >= 128 , and -1 otherwise. ~t switches these possibilities, so ~t is -1 if data[c] >= 128 , and 0 otherwise.

x & (-1) is always equal to x , and x & 0 is always equal to 0 . So sum += ~t & data[c] increases sum by 0 if data[c] < 128 , and by data[c] otherwise.
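To make the equivalence concrete, here is a minimal Java sketch that runs the branching loop and the masked loop over the same values. The class and method names are hypothetical, and it assumes the values are small enough (for example 0..255) that v - 128 cannot overflow.

```java
// Sketch only: BranchlessSum and its method names are hypothetical.
final class BranchlessSum {
    // Reference version with an explicit branch.
    static long branchySum(int[] data) {
        long sum = 0;
        for (int v : data) {
            if (v >= 128) {
                sum += v;
            }
        }
        return sum;
    }

    // Mask version. Assumes v - 128 does not overflow (true for 0..255).
    static long maskedSum(int[] data) {
        long sum = 0;
        for (int v : data) {
            int t = (v - 128) >> 31; // 0 if v >= 128, -1 (all ones) otherwise
            sum += ~t & v;           // ~t is -1 (keep v) or 0 (drop v)
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {0, 127, 128, 200, 255};
        System.out.println(branchySum(data)); // 583
        System.out.println(maskedSum(data));  // 583
    }
}
```

Only 128, 200 and 255 pass the test, so both versions add up to 583.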

Many of these tricks can be applied elsewhere. This trick can be applied generally to produce 0 if and only if one value is greater than or equal to another value, and -1 otherwise (as long as the subtraction does not overflow), and you can tweak it to get <= , < , and so on. Bit twiddling like this is a common approach to making mathematical operations branch-free, though it's certainly not always going to be built out of the same operations; ^ (xor) and | (or) also come into play sometimes.
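As a rough illustration of how the same subtraction-and-shift idea extends to the other comparisons, the following Java sketch builds an all-ones or all-zeros mask for each of <, >=, > and <=. The helper names are made up for this example, and every helper assumes that the subtraction of its two arguments does not overflow.

```java
// Sketch only: CompareMasks and its helpers are hypothetical names.
// Every helper assumes the subtraction of its arguments cannot overflow.
final class CompareMasks {
    // -1 if a < b, 0 otherwise.
    static int ltMask(int a, int b) {
        return (a - b) >> 31;
    }

    // -1 if a >= b, 0 otherwise (the complement of ltMask).
    static int geMask(int a, int b) {
        return ~((a - b) >> 31);
    }

    // -1 if a > b, 0 otherwise (swap the operands of ltMask).
    static int gtMask(int a, int b) {
        return (b - a) >> 31;
    }

    // -1 if a <= b, 0 otherwise (the complement of gtMask).
    static int leMask(int a, int b) {
        return ~((b - a) >> 31);
    }

    public static void main(String[] args) {
        System.out.println(ltMask(3, 5)); // -1
        System.out.println(geMask(3, 5)); // 0
        System.out.println(leMask(5, 5)); // -1
    }
}
```

Each mask can then be combined with & exactly as ~t is in the original snippet.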

While Louis Wasserman's answer is correct, I want to show you a more general (and much clearer) way to write branchless code. You can just use the ? : operator:

    int t = data[c];
    sum += (t >= 128 ? t : 0);

The JIT compiler sees from the execution profile that the condition is poorly predicted here. In such cases the compiler is smart enough to replace the conditional branch with a conditional move instruction:

    mov    0x10(%r14,%rbp,4),%r9d  ; load R9d from array
    cmp    $0x80,%r9d              ; compare with 128
    cmovl  %r8d,%r9d               ; if less, move R8d (which is 0) to R9d

You can verify for yourself that this version runs equally fast for both sorted and unsorted arrays.
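One way to try this is a small timing harness like the sketch below. It is not a rigorous benchmark (a tool such as JMH would be more reliable), the class name, array size, seed and round counts are arbitrary choices for illustration, and whether the JIT actually emits a conditional move depends on the JVM and its profile.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch only: TernarySumTiming is a hypothetical name; sizes and round
// counts are arbitrary. Timings vary by JVM and hardware.
final class TernarySumTiming {
    static long sink; // accumulates results so the loops are not optimized away

    static long ternarySum(int[] data) {
        long sum = 0;
        for (int c = 0; c < data.length; c++) {
            int t = data[c];
            sum += (t >= 128 ? t : 0);
        }
        return sum;
    }

    // Fills an array with pseudo-random values in 0..255.
    static int[] randomData(int size, long seed) {
        Random rnd = new Random(seed);
        int[] data = new int[size];
        for (int i = 0; i < size; i++) {
            data[i] = rnd.nextInt(256);
        }
        return data;
    }

    // Returns the elapsed nanoseconds for the given number of rounds.
    static long timeNanos(int[] data, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += ternarySum(data);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] unsorted = randomData(32768, 1L);
        int[] sorted = unsorted.clone();
        Arrays.sort(sorted);

        // Warm-up rounds so the JIT has a chance to compile the hot loop.
        timeNanos(unsorted, 2000);
        timeNanos(sorted, 2000);

        System.out.println("unsorted ns: " + timeNanos(unsorted, 10000));
        System.out.println("sorted   ns: " + timeNanos(sorted, 10000));
    }
}
```

If the conditional move is used, the two printed times should be close to each other; a large gap would suggest the branch was kept.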

Branchless code typically means evaluating all possible outcomes E_i of a conditional statement, each with a weight w_i from the set {0, 1}, so that Sum{ w_i } = 1. Most of the calculations are essentially discarded. Some optimization can result from the fact that E_i doesn't have to be correct when the corresponding weight w_i (or mask m_i ) is zero.

  result = (w_0 * E_0) + (w_1 * E_1) + ... + (w_n * E_n)    ;; or
  result = (m_0 & E_0) | (m_1 & E_1) | ... | (m_n & E_n)

where m_i stands for a bitmask.
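For the two-outcome case, both forms of the formula can be sketched in Java as follows. The class and method names are hypothetical, and the condition is assumed to arrive already encoded as an int that is exactly 0 or 1.

```java
// Sketch only: MaskSelect and its methods are hypothetical names.
// cond01 must be exactly 0 or 1.
final class MaskSelect {
    // Mask form: result = (m_0 & E_0) | (m_1 & E_1).
    static int select(int cond01, int a, int b) {
        int m = -cond01;           // 0 -> 0x00000000, 1 -> 0xffffffff
        return (m & a) | (~m & b); // a if cond01 == 1, b if cond01 == 0
    }

    // Weight form: result = (w_0 * E_0) + (w_1 * E_1) with w_0 + w_1 = 1.
    static int selectWeighted(int cond01, int a, int b) {
        return cond01 * a + (1 - cond01) * b;
    }

    public static void main(String[] args) {
        System.out.println(select(1, 7, 9));         // 7
        System.out.println(select(0, 7, 9));         // 9
        System.out.println(selectWeighted(0, 7, -9)); // -9
    }
}
```

Note that both a and b are always computed before the selection, which is the "evaluate every outcome" cost described above.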

Speed can also be achieved through parallel processing of the E_i followed by a horizontal collapse.

This is contrary to the semantics of if (a) b; else c; or its ternary shorthand a ? b : c , where only one expression out of [b, c] is evaluated.

Thus the ternary operator is no magic bullet for branchless code. A decent compiler produces branchless code equally for the plain if statement:

t = data[n];
if (t >= 128) sum+=t;

which compiles to:

    movl    -4(%rdi,%rdx), %ecx
    leal    (%rax,%rcx), %esi
    addl    $-128, %ecx
    cmovge  %esi, %eax

Variations of branchless code include expressing the problem through other branchless non-linear functions, such as ABS, if present on the target machine.

For example:

 2 * min(a,b) = a + b - ABS(a - b),
 2 * max(a,b) = a + b + ABS(a - b)

or even:

 ABS(x) = sqrt(x*x)      ;; caveat -- this is "probably" not efficient
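The min and max identities above can be checked with a short Java sketch. The class name is hypothetical, the arithmetic is widened to long so that a + b and a - b cannot overflow, and whether Math.abs itself compiles to branch-free machine code depends on the JVM and target.

```java
// Sketch only: AbsMinMax is a hypothetical name. Arithmetic is done in long
// so that a + b and a - b cannot overflow for int inputs.
final class AbsMinMax {
    // 2 * min(a, b) = a + b - ABS(a - b)
    static int min(int a, int b) {
        long diff = (long) a - b;
        return (int) (((long) a + b - Math.abs(diff)) / 2);
    }

    // 2 * max(a, b) = a + b + ABS(a - b)
    static int max(int a, int b) {
        long diff = (long) a - b;
        return (int) (((long) a + b + Math.abs(diff)) / 2);
    }

    public static void main(String[] args) {
        System.out.println(min(3, -5)); // -5
        System.out.println(max(3, -5)); // 3
    }
}
```

The division by 2 is exact because a + b and ABS(a - b) always have the same parity.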

In addition to >> and ~ , it may be equally beneficial to use bool and !bool instead of (int >> 31), whose result for negative values is implementation-defined in languages such as C. Likewise, if the condition evaluates to 0 or 1, one can generate a working mask by negation:

-{0, 1} = {0, 0xffffffff}  in two's complement representation
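A possible Java rendering of the negation trick is sketched below. Java does not convert boolean to int implicitly, so this sketch assumes the 0-or-1 condition is derived with an unsigned shift instead; the class and method names are hypothetical, and x - limit is assumed not to overflow.

```java
// Sketch only: NegationMask and its methods are hypothetical names.
// Assumes x - limit cannot overflow.
final class NegationMask {
    // 1 if x < limit, 0 otherwise (>>> moves the sign bit down to bit 0).
    static int isLess01(int x, int limit) {
        return (x - limit) >>> 31;
    }

    // 0 -> 0x00000000, 1 -> 0xffffffff in two's complement.
    static int maskFrom01(int cond01) {
        return -cond01;
    }

    // Sums the values that are >= limit without a conditional branch.
    static long sumAtLeast(int[] data, int limit) {
        long sum = 0;
        for (int v : data) {
            int keep01 = isLess01(v, limit) ^ 1; // 1 if v >= limit
            sum += maskFrom01(keep01) & v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(maskFrom01(1)));              // ffffffff
        System.out.println(sumAtLeast(new int[] {0, 127, 128, 200, 255}, 128)); // 583
    }
}
```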
