
Is branch prediction not working?

In reference to this question, the answer specifies that the unsorted array takes more time because it fails branch prediction. But if we make a minor change to the program:

import java.util.Arrays;
import java.util.Random;


public class Main{

    public static void main(String[] args) {
        // Generate data
        int arraySize = 32768;
        int[] data = new int[arraySize];

        Random rnd = new Random(0);
        for (int c = 0; c < arraySize; ++c) {
            data[c] = rnd.nextInt() % 256;
        }

        // !!! With this, the next loop runs faster
        Arrays.sort(data);

        // Test
        long start = System.nanoTime();
        long sum = 0;

        for (int i = 0; i < 100000; ++i) {
            // Primary loop
            for (int c = 0; c < arraySize; ++c) {
                if (data[c] >= 128) {
                    sum = data[c];
                }
            }
        }

        System.out.println((System.nanoTime() - start) / 1000000000.0);
        System.out.println("sum = " + sum);
    }
}

Here I have replaced (from the original question)

if (data[c] >= 128) 
    sum += data[c];

with

if (data[c] >= 128) 
    sum = data[c];

the unsorted array gives approximately the same result. I want to ask: why is branch prediction not working in this case?

I have used JMH to analyze this. Here is my code:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@OutputTimeUnit(TimeUnit.MICROSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 2, time = 1)
@Measurement(iterations = 3, time = 1)
@State(Scope.Thread)
@Fork(2)
public class Comparison
{
  static final int SIZE = 1<<15;
  final int[] data = new int[SIZE];

  @Setup
  public void setup() {
    int i = 1;
    for (int c = 0; c < SIZE; ++c) data[c] = (i*=611953);
    for (int c = 0; c < SIZE; ++c) data[c] = data[c] >= 128? 128 : 127;
  }

  @GenerateMicroBenchmark
  public long sum() {
    long sum = 0;
    for (int c = 0; c < SIZE; ++c) if (data[c] >= 128) sum += data[c];
    return sum;
  }
}
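
For reference, one common way to launch such a benchmark is through the JMH Runner API. The launcher below is my sketch, not part of the original answer, and the class name ComparisonRunner is made up; note also that @GenerateMicroBenchmark is the annotation used by early JMH releases, which later versions renamed to @Benchmark.

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class ComparisonRunner {
    public static void main(String[] args) throws RunnerException {
        // Run all benchmarks whose class name matches "Comparison"
        Options opt = new OptionsBuilder()
                .include(Comparison.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}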

Notice that I use neither sorting nor random number generation; they are an unnecessary complication. With the formula used in the above code:

data[c] = (i*=611953);

I get a runtime of 132 µs. If I comment out the line involving

data[c] = data[c] >= 128? 128 : 127;

the time doesn't change at all. This eliminates all arithmetic considerations and focuses on branch prediction. If I use

data[c] = 127;

I get 13 µs, and if I use

data[c] = 128;

I get 16 µs. That's the "base case", showing the difference between the two constant branching decisions.
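
To make the difference in branch behaviour visible, here is a small standalone sketch of mine (not part of the original answer) that reproduces the setup() logic and prints how the data[c] >= 128 outcomes are distributed. With the multiplicative formula the outcomes are a roughly even, irregular mix of taken and not-taken, whereas the constant fills 127 and 128 make the outcome identical on every iteration.

public class BranchPattern {
    public static void main(String[] args) {
        final int SIZE = 1 << 15;
        int[] data = new int[SIZE];

        // Same fill as the benchmark's setup(): pseudo-random-looking values,
        // then clamped to 127 or 128 around the threshold.
        int i = 1;
        for (int c = 0; c < SIZE; ++c) data[c] = (i *= 611953);
        for (int c = 0; c < SIZE; ++c) data[c] = data[c] >= 128 ? 128 : 127;

        int taken = 0;
        StringBuilder firstOutcomes = new StringBuilder();
        for (int c = 0; c < SIZE; ++c) {
            boolean t = data[c] >= 128;
            if (t) ++taken;
            if (c < 32) firstOutcomes.append(t ? '1' : '0');
        }
        System.out.println("branch taken " + taken + " times out of " + SIZE);
        System.out.println("first 32 outcomes: " + firstOutcomes);
    }
}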

My conclusion: this is definitely the effect of low-level branch prediction.

Could the JIT reverse the loop?

Let's analyze your intervention now. If I use the formula as presented in my code above, but change

if (data[c] >= 128) sum += data[c];

to

if (data[c] >= 128) sum = data[c];

then the timing indeed drops from 132 µs to 27 µs.

This is my guess at explaining the drop: an optimizing trick the JIT compiler can do is to reverse the direction of the loop. Now your code becomes

for (int c = SIZE-1; c >= 0; --c) if (data[c] >= 128) { sum = data[c]; break; }

The loop has been short-circuited to the minimum number of iterations needed to reach the same outcome as the original loop.
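
The transformation would be legal here because the body only assigns (sum = data[c]) instead of accumulating (sum += data[c]): the forward loop ends up with the value of the last matching element, which is exactly the first matching element when scanning backwards. A quick sketch of mine (not from the original answer) that checks this equivalence on the same data:

public class LoopReversalCheck {
    public static void main(String[] args) {
        final int SIZE = 1 << 15;
        int[] data = new int[SIZE];
        int i = 1;
        for (int c = 0; c < SIZE; ++c) data[c] = (i *= 611953);
        for (int c = 0; c < SIZE; ++c) data[c] = data[c] >= 128 ? 128 : 127;

        // Original forward loop: keeps overwriting sum, so only the last match counts.
        long forward = 0;
        for (int c = 0; c < SIZE; ++c) if (data[c] >= 128) forward = data[c];

        // Hypothetical reversed loop: the first match from the end is that same element.
        long reversed = 0;
        for (int c = SIZE - 1; c >= 0; --c) if (data[c] >= 128) { reversed = data[c]; break; }

        System.out.println(forward == reversed);  // prints true
    }
}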

I added this

data[SIZE-1] = 128;

to the end of the setup() method, but it didn't change the timing. If the loop really had been reversed and short-circuited, that guaranteed match in the last slot would have let it finish after a single iteration, so the timing should have dropped sharply. That would seem to invalidate the naïve version of the "loop reversal" conjecture.

No, it's probably cmovl

In analyzing the assembly I find this:

cmp edx, 0x80
cmovl eax, ebx

cmovl is a conditional move instruction: it performs the effect of the assignment that happens in the then branch, but without involving any jump, therefore eliminating any penalty associated with branch-prediction failure. This is a good explanation of the actual effect.
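
In Java terms, the cmp/cmovl pair behaves like a branch-free select: both candidates for the new value of sum are available and one is chosen based on the comparison flags, so there is no taken/not-taken decision left to predict. The method below is my sketch of that select written out explicitly, not code from the original answer. To check what the JIT actually emits, the generated assembly can be inspected with JMH's -prof perfasm profiler or with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (which requires the hsdis disassembler library).

// Branch-free formulation of the benchmark body: a conditional select rather
// than a conditional jump, which is what the cmp/cmovl sequence implements.
static long sumSelect(int[] data) {
    long sum = 0;
    for (int c = 0; c < data.length; ++c) {
        sum = (data[c] >= 128) ? data[c] : sum;  // select; no branch outcome to predict
    }
    return sum;
}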
