
Is branch prediction not working?

In reference to this question, the answer specifies that the unsorted array takes more time because it fails branch prediction. But if we make a minor change to the program:

import java.util.Arrays;
import java.util.Random;


public class Main{

    public static void main(String[] args) {
        // Generate data
        int arraySize = 32768;
        int[] data = new int[arraySize];

        Random rnd = new Random(0);
        for (int c = 0; c < arraySize; ++c) {
            data[c] = rnd.nextInt() % 256;
        }

        // !!! With this, the next loop runs faster
        Arrays.sort(data);

        // Test
        long start = System.nanoTime();
        long sum = 0;

        for (int i = 0; i < 100000; ++i) {
            // Primary loop
            for (int c = 0; c < arraySize; ++c) {
                if (data[c] >= 128) {
                    sum = data[c];
                }
            }
        }

        System.out.println((System.nanoTime() - start) / 1000000000.0);
        System.out.println("sum = " + sum);
    }
}

Here I have replaced (from the original question)

if (data[c] >= 128) 
    sum += data[c];

with

if (data[c] >= 128) 
    sum = data[c];

the unsorted array gives approximately the same result. I want to ask: why is branch prediction not working in this case?

I have used JMH to analyze this. Here is my code:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@OutputTimeUnit(TimeUnit.MICROSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 2, time = 1)
@Measurement(iterations = 3, time = 1)
@State(Scope.Thread)
@Fork(2)
public class Comparison
{
  static final int SIZE = 1<<15;
  final int[] data = new int[SIZE];

  @Setup
  public void setup() {
    int i = 1;
    for (int c = 0; c < SIZE; ++c) data[c] = (i*=611953);
    for (int c = 0; c < SIZE; ++c) data[c] = data[c] >= 128? 128 : 127;
  }

  @GenerateMicroBenchmark
  public long sum() {
    long sum = 0;
    for (int c = 0; c < SIZE; ++c) if (data[c] >= 128) sum += data[c];
    return sum;
  }
}
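
For reference, one common way to launch such a benchmark is through the JMH Runner API. The launcher below is my sketch, not part of the original answer, and the class name ComparisonRunner is made up; note also that @GenerateMicroBenchmark is the annotation used by early JMH releases, which later versions renamed to @Benchmark.

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class ComparisonRunner {
    public static void main(String[] args) throws RunnerException {
        // Run all benchmarks whose class name matches "Comparison"
        Options opt = new OptionsBuilder()
                .include(Comparison.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}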

Notice that I use neither sorting nor random number generation; they are an unnecessary complication. With the formula used in the above code:

data[c] = (i*=611953);

I get a runtime of 132 µs. If I comment out the line involving

data[c] = data[c] >= 128? 128 : 127;

the time doesn't change at all. This eliminates all arithmetic considerations and focuses on branch prediction. If I use

data[c] = 127;

I get 13 µs, and if I use

data[c] = 128;

I get 16 µs. That's the "base case", showing the difference between the two constant branching decisions.
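
To make the difference in branch behaviour visible, here is a small standalone sketch of mine (not part of the original answer) that reproduces the setup() logic and prints how the data[c] >= 128 outcomes are distributed. With the multiplicative formula the outcomes are a roughly even, irregular mix of taken and not-taken, whereas the constant fills 127 and 128 make the outcome identical on every iteration.

public class BranchPattern {
    public static void main(String[] args) {
        final int SIZE = 1 << 15;
        int[] data = new int[SIZE];

        // Same fill as the benchmark's setup(): pseudo-random-looking values,
        // then clamped to 127 or 128 around the threshold.
        int i = 1;
        for (int c = 0; c < SIZE; ++c) data[c] = (i *= 611953);
        for (int c = 0; c < SIZE; ++c) data[c] = data[c] >= 128 ? 128 : 127;

        int taken = 0;
        StringBuilder firstOutcomes = new StringBuilder();
        for (int c = 0; c < SIZE; ++c) {
            boolean t = data[c] >= 128;
            if (t) ++taken;
            if (c < 32) firstOutcomes.append(t ? '1' : '0');
        }
        System.out.println("branch taken " + taken + " times out of " + SIZE);
        System.out.println("first 32 outcomes: " + firstOutcomes);
    }
}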

My conclusion: this is definitely the effect of low-level branch prediction.

Could the JIT reverse the loop?

Let's analyze your intervention now. If I use the formula as presented in my code above, but change

if (data[c] >= 128) sum += data[c];

to

if (data[c] >= 128) sum = data[c];

then the timing indeed drops from 132 µs to 27 µs.

This is my guess at explaining the drop: an optimizing trick the JIT compiler can do is to reverse the direction of the loop. Now your code becomes

for (int c = SIZE-1; c >= 0; --c) if (data[c] >= 128) { sum = data[c]; break; }

The loop has been short-circuited to the minimum number of iterations needed to reach the same outcome as the original loop.
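
The transformation would be legal here because the body only assigns (sum = data[c]) instead of accumulating (sum += data[c]): the forward loop ends up with the value of the last matching element, which is exactly the first matching element when scanning backwards. A quick sketch of mine (not from the original answer) that checks this equivalence on the same data:

public class LoopReversalCheck {
    public static void main(String[] args) {
        final int SIZE = 1 << 15;
        int[] data = new int[SIZE];
        int i = 1;
        for (int c = 0; c < SIZE; ++c) data[c] = (i *= 611953);
        for (int c = 0; c < SIZE; ++c) data[c] = data[c] >= 128 ? 128 : 127;

        // Original forward loop: keeps overwriting sum, so only the last match counts.
        long forward = 0;
        for (int c = 0; c < SIZE; ++c) if (data[c] >= 128) forward = data[c];

        // Hypothetical reversed loop: the first match from the end is that same element.
        long reversed = 0;
        for (int c = SIZE - 1; c >= 0; --c) if (data[c] >= 128) { reversed = data[c]; break; }

        System.out.println(forward == reversed);  // prints true
    }
}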

I added this

data[SIZE-1] = 128;

to the end of the setup() method, but it didn't change the timing. If the loop really had been reversed and short-circuited, that guaranteed match in the last slot would have let it finish after a single iteration, so the timing should have dropped sharply. That would seem to invalidate the naïve version of the "loop reversal" conjecture.

No, it's probably cmovl

In analyzing the assembly I find this:

cmp edx, 0x80
cmovl eax, ebx

cmovl is a conditional move instruction: it performs the effect of the assignment that happens in the then branch, but without involving any jump, therefore eliminating any penalty associated with branch-prediction failure. This is a good explanation of the actual effect.
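
In Java terms, the cmp/cmovl pair behaves like a branch-free select: both candidates for the new value of sum are available and one is chosen based on the comparison flags, so there is no taken/not-taken decision left to predict. The method below is my sketch of that select written out explicitly, not code from the original answer. To check what the JIT actually emits, the generated assembly can be inspected with JMH's -prof perfasm profiler or with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (which requires the hsdis disassembler library).

// Branch-free formulation of the benchmark body: a conditional select rather
// than a conditional jump, which is what the cmp/cmovl sequence implements.
static long sumSelect(int[] data) {
    long sum = 0;
    for (int c = 0; c < data.length; ++c) {
        sum = (data[c] >= 128) ? data[c] : sum;  // select; no branch outcome to predict
    }
    return sum;
}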
