Why is branch prediction faster than no branch at all?

Question

Inspired by this question: Why is it faster to process a sorted array than an unsorted array?

I wrote my own branch prediction experiment:

public class BranchPrediction {
    public static void main(final String[] args) {
        long start;
        long sum = 0;

        /* No branch */
        start = System.nanoTime();
        sum = 0;
        for (long i = 0; i < 10000000000L; ++i)
            sum += i;
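        System.out.println(System.nanoTime() - start); // elapsed nanoseconds
        System.out.println(sum); // negative: the true sum overflows long and wraps around

        /* With branch (i >= 0 is always true here, so the branch is trivially predictable) */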
        start = System.nanoTime();
        sum = 0;
        for (long i = 0; i < 10000000000L; ++i)
            if (i >= 0)
                sum += i;
        System.out.println(System.nanoTime() - start);
        System.out.println(sum);

        /* No branch (again) */
        start = System.nanoTime();
        sum = 0;
        for (long i = 0; i < 10000000000L; ++i)
            sum += i;
        System.out.println(System.nanoTime() - start);
        System.out.println(sum);

        /* With branch (again) */
        start = System.nanoTime();
        sum = 0;
        for (long i = 0; i < 10000000000L; ++i)
            if (i >= 0)
                sum += i;
        System.out.println(System.nanoTime() - start);
        System.out.println(sum);
    }
}

The result confuses me: according to the program output, the loop with a branch is reliably faster than the loops without a branch.

Example output:

7949691477
-5340232226128654848
6947699555
-5340232226128654848
7920972795
-5340232226128654848
7055459799
-5340232226128654848
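The negative sum printed in every run is not a timing artifact. For n = 10^10 the exact value of 0 + 1 + ... + (n - 1) is n(n - 1)/2 = 49999999995000000000, which is larger than Long.MAX_VALUE (9223372036854775807), so the long accumulator wraps around modulo 2^64. The small check below uses a hypothetical helper class (not part of the question) that recomputes the exact sum with BigInteger; its low 64 bits, read as a signed long, match the printed value.

```java
import java.math.BigInteger;

// Hypothetical helper: recomputes the question's sum exactly and compares it
// with what a wrapping 64-bit long accumulator would hold.
class OverflowCheck {
    // Exact value of 0 + 1 + ... + (n - 1) = n(n - 1) / 2, computed without overflow.
    static BigInteger exactSum(long n) {
        return BigInteger.valueOf(n)
                .multiply(BigInteger.valueOf(n - 1))
                .divide(BigInteger.valueOf(2));
    }

    public static void main(String[] args) {
        long n = 10_000_000_000L; // iteration count used in the question
        BigInteger exact = exactSum(n);

        System.out.println(exact); // 49999999995000000000
        // true: the exact sum does not fit in a long
        System.out.println(exact.compareTo(BigInteger.valueOf(Long.MAX_VALUE)) > 0);
        // longValue() keeps only the low-order 64 bits: -5340232226128654848
        System.out.println(exact.longValue());
    }
}
```

Because long addition in Java wraps modulo 2^64 and that arithmetic is associative, adding the terms one by one in the loop ends at the same low 64 bits as reducing the exact total once.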

Why is it so?

Edit:

The disassembled class shows that the Java compiler did not optimize away (omit) anything ( https://gist.github.com/HouzuoGuo/5692424 ).
The Java benchmarking technique used by the author of "Why is it faster to process a sorted array than an unsorted array?" is the same as mine.
The machine is an Intel Core i7 running 64-bit Linux 3.2 and the 64-bit Oracle JVM 1.7.
When I greatly increase the number of loop iterations, the with-branch loop runs several seconds faster than the no-branch loop.
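To put "several seconds" in proportion, the example timings can be divided by the 10^10 iterations to get a cost per iteration. The sketch below is a hypothetical helper that uses only the numbers printed in the question: each no-branch iteration takes roughly 0.79 ns and each with-branch iteration roughly 0.70 ns, so the no-branch loop takes about 12 to 14 percent longer, and the absolute gap (about 0.9 to 1.0 s per 10^10 iterations in these runs) grows linearly with the iteration count.

```java
// Hypothetical helper: turns the question's example timings into
// nanoseconds per iteration, the no-branch / with-branch ratio, and the gap.
class PerIterationCost {
    static final double ITERATIONS = 1e10; // loop count used in the question

    static double nsPerIteration(long elapsedNs) {
        return elapsedNs / ITERATIONS;
    }

    public static void main(String[] args) {
        // Elapsed nanoseconds copied from the example output, as (no branch, with branch) pairs.
        long[] noBranch = {7_949_691_477L, 7_920_972_795L};
        long[] withBranch = {6_947_699_555L, 7_055_459_799L};

        for (int run = 0; run < noBranch.length; run++) {
            double nb = nsPerIteration(noBranch[run]);
            double wb = nsPerIteration(withBranch[run]);
            System.out.printf("pair %d: no branch %.3f ns/iter, with branch %.3f ns/iter, "
                    + "ratio %.3f, gap %.2f s%n",
                    run + 1, nb, wb, nb / wb, (noBranch[run] - withBranch[run]) / 1e9);
        }
    }
}
```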

Answer 1

After running the same experiment on my other machines (Intel servers and workstations), I may conclude that the phenomenon I experienced is specific to this laptop CPU (Intel i7 Q740M).

==== 6 months later edit ====

Check this out: http://eli.thegreenplace.net/2013/12/03/intel-i7-loop-performance-anomaly/

Answer 2

Keep in mind that the JVM optimizes execution internally, and there are caches inside your PC that make computation faster. Since you have such a powerful processor (many independent cores), this is not strange. Also note that underneath your Java code there is a layer that maps it to your PC's machine code. Just write code that is as optimized as you can and let the JVM worry about the rest.

EDIT: Machines and hardware like a big load; they operate more efficiently then, especially caches.
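Since the JVM optimizes execution internally, one way to make this comparison less sensitive to JIT behavior is to restructure the experiment. In the question all four loops run inside one long main method, so HotSpot can only compile them through on-stack replacement (OSR), and the order of the loops may influence the results. The sketch below is not the asker's code: it moves each variant into its own method, runs warm-up calls first, and alternates which variant is timed first. The class and method names, the 20 warm-up calls, the 6 measured rounds and the reduced count of 10^9 iterations are all assumptions for illustration. It reduces these effects but does not eliminate them; a dedicated harness such as JMH (the OpenJDK Java Microbenchmark Harness) is the more rigorous option.

```java
import java.util.function.LongUnaryOperator;

// Sketch of a fairer version of the experiment (hypothetical, not the asker's code).
// Assumptions: 10^9 iterations instead of 10^10, 20 warm-up calls, 6 measured rounds.
class FairerBranchBench {
    static final long ITERATIONS = 1_000_000_000L;

    // Same loop body as the question's "No branch" loop.
    static long noBranch(long n) {
        long sum = 0;
        for (long i = 0; i < n; ++i) {
            sum += i;
        }
        return sum;
    }

    // Same loop body as the question's "With branch" loop; i >= 0 is always true.
    static long withBranch(long n) {
        long sum = 0;
        for (long i = 0; i < n; ++i) {
            if (i >= 0) {
                sum += i;
            }
        }
        return sum;
    }

    // Runs one variant once, prints the elapsed nanoseconds, and returns its sum.
    static long time(String label, LongUnaryOperator loop, long n) {
        long start = System.nanoTime();
        long sum = loop.applyAsLong(n);
        long elapsed = System.nanoTime() - start;
        System.out.println(label + ": " + elapsed + " ns");
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0; // accumulates results so the loops' work stays observable

        // Warm-up: shorter calls give the JIT a chance to compile both methods
        // before anything is timed.
        for (int w = 0; w < 20; w++) {
            sink += noBranch(1_000_000L) + withBranch(1_000_000L);
        }

        // Measured rounds: alternate which variant runs first.
        for (int round = 0; round < 6; round++) {
            if (round % 2 == 0) {
                sink += time("no branch  ", FairerBranchBench::noBranch, ITERATIONS);
                sink += time("with branch", FairerBranchBench::withBranch, ITERATIONS);
            } else {
                sink += time("with branch", FairerBranchBench::withBranch, ITERATIONS);
                sink += time("no branch  ", FairerBranchBench::noBranch, ITERATIONS);
            }
        }
        System.out.println(sink);
    }
}
```

Alternating the order matters because, in the original program, each "with branch" loop always runs right after a "no branch" loop, so any warm-up or frequency effect is confounded with the presence of the branch.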

2 answers

Answer 1
2 votes, accepted

Answer 2
2 votes, 2013-06-02 10:03:15