Performance difference between direct array index access and loop access in Java

Question

I was experimenting with predicates. I tried to implement the predicate for serializing issues in distributed systems. I wrote a simple example where the test function just returns true. While measuring the overhead, I stumbled upon this interesting problem: accessing the array in a for loop is 10 times slower than direct access.

class Test {
    public boolean test(Object o) { return true; }
}

long count = 1000000000L;
Test[] test = new Test[3];
test[0] = new Test();
test[1] = new Test();
test[2] = new Test();
long milliseconds = System.currentTimeMillis();
for(int i = 0; i < count; i++){
    boolean result = true;
    Object object = new Object();
    for(int j = 0; j < test.length; j++){
        result = result && test[j].test(object);
    }
}
System.out.println((System.currentTimeMillis() - milliseconds));

However, the following code is almost 10 times faster. What could be the reason?

milliseconds = System.currentTimeMillis();
for(int i=0 ; i < count; i++) {
    Object object = new Object();
    boolean result = test[0].test(object) && test[1].test(object) && test[2].test(object);
}
System.out.println((System.currentTimeMillis() - milliseconds));

Benchmark results on my i5:

4567 msec for the for loop access
297 msec for the direct access

Answer 1

If the loop header takes one unit of time to execute, then in the first solution the loop header evaluations take 3N units of time, while with direct access they take N.

Apart from the loop header overhead, the first solution has 3 && conditions to evaluate per iteration, while the second has only 2.

And last but not least, there is Boolean short-circuit evaluation, which causes your second, faster example to stop testing the condition "prematurely", i.e. the entire result evaluates to false if the first && condition is false.
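To make the short-circuit behaviour concrete, here is a minimal sketch. The class name `ShortCircuitSketch` and the helpers `check` and `countCalls` are made up for illustration and are not part of the question's code; the counter only exists to show how many operands of an `&&` chain actually get evaluated.

```java
public class ShortCircuitSketch {
    // Hypothetical counter, used only to observe how many operands run.
    static int calls = 0;

    // Stands in for a predicate call; records that it was invoked.
    static boolean check(boolean value) {
        calls++;
        return value;
    }

    // Evaluates a three-operand && chain and reports how many operands ran.
    static int countCalls(boolean first, boolean second, boolean third) {
        calls = 0;
        boolean result = check(first) && check(second) && check(third);
        return calls;
    }

    public static void main(String[] args) {
        // All operands are true, so every operand has to be evaluated.
        System.out.println("all true:    " + countCalls(true, true, true) + " calls");
        // The first operand is false, so the remaining two are skipped.
        System.out.println("first false: " + countCalls(false, true, true) + " calls");
    }
}
```

Note that skipping only happens when an operand is false. With the question's `test()` always returning true, all three calls should run in both variants, so this effect alone probably does not account for the measured gap.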

Answer 2

Due to the predictable result of test(Object o), the compiler is able to optimize the second piece of code quite effectively. The second (inner) loop in the first piece of code makes this optimization impossible.

Compare the result with the following Test class:

static class Test {
    public boolean test(Object o) {
        return Math.random() > 0.5;
    }
}

... and the loops:

    long count = 100000000L;
    Test[] test = new Test[3];
    test[0] = new Test();
    test[1] = new Test();
    test[2] = new Test();

    long milliseconds = System.currentTimeMillis();

    for(int i = 0; i < count; i++){
        boolean result = true;
        Object object = new Object();
        for(int j = 0; j < test.length; j++){
            result = result && test[j].test(object);
        }
    }

    System.out.println((System.currentTimeMillis() - milliseconds));
    milliseconds = System.currentTimeMillis();

    for(int i=0 ; i < count; i++) {
        Object object = new Object();
        boolean result = test[0].test(object) && test[1].test(object) && test[2].test(object);
    }

    System.out.println((System.currentTimeMillis() - milliseconds));

Now both loops require almost the same time:

run:
3759
3368
BUILD SUCCESSFUL (total time: 7 seconds)

PS: check out this article for more about JIT compiler optimizations.

Answer 3

You are committing almost every basic mistake you can make with a microbenchmark.

You don't ensure the code cannot be optimized away, which requires actually using the calculation's result.
Your two code branches have subtly but decidedly different logic ~~(as pointed out, variant two will always short-circuit)~~. The second case is easier for the JIT to optimize because test() returns a constant.
You did not warm up the code, so JIT compilation time ends up included somewhere in the measured execution time.
Your testing code does not account for the execution order of the test cases influencing the test results. It's not fair to run case 1, then case 2 with the same data and objects. By the time case 2 runs, the JIT will have optimized the test method and collected runtime statistics about its behavior (at the expense of case 1's execution time).
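As a rough illustration of the points above, here is a hand-rolled sketch that uses each result, warms both variants up before timing, and repeats the measurement over several rounds. The class name `BenchmarkSketch`, the method names, and the round and iteration counts are all assumptions chosen for the example, and it does not fully isolate the two cases from each other. A dedicated harness such as JMH is the more reliable option for real measurements.

```java
public class BenchmarkSketch {
    static class Test {
        public boolean test(Object o) { return true; }
    }

    // Loop variant: counts iterations whose result was true, so the result is used.
    static long loopAccess(Test[] tests, int count) {
        long hits = 0;
        for (int i = 0; i < count; i++) {
            boolean result = true;
            Object object = new Object();
            for (int j = 0; j < tests.length; j++) {
                result = result && tests[j].test(object);
            }
            if (result) hits++;
        }
        return hits;
    }

    // Direct variant: same logic with the three array accesses written out.
    static long directAccess(Test[] tests, int count) {
        long hits = 0;
        for (int i = 0; i < count; i++) {
            Object object = new Object();
            if (tests[0].test(object) && tests[1].test(object) && tests[2].test(object)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Test[] tests = { new Test(), new Test(), new Test() };
        int count = 100_000_000;   // assumed iteration count for the timed runs
        long sink = 0;             // accumulates results so they are observably used

        // Warm-up: give the JIT a chance to compile both variants before timing.
        for (int round = 0; round < 5; round++) {
            sink += loopAccess(tests, 1_000_000);
            sink += directAccess(tests, 1_000_000);
        }

        // Several timed rounds; later rounds are less affected by compilation.
        for (int round = 0; round < 3; round++) {
            long start = System.nanoTime();
            sink += loopAccess(tests, count);
            long loopMs = (System.nanoTime() - start) / 1_000_000;

            start = System.nanoTime();
            sink += directAccess(tests, count);
            long directMs = (System.nanoTime() - start) / 1_000_000;

            System.out.println("round " + round + ": loop " + loopMs
                    + " ms, direct " + directMs + " ms");
        }
        System.out.println("sink = " + sink);
    }
}
```

Even with the results accumulated into `sink`, the JIT may still simplify code around a `test()` that always returns true, so the numbers from a sketch like this should be read as indicative at best.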
