
Java method call performance

I have this piece of code doing a Range Minimum Query. When t = 100000, with i and j changing on every input line, its execution time on Java 8u60 is about 12 seconds.

for (int a0 = 0; a0 < t; a0++) {
    String line = reader.readLine();
    String[] ls = line.split(" ");
    int i = Integer.parseInt(ls[0]);
    int j = Integer.parseInt(ls[1]);
    int min = width[i];
    for (int k = i + 1; k <= j; k++) {
        if (min > width[k]) {
            min = width[k];
        }
    }
    writer.write(min + "");
    writer.newLine();
}

When I extract the minimum-finding loop into a new method, execution is about 4 times faster (about 2.5 seconds).

for (int a0 = 0; a0 < t; a0++) {
    String line = reader.readLine();
    String[] ls = line.split(" ");
    int i = Integer.parseInt(ls[0]);
    int j = Integer.parseInt(ls[1]);
    int min = getMin(i, j);
    writer.write(min + "");
    writer.newLine();
}

private int getMin(int i, int j) {
    int min = width[i];
    for (int k = i + 1; k <= j; k++) {
        if (min > width[k]) {
            min = width[k];
        }
    }
    return min;
}

I always thought that method calls were slow, but this example shows the opposite. Java 6 demonstrates the same pattern, although the execution times are much slower in both cases (17 seconds and 10 seconds). Can someone provide some insight into this?

TL;DR The JIT compiler has more opportunities to optimize the inner loop in the second case, because on-stack replacement happens at a different point.

I've managed to reproduce the problem with a reduced test case.
No I/O or string operations are involved, just two nested loops with array access.

public class NestedLoop {
    private static final int ARRAY_SIZE = 5000;
    private static final int ITERATIONS = 1000000;

    private int[] width = new java.util.Random(0).ints(ARRAY_SIZE).toArray();

    public long inline() {
        long sum = 0;

        for (int i = 0; i < ITERATIONS; i++) {
            int min = width[0];
            for (int k = 1; k < ARRAY_SIZE; k++) {
                if (min > width[k]) {
                    min = width[k];
                }
            }
            sum += min;
        }

        return sum;
    }

    public long methodCall() {
        long sum = 0;

        for (int i = 0; i < ITERATIONS; i++) {
            int min = getMin();
            sum += min;
        }

        return sum;
    }

    private int getMin() {
        int min = width[0];
        for (int k = 1; k < ARRAY_SIZE; k++) {
            if (min > width[k]) {
                min = width[k];
            }
        }
        return min;
    }

    public static void main(String[] args) {
        long startTime = System.nanoTime();
        long sum = new NestedLoop().inline();  // or .methodCall();
        long endTime = System.nanoTime();

        long ms = (endTime - startTime) / 1000000;
        System.out.println("sum = " + sum + ", time = " + ms + " ms");
    }
}

The inline variant indeed runs 3-4 times slower than methodCall.


I've used the following JVM options to confirm that both benchmarks are compiled at the highest tier and that OSR (on-stack replacement) successfully occurs in both cases.

-XX:-TieredCompilation
-XX:CompileOnly=NestedLoop
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintCompilation
-XX:+TraceNMethodInstalls

'inline' compilation log:

    251   46 %           NestedLoop::inline @ 21 (70 bytes)
Installing osr method (4) NestedLoop.inline()J @ 21

'methodCall' compilation log:

    271   46             NestedLoop::getMin (41 bytes)
Installing method (4) NestedLoop.getMin()I 
    274   47 %           NestedLoop::getMin @ 9 (41 bytes)
Installing osr method (4) NestedLoop.getMin()I @ 9
    314   48 %           NestedLoop::methodCall @ 4 (30 bytes)
Installing osr method (4) NestedLoop.methodCall()J @ 4

This means the JIT does its job, but the generated code must be different.
Let's analyze it with -XX:+PrintAssembly.


'inline' disassembly (the hottest fragment):

0x0000000002df4dd0: inc    %ebp               ; OopMap{r11=Derived_oop_rbx rbx=Oop off=114}
                                              ;*goto
                                              ; - NestedLoop::inline@53 (line 12)

0x0000000002df4dd2: test   %eax,-0x1d64dd8(%rip)        # 0x0000000001090000
                                              ;*iload
                                              ; - NestedLoop::inline@21 (line 12)
                                              ;   {poll}
0x0000000002df4dd8: cmp    $0x1388,%ebp
0x0000000002df4dde: jge    0x0000000002df4dfd  ;*if_icmpge
                                              ; - NestedLoop::inline@26 (line 12)

0x0000000002df4de0: test   %rbx,%rbx
0x0000000002df4de3: je     0x0000000002df4e4c
0x0000000002df4de5: mov    (%r11),%r10d       ;*getfield width
                                              ; - NestedLoop::inline@32 (line 13)

0x0000000002df4de8: mov    0xc(%r10),%r9d     ; implicit exception
0x0000000002df4dec: cmp    %r9d,%ebp
0x0000000002df4def: jae    0x0000000002df4e59
0x0000000002df4df1: mov    0x10(%r10,%rbp,4),%r8d  ;*iaload
                                              ; - NestedLoop::inline@37 (line 13)

0x0000000002df4df6: cmp    %r8d,%r13d
0x0000000002df4df9: jg     0x0000000002df4dc6  ;*if_icmple
                                              ; - NestedLoop::inline@38 (line 13)

0x0000000002df4dfb: jmp    0x0000000002df4dd0

'methodCall' disassembly (also the hottest part):

0x0000000002da2af0: add    $0x8,%edx          ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2af3: cmp    $0x1381,%edx
0x0000000002da2af9: jge    0x0000000002da2b70  ;*iload_1
                                              ; - NestedLoop::getMin@16 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2afb: mov    0x10(%r9,%rdx,4),%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b00: cmp    %r11d,%ecx
0x0000000002da2b03: jg     0x0000000002da2b6b  ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b05: mov    0x14(%r9,%rdx,4),%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b0a: cmp    %r11d,%ecx
0x0000000002da2b0d: jg     0x0000000002da2b5c  ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b0f: mov    0x18(%r9,%rdx,4),%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b14: cmp    %r11d,%ecx
0x0000000002da2b17: jg     0x0000000002da2b4d  ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b19: mov    0x1c(%r9,%rdx,4),%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b1e: cmp    %r11d,%ecx
0x0000000002da2b21: jg     0x0000000002da2b66  ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b23: mov    0x20(%r9,%rdx,4),%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b28: cmp    %r11d,%ecx
0x0000000002da2b2b: jg     0x0000000002da2b61  ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b2d: mov    0x24(%r9,%rdx,4),%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b32: cmp    %r11d,%ecx
0x0000000002da2b35: jg     0x0000000002da2b52  ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b37: mov    0x28(%r9,%rdx,4),%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b3c: cmp    %r11d,%ecx
0x0000000002da2b3f: jg     0x0000000002da2b57  ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b41: mov    0x2c(%r9,%rdx,4),%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b46: cmp    %r11d,%ecx
0x0000000002da2b49: jg     0x0000000002da2ae6  ;*if_icmple
                                              ; - NestedLoop::getMin@23 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b4b: jmp    0x0000000002da2af0

The compiled code is completely different; methodCall is optimized much better:

  • the loop is unrolled 8 iterations;
  • there are no array bounds checks inside;
  • the width field is cached in a register.

In contrast, the inline variant:

  • does no loop unrolling;
  • loads the width array from memory every time;
  • performs an array bounds check on each iteration.

OSR-compiled methods are not always optimized very well, because they have to maintain the state of the interpreted stack frame at the transition point. Here is another example of the same problem.

On-stack replacement usually occurs on backward branches, i.e. at the bottom of a loop. The inline method has two nested loops, and OSR happens inside the inner loop, while methodCall has just one outer loop. An OSR transition in the outer loop is more favourable, because the JIT compiler has more freedom to optimize the inner loop. And this is exactly what happens in your case.
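A practical takeaway from this (my own sketch, not part of the original answer; the class and method names are hypothetical): when hand-timing hot loops, keep the measured work in its own method so that, after enough calls, HotSpot compiles it as a regular method rather than relying on an OSR compilation of the enclosing loop.

```java
public class WarmupDemo {
    private static final int[] width = new int[1000];
    static {
        for (int i = 0; i < width.length; i++) width[i] = (i * 31) % 997;
    }

    // The measured work lives in its own method, so HotSpot can compile it
    // whole; timing one long loop directly in main would leave main's loop
    // body only ever OSR-compiled.
    static int scanMin() {
        int min = width[0];
        for (int k = 1; k < width.length; k++) {
            if (width[k] < min) min = width[k];
        }
        return min;
    }

    public static void main(String[] args) {
        // Warm-up: give the JIT a chance to compile scanMin before timing.
        for (int i = 0; i < 20_000; i++) scanMin();

        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += scanMin();
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sum = " + sum + ", time = " + ms + " ms");
    }
}
```

This mirrors why methodCall is fast in the benchmark above: getMin gets a normal top-tier compilation, while the inline variant's inner loop only exists inside an OSR-compiled frame.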

Without doing an actual analysis, getMin is most likely being JIT-compiled because you extracted it into a method that is called many times. On the HotSpot JVM, this happens by default after 10,000 method executions.
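To see this threshold in action, here is a small standalone sketch (the class and method names are my own, not from the question): run it with -XX:+PrintCompilation and getMin should appear in the compilation log once it has been invoked roughly CompileThreshold (default 10,000) times.

```java
public class CompileDemo {
    private static final int[] width = new int[100];
    static {
        for (int i = 0; i < width.length; i++) width[i] = i % 17;
    }

    // The method we expect HotSpot to compile after enough invocations.
    static int getMin(int i, int j) {
        int min = width[i];
        for (int k = i + 1; k <= j; k++) {
            if (width[k] < min) min = width[k];
        }
        return min;
    }

    static long computeSum() {
        long sum = 0;
        // 20,000 calls: well past the default 10,000-invocation threshold.
        for (int n = 0; n < 20_000; n++) {
            sum += getMin(0, width.length - 1);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("sum = " + computeSum());
    }
}
```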

You can always inspect the final code used by your application with the right flags and JVM builds. See the question/answer How to see JIT-compiled code in JVM for an example.
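For reference, a typical invocation looks like the following (this assumes the hsdis disassembler plugin is installed in the JVM's library directory; without it, -XX:+PrintAssembly only prints a warning instead of assembly):

```shell
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly NestedLoop
```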

One advantage of Java over ahead-of-time compiled languages like C++ is that the JIT (just-in-time) compiler can optimize the bytecode while the code is executing. In addition, the Java compiler itself performs several optimizations during the build phase. These techniques allow, for example, a method call to be turned into inline code within a loop, avoiding the repeated method-lookup overhead of polymorphic calls. Running a method call inline means the method's code runs as if it had been written directly at the call site, so there is no method-lookup overhead, no extra memory allocation, and no new context variables. Roughly speaking, inside your for loop some time is lost allocating new variables (int k, for example); when the loop is moved into a method, that overhead shrinks because the variables are already allocated for that execution.

The question does not provide a reproducible test case, so I built one that focuses solely on computing the ranged minima:

git clone git@github.com:lemire/microbenchmarks.git
cd microbenchmarks
mvn clean install
java -cp target/microbenchmarks-0.0.1-jar-with-dependencies.jar me.lemire.microbenchmarks.rangequery.RangeMinimum

My results are (on a server configured for testing, Java 8):

m.l.m.r.RangeMinimum.embeddedmin    avgt        5  0.053 ± 0.009  ms/op
m.l.m.r.RangeMinimum.fncmin         avgt        5  0.052 ± 0.003  ms/op

So in my test case there is no significant performance difference between having one large loop with a sub-loop and having one loop that contains a function call. Note that the benchmark calls the functions multiple times so that the JIT compiler can do its work.

I believe Java is doing some optimization/memoization. It could cache the results of functions if the functions/methods are pure. I believe your time has decreased but your space/memory will increase (due to memoization), and vice versa.
