[英]Java method direct invocation vs single element loop invocation
Recently I wanted to see what is the performance difference invoking a method on an object directly, vs invoking the same method on the same object, If that objected is added in a single element ArrayList and we try to loop that one element List.最近我想看看直接在 object 上调用方法与在同一个 object 上调用相同方法的性能差异是什么,如果将反对添加到单个元素 ArrayList 中,我们尝试循环该元素
To be honest my assumption was that there would be small difference, as i would expect the single element loop to be unrolled, hence the invocation to mimic a direct invocation.老实说,我的假设是会有小的差异,因为我希望展开单个元素循环,因此调用来模拟直接调用。
To test that I created the below simple JMH example: https://gist.github.com/nikkatsa/69a58c63c1b79b245d700f6465bce7f7为了测试我创建了以下简单的 JMH 示例: https://gist.github.com/nikkatsa/69a58c63c1b79b245d700f6465bce7f7
To my surprise invoking the method directly on the element seems to be ~2,5x faster.令我惊讶的是,直接在元素上调用该方法似乎要快 2.5 倍。 Which is a significant number, as my expectation totally different:
这是一个很大的数字,因为我的期望完全不同:
Benchmark Mode Cnt Score Error Units
SingleElementLoopBenchmark.directInvocation thrpt 10 332576972.176 ± 2740616.755 ops/s
SingleElementLoopBenchmark.singleElementLoopInvocation thrpt 10 130179329.978 ± 1303776.248 ops/s
I tweeted a bit asking for some pointers to understand why that was happenning.我发了一点推文,要求一些指示来理解为什么会发生这种情况。 I was told that running
perf
will give me a clue.有人告诉我,运行
perf
会给我一个线索。 So i did run with -perf dtraceasm
and I got the below two outputs, the first for the direct invocation and the second for the looped one:所以我确实使用
-perf dtraceasm
运行,我得到了以下两个输出,第一个用于直接调用,第二个用于循环调用:
....[Hottest Region 1]..............................................................................
c2, com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub, version 115 (69 bytes)
0x000000010b73c92c: mov %rbp,%r9
0x000000010b73c92f: movzbl 0x94(%r9),%r10d ;*getfield isDone {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@27 (line 123)
; implicit exception: dispatches to 0x000000010b73ca5e
0x000000010b73c937: test %r10d,%r10d
0x000000010b73c93a: jne 0x000000010b73c9da ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
0x000000010b73c940: mov $0x1,%ebp
0x000000010b73c945: data16 data16 nopw 0x0(%rax,%rax,1) ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@33 (line 124)
9.56% ↗ 0x000000010b73c950: mov 0x40(%rsp),%r10
1.00% │ 0x000000010b73c955: mov 0xc(%r10),%r10d ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@1 (line 23)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
0.17% │ 0x000000010b73c959: mov 0xc(%r12,%r10,8),%r11d ; implicit exception: dispatches to 0x000000010b73ca12
11.18% │ 0x000000010b73c95e: test %r11d,%r11d
0.00% │ 0x000000010b73c961: je 0x000000010b73c9c9 ;*invokevirtual performAction {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invoke@5 (line 40)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@5 (line 23)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
10.69% │ 0x000000010b73c963: mov %r9,(%rsp)
0.65% │ 0x000000010b73c967: mov 0x38(%rsp),%rsi
0.00% │ 0x000000010b73c96c: mov $0x1,%edx
0.14% │ 0x000000010b73c971: xchg %ax,%ax
10.08% │ 0x000000010b73c973: callq 0x000000010b6c2900 ; ImmutableOopMap{[48]=Oop [56]=Oop [64]=Oop [0]=Oop }
│ ;*invokevirtual consume {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Listener::performAction@2 (line 53)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invoke@5 (line 40)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::directInvocation@5 (line 23)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@17 (line 121)
│ ; {optimized virtual_call}
1.44% │ 0x000000010b73c978: mov (%rsp),%r9
0.19% │ 0x000000010b73c97c: movzbl 0x94(%r9),%r8d ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
9.77% │ 0x000000010b73c984: mov 0x108(%r15),%r10
0.99% │ 0x000000010b73c98b: add $0x1,%rbp ; ImmutableOopMap{r9=Oop [48]=Oop [56]=Oop [64]=Oop }
│ ;*ifeq {reexecute=1 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@30 (line 123)
0.02% │ 0x000000010b73c98f: test %eax,(%r10) ; {poll}
0.28% │ 0x000000010b73c992: test %r8d,%r8d
0.00% ╰ 0x000000010b73c995: je 0x000000010b73c950 ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@33 (line 124)
0x000000010b73c997: movabs $0x10abd7d62,%r10
0x000000010b73c9a1: callq *%r10 ;*invokestatic nanoTime {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@34 (line 124)
0x000000010b73c9a4: mov 0x30(%rsp),%r10
0x000000010b73c9a9: mov %rbp,0x18(%r10) ;*putfield measuredOps {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@49 (line 126)
0x000000010b73c9ad: mov %rax,0x30(%r10) ;*putfield stopTime {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_directInvocation_jmhTest::directInvocation_thrpt_jmhStub@37 (line 124)
0x000000010b73c9b1: movq $0x0,0x20(%r10) ;*invokevirtual directInvocation {reexecute=0 rethrow=0 return_oop=0}
....................................................................................................
....[Hottest Region 1]..............................................................................
c2, com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub, version 115 (279 bytes)
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@27 (line 123)
; implicit exception: dispatches to 0x000000011153fffe
0x000000011153fa81: test %r10d,%r10d
0x000000011153fa84: jne 0x000000011153fbe4 ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@30 (line 123)
0x000000011153fa8a: mov 0x58(%rsp),%rdx
0x000000011153fa8f: test %rdx,%rdx
0x000000011153fa92: je 0x000000011153fd5e
0x000000011153fa98: mov $0x1,%ebx
╭ 0x000000011153fa9d: jmp 0x000000011153fad6
0.19% │ ↗ 0x000000011153fa9f: mov 0x58(%rsp),%r13
3.55% │ │ 0x000000011153faa4: mov (%rsp),%rcx
0.09% │ │ 0x000000011153faa8: mov 0x60(%rsp),%rdx
0.22% │ │ 0x000000011153faad: mov 0x50(%rsp),%r11
0.17% │ │ 0x000000011153fab2: mov 0x8(%rsp),%rbx ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
│ │ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
│ │ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ │ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
3.55% │↗│ 0x000000011153fab7: movzbl 0x94(%r11),%r8d ;*goto {reexecute=0 rethrow=0 return_oop=0}
│││ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@35 (line 44)
│││ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.16% │││ 0x000000011153fabf: mov 0x108(%r15),%r10
0.28% │││ 0x000000011153fac6: add $0x1,%rbx ; ImmutableOopMap{r11=Oop rcx=Oop rdx=Oop r13=Oop }
│││ ;*ifeq {reexecute=1 rethrow=0 return_oop=0}
│││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@30 (line 123)
0.19% │││ 0x000000011153faca: test %eax,(%r10) ; {poll}
4.00% │││ 0x000000011153facd: test %r8d,%r8d
│││ 0x000000011153fad0: jne 0x000000011153fbe9 ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
│││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@33 (line 124)
0.07% ↘││ 0x000000011153fad6: mov 0xc(%rcx),%r8d ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@1 (line 28)
││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.22% ││ 0x000000011153fada: mov 0x10(%r12,%r8,8),%r10d ;*getfield singleListenerList {reexecute=0 rethrow=0 return_oop=0}
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@4 (line 44)
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
││ ; implicit exception: dispatches to 0x000000011153ff2a
0.21% ││ 0x000000011153fadf: mov 0x8(%r12,%r10,8),%edi ; implicit exception: dispatches to 0x000000011153ff3e
4.39% ││ 0x000000011153fae4: cmp $0x237565,%edi ; {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
││ 0x000000011153faea: jne 0x000000011153fc92
0.33% ││ 0x000000011153faf0: lea (%r12,%r10,8),%r9 ;*invokeinterface size {reexecute=0 rethrow=0 return_oop=0}
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@7 (line 44)
││ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
││ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.09% ││ 0x000000011153faf4: mov 0x10(%r9),%r9d
0.14% ││ 0x000000011153faf8: test %r9d,%r9d
╰│ 0x000000011153fafb: jle 0x000000011153fab7 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
3.98% │ 0x000000011153fafd: lea (%r12,%r8,8),%rdi ;*getfield dispatcher {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@1 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.06% │ 0x000000011153fb01: xor %r9d,%r9d ;*aload_0 {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@15 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.09% │ 0x000000011153fb04: mov 0x8(%r12,%r10,8),%esi ; implicit exception: dispatches to 0x000000011153ff4e
0.06% │ 0x000000011153fb09: cmp $0x237565,%esi ; {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
0.00% │ 0x000000011153fb0f: jne 0x000000011153fcc2
3.93% │ 0x000000011153fb15: lea (%r12,%r10,8),%rax ;*invokeinterface get {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.06% │ 0x000000011153fb19: mov 0x10(%rax),%r10d ;*getfield size {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.util.ArrayList::get@2 (line 458)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.11% │ 0x000000011153fb1d: test %r10d,%r10d
│ 0x000000011153fb20: jl 0x000000011153fcf6 ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.util.Objects::checkIndex@3 (line 372)
│ ; - java.util.ArrayList::get@5 (line 458)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.28% │ 0x000000011153fb26: cmp %r10d,%r9d
0.00% │ 0x000000011153fb29: jae 0x000000011153fc1c
3.97% │ 0x000000011153fb2f: mov 0x14(%rax),%r10d ;*getfield elementData {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.util.ArrayList::elementData@1 (line 442)
│ ; - java.util.ArrayList::get@11 (line 459)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.05% │ 0x000000011153fb33: mov %r9d,%ebp ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.util.Objects::checkIndex@3 (line 372)
│ ; - java.util.ArrayList::get@5 (line 458)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.08% │ 0x000000011153fb36: mov 0xc(%r12,%r10,8),%esi ; implicit exception: dispatches to 0x000000011153ff62
1.27% │ 0x000000011153fb3b: cmp %esi,%ebp
│ 0x000000011153fb3d: jae 0x000000011153fc5a
3.94% │ 0x000000011153fb43: shl $0x3,%r10
0.05% │ 0x000000011153fb47: mov 0x10(%r10,%rbp,4),%r9d ;*aaload {reexecute=0 rethrow=0 return_oop=0}
│ ; - java.util.ArrayList::elementData@5 (line 442)
│ ; - java.util.ArrayList::get@11 (line 459)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@20 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
1.71% │ 0x000000011153fb4c: mov 0x8(%r12,%r9,8),%r10d ; implicit exception: dispatches to 0x000000011153ff72
17.85% │ 0x000000011153fb51: cmp $0x237522,%r10d ; {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Listener')}
0.00% │ 0x000000011153fb58: jne 0x000000011153fef6 ;*checkcast {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@25 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
3.79% │ 0x000000011153fb5e: mov %rdi,0x18(%rsp)
0.02% │ 0x000000011153fb63: mov %r8d,0x10(%rsp)
0.02% │ 0x000000011153fb68: mov %rbx,0x8(%rsp)
0.19% │ 0x000000011153fb6d: mov %r11,0x50(%rsp)
3.95% │ 0x000000011153fb72: mov %rdx,0x60(%rsp)
0.02% │ 0x000000011153fb77: mov %rcx,(%rsp)
0.03% │ 0x000000011153fb7b: mov %r13,0x58(%rsp)
0.36% │ 0x000000011153fb80: mov %rdx,%rsi
3.78% │ 0x000000011153fb83: mov $0x1,%edx
0.01% │ 0x000000011153fb88: vzeroupper
4.05% │ 0x000000011153fb8b: callq 0x00000001114c2900 ; ImmutableOopMap{[80]=Oop [88]=Oop [96]=Oop [0]=Oop [16]=NarrowOop [24]=Oop }
│ ;*invokevirtual consume {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Listener::performAction@2 (line 53)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@29 (line 45)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
│ ; {optimized virtual_call}
0.98% │ 0x000000011153fb90: mov 0x10(%rsp),%r8d
3.61% │ 0x000000011153fb95: mov 0x10(%r12,%r8,8),%r10d ;*getfield singleListenerList {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@4 (line 44)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.24% │ 0x000000011153fb9a: mov 0x8(%r12,%r10,8),%r9d ; implicit exception: dispatches to 0x000000011153ff9e
0.74% │ 0x000000011153fb9f: inc %ebp ;*iinc {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@32 (line 44)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.04% │ 0x000000011153fba1: cmp $0x237565,%r9d ; {metadata('com/nikoskatsanos/benchmarks/loops/SingleElementLoopBenchmark$Dispatcher$1')}
0.00% │ 0x000000011153fba8: jne 0x000000011153fd36
3.60% │ 0x000000011153fbae: lea (%r12,%r10,8),%r11 ;*invokeinterface size {reexecute=0 rethrow=0 return_oop=0}
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@7 (line 44)
│ ; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
│ ; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0.11% │ 0x000000011153fbb2: mov 0x10(%r11),%r9d
0.35% │ 0x000000011153fbb6: cmp %r9d,%ebp
╰ 0x000000011153fbb9: jge 0x000000011153fa9f ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.benchmarks.loops.SingleElementLoopBenchmark$Dispatcher::invokeInLoop@12 (line 44)
; - com.benchmarks.loops.SingleElementLoopBenchmark::singleElementLoopInvocation@5 (line 28)
; - com.benchmarks.loops.generated.SingleElementLoopBenchmark_singleElementLoopInvocation_jmhTest::singleElementLoopInvocation_thrpt_jmhStub@17 (line 121)
0x000000011153fbbf: mov %ebp,%r9d
0x000000011153fbc2: mov 0x58(%rsp),%r13
0x000000011153fbc7: mov (%rsp),%rcx
0x000000011153fbcb: mov 0x60(%rsp),%rdx
0x000000011153fbd0: mov 0x50(%rsp),%r11
0x000000011153fbd5: mov 0x8(%rsp),%rbx
....................................................................................................
I would like some help if possible to understand the above two outputs.如果可能的话,我想要一些帮助来理解上述两个输出。 From the very little I understand it seems that when looping, the program is paying quite a cost of checking the size of the array list in order to perform the loop and also cost of casting the object to the appropriate type, due to the list generics.
据我所知,似乎在循环时,程序付出了相当大的成本来检查数组列表的大小以执行循环,以及将 object 转换为适当类型的成本,因为列表 generics . I cannot really understand if the loops gets unrolled, but I believe it does?
我无法真正理解循环是否展开,但我相信它会吗?
Would be grateful if someone understand the above output and can throw some pointers to what is happening.如果有人了解上述 output 并能指出正在发生的事情,将不胜感激。
You measure an operation that does virtually nothing vs. an operation that involves at least four dependent memory loads:您测量一个几乎什么都不做的操作与一个涉及至少四个相关 memory 负载的操作:
plus at least one bounds check.加上至少一个边界检查。 I say "at least", because that's what the best optimizing compiler could probably do.
我说“至少”,因为这是最好的优化编译器可能会做的事情。 HotSpot, however, performs two bound checks: one in ArrayList.get(), another for elementData array access.
然而,HotSpot 执行两项绑定检查:一项在 ArrayList.get() 中,另一项用于 elementData 数组访问。
Furthermore, ArrayList holds an array of Objects, so in order to get an instance of Listener
from the list, an object needs to be cast to the target type.此外,ArrayList 包含一个 Objects 数组,因此为了从列表中获取
Listener
的实例,需要将 object 强制转换为目标类型。 That's one more memory load plus the type check.这又是一个 memory 负载加上类型检查。
Look at the benchmark scores from a different angle: a single element list is not 2.5x slower, but rather 3 nanoseconds longer:从另一个角度看基准分数:单个元素列表不是慢 2.5 倍,而是多 3 纳秒:
Benchmark Mode Cnt Score Error Units
SingleElementLoopBenchmark.directInvocation avgt 5 2,340 ± 0,095 ns/op
SingleElementLoopBenchmark.singleElementLoopInvocation avgt 5 5,283 ± 0,074 ns/op
3 nanoseconds! 3纳秒! It's just 6 CPU cycles of a 2 GHz core.
这只是 2 GHz 内核的 6 个 CPU 周期。 The light can travel less than 1 metre in 3 nanoseconds.
光可以在 3 纳秒内传播不到 1 米。 Did you really expect the overhead of ArrayList less than that?
您真的期望 ArrayList 的开销比这少吗? No matter how low the overhead is, if comparing it with no overhead at all, the ratio could be infinitely large.
不管开销有多低,如果和完全没有开销相比,这个比例可能是无限大的。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.