How does array access affect performance?

Question

int steps = 256 * 1024 * 1024;
int[] a = new int[2];

// Loop 1
for (int i=0; i<steps; i++) { a[0]++; a[0]++; }

// Loop 2
for (int i=0; i<steps; i++) { a[0]++; a[1]++; }

Can someone explain why the second loop is roughly 12 times slower than the first (19 ms vs. 232 ms)?

This is how I'm timing it:

long start_time = System.currentTimeMillis();

// Loop

long end_time = System.currentTimeMillis();
System.out.println(end_time - start_time);
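To reproduce the measurement, here is a self-contained sketch of the question's two loops. The class name ArrayLoopBenchmark is hypothetical, and the harness is an assumption rather than the asker's code: it times with System.nanoTime, runs a few warm-up rounds first so the JIT has a chance to compile both methods, and returns the arrays so the loops are not dead code. It is not a rigorous benchmark (a harness such as JMH would be more reliable), and absolute times will depend on the JVM and hardware.

```java
// Hypothetical class name; a rough sketch of the question's experiment, not a rigorous benchmark.
class ArrayLoopBenchmark {

    // Loop 1: both increments hit the same element, a[0].
    static int[] loop1(int steps) {
        int[] a = new int[2];
        for (int i = 0; i < steps; i++) { a[0]++; a[0]++; }
        return a; // returning the array keeps the work observable
    }

    // Loop 2: the increments are split between a[0] and a[1].
    static int[] loop2(int steps) {
        int[] a = new int[2];
        for (int i = 0; i < steps; i++) { a[0]++; a[1]++; }
        return a;
    }

    public static void main(String[] args) {
        int steps = 256 * 1024 * 1024;

        // Warm-up rounds so both methods are likely JIT-compiled before timing
        // (the number of rounds is an arbitrary assumption).
        for (int round = 0; round < 5; round++) {
            loop1(steps);
            loop2(steps);
        }

        long t0 = System.nanoTime();
        int[] r1 = loop1(steps);
        long t1 = System.nanoTime();
        int[] r2 = loop2(steps);
        long t2 = System.nanoTime();

        // Print the array contents too, so neither call can be treated as dead code.
        System.out.println("loop 1: " + (t1 - t0) / 1_000_000 + " ms, a = " + r1[0] + "," + r1[1]);
        System.out.println("loop 2: " + (t2 - t1) / 1_000_000 + " ms, a = " + r2[0] + "," + r2[1]);
    }
}
```

Only the ratio between the two printed times is worth comparing with the numbers in the question.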

Answer 1

Summary

The JIT compiler converts the first loop into a multiplication, but it does not optimize the second loop nearly as much.

Discussion

The bytecode for both loops is basically the same (you can view it with javap -c test.class).

In Java, the bytecode is compiled into x86 instructions at run time by a JIT compiler, which can perform additional optimizations.

If you have the hsdis disassembler plugin installed, you can view the assembly produced by the JIT with java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly ...

I changed the value added to each element to 0xbad, to make the relevant code easier to spot, and changed the loop counter to long.
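The answer does not show its modified source, so the following is only a guess at what it looked like: each ++ replaced by += 0xbad, and the counter declared as long. The class name BadValueLoops is hypothetical. Note that 0xbad is 2989, so two additions of it total 5978, which is 0x175a, the constant that appears in the assembly for the first loop.

```java
// Hypothetical reconstruction of the answer's modified loops (the answer does not show this source).
class BadValueLoops {

    // Modified loop 1: two additions of 0xbad to a[0] per iteration, long counter.
    static int[] loop1(long steps) {
        int[] a = new int[2];
        for (long i = 0; i < steps; i++) { a[0] += 0xbad; a[0] += 0xbad; }
        return a;
    }

    // Modified loop 2: one addition of 0xbad to each of a[0] and a[1] per iteration.
    static int[] loop2(long steps) {
        int[] a = new int[2];
        for (long i = 0; i < steps; i++) { a[0] += 0xbad; a[1] += 0xbad; }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(loop1(10)[0]);                       // 59780 (10 * 0x175a)
        System.out.println(loop2(10)[0] + " " + loop2(10)[1]);  // 29890 29890
    }
}
```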

The first loop produces:

  mov     r11d,dword ptr [r13+10h]    ; Load from memory a[0]
  ...
  add     r11d,175ah                  ; Add 2 * 0xbad to the value
  mov     dword ptr [r13+10h],r11d    ; Store to memory a[0]

The second loop produces:

  mov     ebx,dword ptr [rax+10h]    ; Load from memory a[0]
  add     ebx,0badh                  ; Add 0xbad
  ...
  mov     dword ptr [rax+10h],ebx    ; Store to memory a[0]
  ...
  mov     ebx,dword ptr [rax+14h]    ; Load from memory a[1]
  add     ebx,0badh                  ; Add 0xbad
  ...
  mov     dword ptr [rax+14h],ebx    ; Store to memory a[1]

So the compiler is already able to compile the first loop into fewer instructions.

In particular, it has spotted that the two additions to the same array element can be coalesced into a single addition of twice the value.
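At the source level, this coalescing amounts to replacing two additions of 0xbad with one addition of 0x175a (2 * 0xbad). The sketch below, with the hypothetical class CoalescedAdd, checks that the two forms agree. Because Java int arithmetic wraps modulo 2^32, the rewrite is exact even when the value overflows, which is one of the conditions that lets the JIT apply it.

```java
// Hypothetical class; compares the two-addition form with the coalesced single addition.
class CoalescedAdd {

    // What the modified loop 1 does per iteration: two separate additions.
    static int twoAdds(int x) {
        x += 0xbad;
        x += 0xbad;
        return x;
    }

    // What the JIT emitted instead: one addition of 2 * 0xbad == 0x175a.
    static int oneAdd(int x) {
        return x + 0x175a;
    }

    public static void main(String[] args) {
        // Includes values at the edges of the int range, where the additions overflow.
        int[] samples = {0, 1, -5, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            System.out.println(twoAdds(x) == oneAdd(x)); // prints true for every sample
        }
    }
}
```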

When I changed the loop counter back to int, I noticed that the compiler does even better with your first loop:

mov     r10d,dword ptr [r14+10h]
imul    ecx,r13d,175ah             ; This line converts lots of adds of 0xbad into a single multiply
mov     r11d,r10d
sub     r11d,ecx
add     r10d,175ah
mov     dword ptr [r14+10h],r10d

In this case it has spotted that it can perform several iterations of your loop in a single pass by using a multiplication. This explains how the first loop can be an order of magnitude faster than the second.
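The arithmetic identity behind the imul can be sketched the same way: n iterations that each add 0x175a produce the same int as a single start + n * 0x175a, again because both sides wrap modulo 2^32. The class ClosedFormLoop is hypothetical, and it collapses the whole loop for simplicity. The JIT listing above is only an excerpt, so it does not show exactly how many iterations each multiplication replaces; treat this as an illustration of the arithmetic, not a transcription of the generated code.

```java
// Hypothetical class; shows that repeated addition and a single multiplication give the same int.
class ClosedFormLoop {

    // Coalesced loop 1: one addition of 0x175a per iteration.
    static int iterate(int start, int n) {
        int x = start;
        for (int i = 0; i < n; i++) {
            x += 0x175a;
        }
        return x;
    }

    // Same result with one multiplication; int overflow wraps exactly as in the loop.
    static int closedForm(int start, int n) {
        return start + n * 0x175a;
    }

    public static void main(String[] args) {
        System.out.println(iterate(0, 1000));                                   // 5978000
        System.out.println(closedForm(0, 1000));                                // 5978000
        // 1,000,000 * 0x175a overflows int, but both versions wrap identically.
        System.out.println(iterate(7, 1_000_000) == closedForm(7, 1_000_000)); // true
    }
}
```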

Answer 1: accepted, score 9, 2017-07-21 20:04:54
