(How) does the Java JIT compiler optimize my code?

Question

I'm writing fairly low-level code that must be highly optimized for speed. Every CPU cycle counts. Since the code is in Java I can't write at as low a level as in C, for example, but I want to get everything out of the VM that I can.

I'm processing an array of bytes. There are two parts of my code that I'm primarily interested in at the moment. The first one is:

int key =  (data[i]     & 0xff)
        | ((data[i + 1] & 0xff) <<  8)
        | ((data[i + 2] & 0xff) << 16)
        | ((data[i + 3] & 0xff) << 24);

and the second one is:

key = (key << 15) | (key >>> 17);

Judging from the performance, I'm guessing that these statements aren't optimized the way I expect. The second statement is basically a ROTL 15, key. The first statement loads 4 bytes into an int. The 0xff masks are there only to compensate for the sign bits added by the implicit conversion to int when the accessed byte happens to be negative. This should be easy to translate to efficient machine code, but to my surprise performance goes up if I remove the masks. (Which of course breaks my code, but I was interested to see what happens.)
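For reference, both statements have standard-library counterparts that can be used to check what the hand-written versions compute. The sketch below is not from the original post: the class and method names (`KeyOps`, `loadManual`, `loadViaBuffer`, `rotlManual`, `rotlLibrary`) are made up for illustration. It only demonstrates that the two forms produce the same values; it makes no claim about which one HotSpot compiles to faster machine code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class KeyOps {
    // The hand-written little-endian load of 4 bytes from the question.
    static int loadManual(byte[] data, int i) {
        return (data[i] & 0xff)
                | ((data[i + 1] & 0xff) << 8)
                | ((data[i + 2] & 0xff) << 16)
                | ((data[i + 3] & 0xff) << 24);
    }

    // The same load expressed with a little-endian ByteBuffer view.
    static int loadViaBuffer(byte[] data, int i) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(i);
    }

    // The hand-written rotate-left by 15 from the question.
    static int rotlManual(int key) {
        return (key << 15) | (key >>> 17);
    }

    // The same rotation expressed with the standard library.
    static int rotlLibrary(int key) {
        return Integer.rotateLeft(key, 15);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0x80, 0x01, (byte) 0xff, 0x7f, 0x10};
        System.out.println(loadManual(data, 0) == loadViaBuffer(data, 0));
        System.out.println(rotlManual(0x80000001) == rotlLibrary(0x80000001));
    }
}
```

Note that wrapping the array on every call, as done here for brevity, allocates a new buffer object each time; in a hot loop one would wrap once and reuse the buffer.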

What's going on here? Do the most common Java VMs optimize this code during JIT in the way one would expect a good C++ compiler to optimize the equivalent C++ code? Can I influence this process? Setting -XX:+AggressiveOpts seems to make no difference.

(CPU: x64, Platform: Linux/HotSpot)

Answer 1

How do you test the performance?

Here are some good articles:

http://www.ibm.com/developerworks/java/library/j-benchmark1/index.html

http://www.ibm.com/developerworks/java/library/j-benchmark2/index.html

http://ellipticgroup.com/html/benchmarkingArticle.html
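To make the question about measurement concrete, here is a minimal hand-rolled timing sketch. It is not taken from the linked articles, and the names (`MaskBenchmark`, `checksum`, `bestNanos`) and the warm-up and run counts are arbitrary assumptions. It shows two things that naive measurements often miss: running the code many times before timing so the JIT has a chance to compile it, and consuming the results so the loop cannot be eliminated as dead code. A dedicated benchmarking harness is likely to give more trustworthy numbers than this.

```java
import java.util.Random;

final class MaskBenchmark {
    // Written to on every run so the computed values stay observable.
    static volatile int sink;

    // Applies the question's load and rotate to each 4-byte group and
    // XORs the keys together, so every result feeds the return value.
    static int checksum(byte[] data) {
        int sum = 0;
        for (int i = 0; i + 3 < data.length; i += 4) {
            int key = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | ((data[i + 3] & 0xff) << 24);
            key = (key << 15) | (key >>> 17);
            sum ^= key;
        }
        return sum;
    }

    // Runs `warmups` untimed passes, then returns the best time in
    // nanoseconds observed over `runs` timed passes.
    static long bestNanos(byte[] data, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink = checksum(data);
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = checksum(data);
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];
        new Random(1).nextBytes(data); // fixed seed for repeatable input
        System.out.println("best ns per pass: " + bestNanos(data, 2000, 50));
    }
}
```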

Answer 2

I've written a lot of performance-critical code in Java, and I've even coded directly in bytecode, enough to be sure of a couple of things: the JIT is a black box with obscure behaviours, the JIT and compilers are incredibly efficient, and the simplest code usually yields the best performance.

This is normal when you think about the goal of the JIT: extract the best possible performance from any Java code. Add to that the fact that Java is quite a simple and plain language, and it follows that simple code will be optimized and any further trick will generally do no good.

Of course, there are some common pitfalls and gotchas you ought to know, but I see none in your code samples. Were I to optimize your code, I would go straight to the higher level: the algorithm. What is the complexity of your code? Can some data be cached? What APIs are used? Etc. There's a seemingly endless well of performance to be extracted from algorithmic tricks alone.

And if even this is not sufficient, if the language is not fast enough, if your machine is not fast enough, if your algorithm cannot be made any faster, the answer won't lie in "clock cycles": you might squeeze out 20% more efficiency, but 20% will never be enough when your data grows. To be sure you will never hit a performance wall (again), the ultimate answer lies in scalability: make your algorithm and your data endlessly distributable so you can just throw more workers at the task.

Answer 3

I do agree with solendil, but if you want to dig deeper at the low level, try to get the code generated by the JIT as described here.

Answer 4

You don't need the (& 0xff) before shifting left by 24 bits.
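The claim about the 24-bit shift can be checked exhaustively, since a byte has only 256 values. The following is an illustrative sketch with made-up names (`TopByteMask`, `agreeAtShift24`, `agreeAtShift16`), not code from the answer. It relies on the fact that shifting an int left by 24 pushes all sign-extension bits out of the 32-bit result, which is not the case for the smaller shifts.

```java
final class TopByteMask {
    static int withMask(byte b, int shift) {
        return (b & 0xff) << shift;
    }

    static int withoutMask(byte b, int shift) {
        return b << shift; // b is sign-extended to int before the shift
    }

    // True if masking makes no difference for any byte value at this shift.
    static boolean agree(int shift) {
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            if (withMask(b, shift) != withoutMask(b, shift)) {
                return false;
            }
        }
        return true;
    }

    static boolean agreeAtShift24() {
        return agree(24);
    }

    static boolean agreeAtShift16() {
        return agree(16);
    }

    public static void main(String[] args) {
        System.out.println("shift 24, mask redundant: " + agreeAtShift24());
        System.out.println("shift 16, mask redundant: " + agreeAtShift16());
    }
}
```

Only the mask on the most significant byte is redundant. For the other three bytes, dropping the mask lets the sign-extension bits of a negative byte overwrite the higher bytes once the parts are OR-ed together.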


Answer 1: 7, 2011-11-24 11:24:04

Answer 2: 5, 2011-11-24 15:32:23

Answer 3: 3, 2011-11-24 15:41:26

Answer 4: 0, 2011-11-24 15:44:23
