
Java - calling static methods vs manual inlining - performance overhead

I am interested in whether I should manually inline small methods that are called 100k to 1 million times in a performance-sensitive algorithm.

At first, I thought that by not inlining I was incurring some overhead, since the JVM has to determine whether or not to inline the method (and may even fail to do so).

However, the other day I replaced this manually inlined code with calls to static methods and saw a performance boost. How is that possible? Does this suggest that there is actually no overhead, and that letting the JVM inline "at will" actually boosts performance? Or does this depend heavily on the platform/architecture?

(The example in which a performance boost occurred was replacing array swapping (int t = a[i]; a[i] = a[j]; a[j] = t;) with a static method call swap(int[] a, int i, int j). Another example, in which there was no performance difference, was when I inlined a 10-line method that was called 1,000,000 times.)
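For concreteness, the two variants being compared might look like the sketch below. The array-reversal loop and the names SwapInlineDemo, reverseManual and reverseWithCall are assumptions made for illustration; the original algorithm is not shown in the question.

```java
import java.util.Arrays;

class SwapInlineDemo {
    // Small helper; the JIT is free to inline it once the call site is hot.
    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Variant 1: the swap is written out by hand inside the loop.
    static void reverseManual(int[] a) {
        for (int i = 0, j = a.length - 1; i < j; i++, j--) {
            int t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }

    // Variant 2: the same loop, delegating to the static helper.
    static void reverseWithCall(int[] a) {
        for (int i = 0, j = a.length - 1; i < j; i++, j--) {
            swap(a, i, j);
        }
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3, 4, 5};
        int[] y = x.clone();
        reverseManual(x);
        reverseWithCall(y);
        System.out.println(Arrays.toString(x)); // prints [5, 4, 3, 2, 1]
        System.out.println(Arrays.equals(x, y)); // prints true
    }
}
```

Both variants compute the same result; any speed difference between them comes from how the JIT compiles each one, not from the source-level logic.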

I have seen something similar. "Manual inlining" isn't necessarily faster; the resulting program can be too complex for the optimizer to analyze.

In your example, let's make some wild guesses. When you use the swap() method, the JVM may be able to analyze the method body and conclude that, since i and j don't change, only 2 range checks are needed instead of 4, even though there are 4 array accesses. Also, the local variable t isn't necessary: the JVM can use 2 registers to do the job, without reading or writing t on the stack.

Later, the body of swap() is inlined into the caller method. That happens after the previous optimization, so the savings are still in place. It's even possible that the caller's method body has proved that i and j are always within range, so the 2 remaining range checks are dropped as well.

Now, in the manually inlined version, the optimizer has to analyze the whole program at once. There are too many variables and too many actions, so it may fail to prove that it's safe to skip range checks or to eliminate the local variable t. In the worst case this version may cost 6 more memory accesses to do the swap, which is a huge overhead. Even if there is only 1 extra memory read, it is still very noticeable.

Of course, we have no basis to believe that it's always better to do manual "outlining", i.e. extracting small methods in the wishful hope that it will help the optimizer.

---

What I've learned is: forget manual micro-optimizations. It's not that I don't care about micro performance improvements, and it's not that I always trust the JVM's optimization. It's that I have absolutely no idea what to do that does more good than harm. So I gave up.

The JVM can inline small methods very efficiently. The only benefit of inlining by hand is if it lets you remove code, i.e. simplify what the code does by inlining it.

The JVM looks for certain structures and has some "hand-coded" optimisations that it applies when it recognises them. With a swap method, the JVM may recognise the structure and optimise it differently, using a specific optimisation.

You might be interested in trying the OpenJDK 7 debug build, which has an option to print out the native code it generates.
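As a rough sketch of how to look at what the JIT does, the program below reports which diagnostic flags it was started with and then runs a hot loop for the compiler to work on. The flag names are HotSpot options as I know them (-XX:+PrintCompilation; -XX:+PrintInlining and -XX:+PrintAssembly additionally need -XX:+UnlockDiagnosticVMOptions, and PrintAssembly needs the hsdis disassembler plugin on non-debug builds); availability may differ between JVM versions. The class name JitFlagsDemo and the helper hasFlag are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JitFlagsDemo {
    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Hypothetical helper: true if the given JVM option is in the argument list.
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Options passed to the JVM itself, not to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String[] flags = {"-XX:+PrintCompilation", "-XX:+PrintInlining", "-XX:+PrintAssembly"};
        for (String flag : flags) {
            System.out.println(flag + " passed: " + hasFlag(jvmArgs, flag));
        }

        // Hot loop so that swap() and main() become candidates for JIT compilation.
        int[] a = {1, 2};
        for (int n = 0; n < 1_000_000; n++) {
            swap(a, 0, 1);
        }
        System.out.println(a[0]); // use the result so the loop is not dead code
    }
}
```

Running it as, for example, `java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining JitFlagsDemo` should add the JVM's own inlining log to the output, which is more direct evidence than timing alone.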

Sorry for my late reply, but I just found this topic and it got my attention.

When developing in Java, try to write "simple and stupid" code. Reasons:

  1. The optimization is made at runtime (since the compilation itself happens at runtime). The compiler will figure out which optimizations to make anyway, since it compiles not the source code you write but the internal representation it uses (several AST -> VM code -> VM code ... -> native binary code transformations are made at runtime by the JVM compiler and the JVM interpreter).
  2. When optimizing, the compiler relies on some common programming patterns to decide what to optimize, so help it help you! Write a private static (maybe also final) method and it will figure out immediately that it can:
    • inline the method
    • compile it to native code

If the method is manually inlined, it's just part of another method, which the compiler first has to understand before deciding whether it's time to transform it into binary code or whether it must wait a bit to understand the program flow. Also, depending on what the method does, several re-JIT'ings are possible during runtime, so the JVM produces optimal binary code only after a "warm-up"... and maybe your program ended before the JVM had warmed up (because I expect that in the end the performance should be fairly similar).
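The warm-up effect can be observed with a crude timing loop like the one below. This is only a sketch: the class WarmupDemo, the array size and the round counts are arbitrary choices for illustration, and a proper harness such as JMH would give far more trustworthy numbers than System.nanoTime() around a loop.

```java
class WarmupDemo {
    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Reverses the array 'repetitions' times; returns a[0] so the work is observable.
    static int reverseManyTimes(int[] a, int repetitions) {
        for (int r = 0; r < repetitions; r++) {
            for (int i = 0, j = a.length - 1; i < j; i++, j--) {
                swap(a, i, j);
            }
        }
        return a[0];
    }

    // Times several identical rounds; early rounds typically include interpreter and JIT time.
    static long[] timeRounds(int rounds, int repetitions) {
        int[] a = new int[1024];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
        long[] nanos = new long[rounds];
        long sink = 0;
        for (int round = 0; round < rounds; round++) {
            long start = System.nanoTime();
            sink += reverseManyTimes(a, repetitions);
            nanos[round] = System.nanoTime() - start;
        }
        if (sink == 42) {
            System.out.println(sink); // keeps 'sink' live so the JIT cannot drop the loops
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10, 10_000);
        for (int round = 0; round < nanos.length; round++) {
            System.out.println("round " + round + ": " + nanos[round] + " ns");
        }
    }
}
```

If the first rounds come out noticeably slower than the later ones, a short-lived program may be measuring mostly the pre-warm-up phase, which is the situation described above.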

Conclusion: it makes sense to optimize code in C/C++ (since the translation into binary is made statically), but the same optimizations usually don't make a difference in Java, where the compiler JITs bytecode, not your source code. And by the way, from what I've seen, javac doesn't even bother to make optimizations :)

"However, the other day, I replaced this manually inlined code with invocation of static methods and saw a performance boost. How is that possible?"

Probably the JVM profiler sees the bottleneck more easily if it is in one place (a static method) than if it is implemented several times separately.

The HotSpot JIT compiler is capable of inlining a lot of things, especially in -server mode, although I don't know how you got an actual performance boost. (My guess would be that inlining is triggered by method invocation count and the method swapping the two values isn't called too often.)

By the way, if its performance really matters, you could try this for swapping two int values. (I'm not saying it will be faster, but it may be worth a punt.)

// XOR swap of a[i] and a[j] without a temporary; only valid when i != j
a[i] = a[i] ^ a[j];
a[j] = a[i] ^ a[j];
a[i] = a[i] ^ a[j];
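One thing to watch with the XOR trick: when i and j are the same index, the element is XORed with itself and ends up as 0. A small sketch of a guarded version next to the unguarded one (the class and method names are invented for this example):

```java
import java.util.Arrays;

class XorSwapDemo {
    // Guarded XOR swap: skips the case where both indices name the same element.
    static void xorSwap(int[] a, int i, int j) {
        if (i == j) {
            return; // a[i] ^ a[i] would be 0 and destroy the value
        }
        a[i] ^= a[j];
        a[j] ^= a[i];
        a[i] ^= a[j];
    }

    // Unguarded version, kept only to show the failure mode.
    static void xorSwapUnguarded(int[] a, int i, int j) {
        a[i] ^= a[j];
        a[j] ^= a[i];
        a[i] ^= a[j];
    }

    public static void main(String[] args) {
        int[] a = {7, 9};
        xorSwap(a, 0, 1);
        System.out.println(Arrays.toString(a)); // prints [9, 7]

        int[] b = {7, 9};
        xorSwapUnguarded(b, 0, 0);
        System.out.println(Arrays.toString(b)); // prints [0, 9]
    }
}
```

The extra branch is a cost of its own, so whether this beats the plain temporary-variable swap is something to measure rather than assume.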

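Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please cite this site's URL or the address of the original post. For any questions, contact: yoyou2525@163.com.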

 