Possible increase of performance using JNI?

A year or two ago I wrote a Java program to simulate the n-body problem. Recently I've had the crazy idea of rewriting it as a distributed program so that I can simulate larger masses with better accuracy.

Profiling the old program showed, as expected, that roughly 90% of the run time was spent calculating float values. If I remember correctly, C/C++ is a lot faster than Java at arithmetic operations, especially floating-point calculations.

Anyway, here's the actual question :)

By using JNI, can I expect the calculations to run as fast as they would in a program written in C/C++, or will the JVM slow them down?

Most float operations take around 1 ns in Java, so I am not sure how much faster you would expect them to be in C++.

However, JNI calls often take around 30 ns, so unless you perform a lot of floating-point operations per call, the call overhead will cost you more than you save.
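To put rough numbers on that trade-off, here is a minimal sketch of the break-even arithmetic. The 30 ns call overhead and 1 ns per Java operation are the figures quoted above; the 0.5 ns per native operation is purely a hypothetical value chosen for illustration, and the class and method names are made up for this example.

```java
public class JniBreakEven {
    /**
     * Smallest number of floating-point operations per JNI call at which the
     * time saved by faster native arithmetic covers the fixed call overhead.
     */
    static long breakEvenOps(double callOverheadNs, double javaNsPerOp, double nativeNsPerOp) {
        double savedPerOp = javaNsPerOp - nativeNsPerOp;
        if (savedPerOp <= 0) {
            throw new IllegalArgumentException("native code must be faster per operation");
        }
        return (long) Math.ceil(callOverheadNs / savedPerOp);
    }

    public static void main(String[] args) {
        // 30 ns per call and 1 ns per Java operation are the figures from the answer;
        // 0.5 ns per native operation is an assumed, hypothetical value.
        System.out.println(breakEvenOps(30.0, 1.0, 0.5)); // prints 60
    }
}
```

Under these assumptions, a native call has to do at least 60 operations before it merely matches pure Java, which is why fine-grained JNI calls rarely pay off.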

As the following micro-benchmark suggests, once the code has warmed up, each operation takes less than a nanosecond.

If you want this to go faster, you could use multiple cores and make it 4x or more faster.

public class FloatBenchmark {
    public static void main(String[] args) {
        int length = 200000;
        double[] a = fill(new double[length]);
        double[] b = fill(new double[length]);
        double[] c = fill(new double[length]);
        double[] x = new double[length];

        // Repeat the measurement so the JIT has a chance to compile the hot loop.
        for (int i = 0; i < 10; i++)
            testTime(length, a, b, c, x);
    }

    private static void testTime(int length, double[] a, double[] b, double[] c, double[] x) {
        long start = System.nanoTime();
        for (int i = 0; i < length; i++)
            x[i] = a[i] * b[i] + c[i];
        long time = System.nanoTime() - start;
        // Each iteration performs two double operations: one multiply and one add.
        System.out.printf("Average time per double operation was %.1f ns%n", time / 2.0 / length);
    }

    // Fills the array with random values in [0, 1) and returns it.
    private static double[] fill(double[] doubles) {
        for (int i = 0; i < doubles.length; i++)
            doubles[i] = Math.random();
        return doubles;
    }
}

prints

Average time per double operation was 10.9 ns
Average time per double operation was 17.9 ns
Average time per double operation was 1.7 ns
Average time per double operation was 1.0 ns
Average time per double operation was 0.9 ns
Average time per double operation was 0.8 ns
Average time per double operation was 0.9 ns
Average time per double operation was 0.8 ns
Average time per double operation was 1.0 ns
Average time per double operation was 0.9 ns
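As a rough illustration of the multi-core suggestion above, the same multiply-add can be spread across cores with a parallel stream. This is only a sketch: the class and method names are made up for the example, and the actual speedup depends on array size, core count and memory bandwidth, so it should be measured rather than assumed.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ParallelMultiplyAdd {
    /** Computes x[i] = a[i] * b[i] + c[i], splitting the index range across cores. */
    static double[] multiplyAdd(double[] a, double[] b, double[] c) {
        double[] x = new double[a.length];
        // Each index is written by exactly one task, so no synchronization is needed.
        IntStream.range(0, a.length).parallel().forEach(i -> x[i] = a[i] * b[i] + c[i]);
        return x;
    }

    public static void main(String[] args) {
        double[] x = multiplyAdd(
                new double[]{1, 2, 3},
                new double[]{4, 5, 6},
                new double[]{7, 8, 9});
        System.out.println(Arrays.toString(x)); // prints [11.0, 18.0, 27.0]
    }
}
```

For very small arrays the cost of scheduling the parallel tasks is likely to outweigh the gain, so a sequential loop may still win there.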

I think this answer, JNI Performance, also applies here. If you call JNI many times and do only a few calculations per call, performance will suffer. If you call JNI for heavy calculations, the optimized C code should perform faster.

I recently tested fractal generation (Mandelbrot) with the same code written in Java and then ported to C, and to my surprise I observed a slight DECREASE in computation speed when using the JNI version. I can explain this by only one thing: if you use C code, you can't take advantage of HotSpot's optimization of repeated computations.

You can check the sample code yourself at: http://code.google.com/p/frgenjava/

EDIT: in the situation I described, I left the JNI call overhead (about 20 ns per call) out of the measurement, and even then the C code was slower.

The most important thing with performance concerns is to test and benchmark. You have heard that C++ is better at floating point than Java. OK, this may be the case. But without a benchmark showing the actual difference, that claim is not worth a penny. It could be simply false.

In fact, modern Java uses a JIT. What's that? Well, we all know that Java uses bytecode and that the bytecode is interpreted. This is both true and false. In fact, heavily used code is compiled on the fly to native code optimised for your platform. The JIT can even perform optimisations that are not possible in C/C++ by using execution statistics.

Java and the JVM are now widely accepted as a really fast and effective platform. People have started to use them in heavy computing areas with considerable success. Java is also easier to deploy on a grid.

Recent benchmarks tend to show performance similar to C/C++ (for example http://blogs.oracle.com/amurillo/entry/java_vs_c ).

So would you gain from using JNI and just porting to C++? I would say no, or very little. (But again, test if you want to be sure.) And without optimising it, the C++ version could be slower.

Could you gain a massive improvement by using JNI with optimised assembly code (including SSE instructions)? Definitely yes, if you do it the right way. It would need a lot of benchmarking, expertise and time, though.

I can't comment on the speed of Java arithmetic operations, but I do know that JNI calls your C++ code directly, so you will get native speed, yes.
