
Why is a while loop in Scala nearly 400 times slower than in Java?

JDK version: 7u51
Scala version: 2.11.2

code:
scala:

object MyApp extends App{
    test
    val start = System.nanoTime()
    test
    println(System.nanoTime() - start)

    def test {
        var i = 0
        while(i < 100000000){
            i += 1
        }
    }
}

java:

public class MyTest {

    public static void main(String[] args) {
        test();
        long start = System.nanoTime();
        test();
        System.out.println(System.nanoTime() - start);
    }

    private static void test(){
        int i = 0;
        while(i < 100000000){
            i += 1;
        }
    }
}

The Scala version takes about 20000000 nanoseconds.

The Java version takes about 50000 nanoseconds.

The bytecode for test is almost the same in both versions, but where Java has iinc 0 1 [i], Scala has
iload_1 [i] iconst_1 iadd istore_1 [i] instead (more bytecodes).

There also seems to be no JIT optimization in the Scala version: if I remove the first test call (the warm-up), it has no obvious effect. In Java, removing it makes the run take much longer (almost 100 times longer than before, but still about 10 times faster than the Scala version).


After posting this, I tried JDK 8u25 and Scala 2.11.4, but the results were not much different.

iinc 0 1 [i] 

This means that the local variable number 0, which is [i], is incremented by one.

The JVM just-in-time compiler probably keeps this local variable in a CPU register, so it can issue a single INC assembly instruction on each iteration of the loop, which probably takes one CPU cycle on any modern CPU.

iload_1 [i] 
iconst_1
iadd
istore_1 [i]

This one first pushes local variable number 1, which is [i], onto the stack. Then it pushes the constant 1 onto the stack. Then it adds the top two elements of the stack and puts the result on top of the stack. As the last step, it copies the sum back to local variable number 1, which is [i].

Even if all these operations hit the CPU's L1 cache and the local variable is kept in a CPU register, this will still require many more cycles than the other case.
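To make the two sequences concrete, here is a minimal sketch in plain Java that mimics them with an explicit array of local slots and an operand stack. The class and helper names (IncrementTrace, iinc, loadAddStore) are made up for illustration. This is not how the JVM actually executes bytecode, and it says nothing about speed. It only shows that both sequences leave the same value in the local variable.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class IncrementTrace {
    // Hypothetical helper: mimics `iinc <slot> <delta>`, which updates the local slot in place.
    public static void iinc(int[] locals, int slot, int delta) {
        locals[slot] += delta;
    }

    // Hypothetical helper: mimics iload / iconst_1 / iadd / istore using an explicit operand stack.
    public static void loadAddStore(int[] locals, int slot) {
        Deque<Integer> stack = new ArrayDeque<Integer>();
        stack.push(locals[slot]);    // iload: push the local variable
        stack.push(1);               // iconst_1: push the constant 1
        int b = stack.pop();
        int a = stack.pop();
        stack.push(a + b);           // iadd: pop two values, push their sum
        locals[slot] = stack.pop();  // istore: pop the sum back into the local variable
    }

    // Runs the loop from the question using the single-instruction form (slot 0, as in the static Java method).
    public static int runIinc(int limit) {
        int[] locals = new int[1];
        while (locals[0] < limit) {
            iinc(locals, 0, 1);
        }
        return locals[0];
    }

    // Runs the same loop using the four-instruction form (slot 1, as in the Scala instance method).
    public static int runLoadAddStore(int limit) {
        int[] locals = new int[2];
        while (locals[1] < limit) {
            loadAddStore(locals, 1);
        }
        return locals[1];
    }

    public static void main(String[] args) {
        // Both variants should end with the same counter value.
        System.out.println(runIinc(1000) == runLoadAddStore(1000));
    }
}
```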

The JVM just-in-time compiler could be smart enough to recognize this pattern and replace it with the single iinc instruction shown above. But the people who originally wrote it would have had little motivation to do so, because their Java compiler would never output such code. In addition, the JIT compiler has a very limited time budget for all of its optimizations, so it can only check a finite list of them.

This seems like something that can be easily solved by fixing the Scala compiler.

Unfortunately, I have no easy way to measure your Scala code at the moment, but your Java test method, measured with a serious JVM benchmarking tool (JMH), clocks in at 0.5 nanoseconds, which proves that the JIT compiler treated it as a no-op. Even if we make the method more realistic by returning i from it, the JIT compiler is still smart enough to replace the loop with a direct formula, this time clocking in at a mere 1.3 nanoseconds.
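For reference, here is a rough sketch of the "return i" variant in plain Java, with a hand-rolled timing loop. The names LoopBench, countTo and bestOf are my own and purely illustrative. This is not the JMH setup behind the numbers above, and it does not stop the JIT compiler from collapsing the loop into a formula. It only shows what returning the counter and consuming the result looks like.

```java
public class LoopBench {
    // Same loop as in the question, but the counter is returned so the result is observable.
    public static int countTo(int limit) {
        int i = 0;
        while (i < limit) {
            i += 1;
        }
        return i;
    }

    // Times `rounds` calls of countTo and returns the smallest elapsed time in nanoseconds.
    public static long bestOf(int rounds, int limit) {
        long best = Long.MAX_VALUE;
        long sink = 0;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += countTo(limit);
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        // Use the accumulated results so the calls cannot be discarded as unused.
        if (sink != (long) rounds * limit) {
            throw new IllegalStateException("unexpected loop result");
        }
        return best;
    }

    public static void main(String[] args) {
        bestOf(20, 100000000); // warm-up rounds, timing discarded
        System.out.println(bestOf(20, 100000000) + " ns (best of 20 rounds)");
    }
}
```

Even with the warm-up and the consumed result, numbers from a loop like this are easy to misread, which is why a dedicated tool such as JMH is the safer choice.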

The major point to take from the above is: beware of microbenchmarks like the one you have written. The results you get from them are usually misleading in the extreme if treated as an indicator of performance in a production setting.

(If you provide me with a JAR or a .class file with your Scala code, I will be happy to measure that variant on JMH, too.)
