Why is this simple C++ addition 6 times slower than the equivalent Java?

Question

Hello Stack Overflow users, this is my first question, so if there are any errors in the way I've expressed it, please point them out. Thank you!

I wrote this simple calculation in both Java and C++:

Java:

class Test {
    public static void main(String[] args) {
        long start = System.nanoTime();
        long total = 0;
        for (int i = 0; i < 2147483647; i++) {
            total += i;
        }
        System.out.println(total);
        System.out.println(System.nanoTime() - start);
    }
}

C++:

#include <chrono>
#include <iostream>
using namespace std;

int main()
{
    auto start = chrono::high_resolution_clock::now();
    register long long total = 0;
    for (register int i = 0; i < 2147483647; i++)
    {
        total += i;
    }
    cout << total << endl;
    auto finish = chrono::high_resolution_clock::now();
    cout << chrono::duration_cast<chrono::nanoseconds>(finish - start).count() << endl;
}

Software:
- JDK8u11
- Microsoft Visual C++ Compiler (2013)

Results (the sum, then the elapsed time in nanoseconds):

Java: 2305843005992468481 1096361110

C++: 2305843005992468481 6544374300

The calculation results are the same, which is good. However, the printed elapsed times show that the Java program takes about 1 second, while the C++ program takes about 6.5 seconds.

I've been doing Java for quite some time, but I am new to C++. Is there a problem in my code, or is it a fact that C++ is slower than Java for simple calculations?

Also, I used the "register" keyword in my C++ code, hoping it would improve performance, but the execution time doesn't change at all. Could someone explain this?

EDIT: My mistake was that the C++ compiler settings were not optimized and the output was set to 32-bit (x32). After applying /O2, building for WIN64, and removing DEBUG, the program took only 0.7 seconds to execute.

The JDK applies optimizations by default (through the JIT, at run time), but VC++ does not: by default it favors compilation speed. Different C++ compilers also vary in their results; some calculate the loop's result at compile time, leading to extremely short execution times (around 5 microseconds).
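For reference, the loop's result has a closed form, which is what allows a compiler to replace the whole loop with a constant. With n = 2147483647 = 2^31 - 1 and i running from 0 to n - 1, the arithmetic works out as below. The result is smaller than 2^63, so it fits in a 64-bit long / long long, but it would overflow a 32-bit int.

```latex
\begin{aligned}
\sum_{i=0}^{n-1} i &= \frac{n(n-1)}{2}, \qquad n = 2^{31}-1 \\
&= (2^{31}-1)(2^{30}-1) \\
&= 2^{61} - 2^{31} - 2^{30} + 1 \\
&= 2305843005992468481
\end{aligned}
```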

NOTE: Given the right conditions, the C++ program performs better than Java in this simple test. However, I noticed that many runtime safety checks are skipped in the optimized build, which goes against the debug build's intention of being "safe". I believe C++ would outperform Java by an even larger margin in a large array test, since it does not do bounds checking.

Answer 1

On Linux/Debian/Sid/x86-64, using OpenJDK 7 with

// file test.java
class Test {
    public static void main(String[] args) {
        long start = System.nanoTime();
        long total = 0;
        for (int i = 0; i < 2147483647; i++) {
            total += i;
        }
        System.out.println(total);
        System.out.println(System.nanoTime() - start);
    }
}

and GCC 4.9 with

// file test.cc
#include <iostream>
#include <chrono>

int main (int argc, char**argv) {
  using namespace std;
  auto start = chrono::high_resolution_clock::now();
  long long total = 0;
  for (int i = 0; i < 2147483647; i++)
    {
      total += i;
    }
  cout << total << endl;
  auto finish = chrono::high_resolution_clock::now();
  cout << chrono::duration_cast<chrono::nanoseconds>(finish - start).count()
       << endl;
}

Then compiling and running test.java with

javac test.java
java Test

I'm getting the output

2305843005992468481
774937152

When compiling test.cc with optimizations

g++ -O2 -std=c++11 test.cc -o test-gcc

and running ./test-gcc, it goes much faster:

2305843005992468481
40291

Of course, without optimizations (g++ -std=c++11 test.cc -o test-gcc), the run is slower:

2305843005992468481
5208949116

By looking at the assembler code using g++ -O2 -fverbose-asm -S -std=c++11 test.cc I see that the compiler computed the result at compile time:

    .globl  main
    .type   main, @function
  main:
  .LFB1530:
    .cfi_startproc
    pushq   %rbx    #
    .cfi_def_cfa_offset 16
    .cfi_offset 3, -16
    call    _ZNSt6chrono3_V212system_clock3nowEv    #
    movabsq $2305843005992468481, %rsi  #,
    movl    $_ZSt4cout, %edi    #,
    movq    %rax, %rbx  #, start
    call    _ZNSo9_M_insertIxEERSoT_    #
    movq    %rax, %rdi  # D.35007,
    call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_  #
    call    _ZNSt6chrono3_V212system_clock3nowEv    #
    subq    %rbx, %rax  # start, D.35008
    movl    $_ZSt4cout, %edi    #,
    movq    %rax, %rsi  # D.35008, D.35008
    call    _ZNSo9_M_insertIlEERSoT_    #
    movq    %rax, %rdi  # D.35007,
    call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_  #
    xorl    %eax, %eax  #
    popq    %rbx    #
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
  .LFE1530:
    .size   main, .-main

So you just need to enable optimizations in your compiler (or switch to a better compiler, like GCC 4.9).

BTW, in Java the low-level optimizations happen in the JIT of the JVM. I don't know Java well, but I don't think they need to be switched on. I do know that with GCC you need to enable optimizations explicitly (e.g. with -O2), and those of course happen ahead of time.

PS: I have not used any Microsoft compiler in this 21st century, so I cannot help you with enabling optimizations in it.

Finally, I don't believe that such microbenchmarks are significant. Benchmark, then optimize, your real applications.

Answer 2

It takes about 0.6 seconds (0.592801000 seconds) on my system (Intel 2600K, 3.40 GHz) with MSVC Express 2013, 64-bit mode, standard release build. I moved the cout of the total to after finish is set, so that the overhead of cout is not included in the timing.

#include <iostream>
#include <chrono>

using namespace std;

int main()
{
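    auto start = chrono::high_resolution_clock::now();
    // "register" is only a hint that compilers such as MSVC ignore; the
    // keyword was deprecated in C++11 and removed in C++17.
    register long long total = 0;
    for (register int i = 0; i < 2147483647; i++)
    {
        total += i;
    }
    // Stop the clock before printing, so the cost of cout is not measured.
    auto finish = chrono::high_resolution_clock::now();
    cout << total << endl;
    cout << chrono::duration_cast<chrono::nanoseconds>(finish - start).count() << endl;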
    return 0;
}

Answer 3

I think the easiest way to describe why C/C++ will ALWAYS be faster than Java is to understand how Java works.

From the beginning, Java has been developed to facilitate cross-platform software. Before Java, one had to compile a program for each machine family separately. Even now, with the variety of hardware architectures, accepted standards, and OSes out there, native code cannot get around this hurdle. Java accomplishes this through its compiler and JVM. The compiler applies whatever optimizations it can and translates the source into Java bytecode, which is like a shorthand for the compiled source. However, this bytecode cannot be understood by the processor yet.

This is where the Java Virtual Machine comes in. First the JVM figures out what environment it is being run in and loads the appropriate translation table. Then the bytecode is read into the JVM and each code is looked up on the table and translated into the environment's native machine code, and then executed.

As you know, this all takes a tiny bit of time per instruction. But with a compiled C/C++ program, it is already in the proper machine code and is executed immediately.

Interesting note: most operating system kernels and device drivers are written in C for performance reasons.

3 answers

solution1 7 ACCEPTED 2014-07-20 06:56:46

solution2 0 2014-07-20 20:40:50

solution3 -1 2014-08-16 21:32:08
