
Performance optimization: C++ vs Java not performing as expected

I have written two programs implementing a simple algorithm for matrix multiplication, one in C++ and one in Java. Contrary to my expectations, the Java program runs about 2.5x faster than the C++ program. I am a novice at C++, and would like suggestions on what I can change in the C++ program to make it run faster.

My programs borrow code and data from this blog post http://martin-thoma.com/matrix-multiplication-python-java-cpp .

Here are the current compilation flags I am using:

g++ -O3 main.cc    

javac Main.java

Here are the current compiler/runtime versions:

$ g++ --version
g++.exe (GCC) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ java -version
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)

My computer is a ~2012-era Core i3 laptop running Windows with MinGW. Here are the current performance results:

$ time ./a.exe < ../Testing/2000.in
507584919
real    0m36.469s
user    0m0.031s
sys     0m0.030s

$ time java Main < ../Testing/2000.in
507584919
real    0m14.299s
user    0m0.031s
sys     0m0.015s

Here is the C++ program:

#include <iostream>
#include <cstdio>
using namespace std;

int *A;
int *B;
int height;
int width;

int * matMult(int A[], int B[]) {
        int * C = new int[height*width]();  // () zero-initializes; the loop below accumulates with +=
        int n = height;
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                for (int j = 0; j < n; j++) {
                    C[width*i+j]+=A[width*i+k] * B[width*k+j];
                }
            }
        }
        return C;
}

int main() {
  std::ios::sync_with_stdio(false);
  cin >> height;
  cin >> width;
  A = new int[width*height];
  B = new int[width*height];
  for (int i = 0; i < width*height; i++) {
    cin >> A[i];
  }

  for (int i = 0; i < width*height; i++) {
    cin >> B[i];
  }

  int *result = matMult(A,B);
  cout << result[2];
}

Here is the Java program:

import java.util.*;
import java.io.*;

public class Main {

    static int[] A;
    static int[] B;
    static int height;
    static int width;

public static void main(String[] args) {
    try {
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        height = Integer.parseInt(reader.readLine());
        width = Integer.parseInt(reader.readLine());
        A=new int[width*height];
        B=new int[width*height];
        int index = 0;

        String thisLine;
        while ((thisLine = reader.readLine()) != null) {
            if (thisLine.trim().equals("")) {
                break;
            } else {
                String[] lineArray = thisLine.split("\t");
                for (String number : lineArray) {
                    A[index] = Integer.parseInt(number);
                    index++;
                }
            }
        }

        index = 0;
        while ((thisLine = reader.readLine()) != null) {
            if (thisLine.trim().equals("")) {
                break;
            } else {
                String[] lineArray = thisLine.split("\t");
                for (String number : lineArray) {
                    B[index] = Integer.parseInt(number);
                    index++;
                }
            }
        }

        int[] result = matMult(A,B);
        System.out.println(result[2]);

        reader.close();


    } catch (Exception e) {
        e.printStackTrace();
    }
}

public static int[] matMult(int[] A, int[] B) {
        int[] C = new int[height*width];
        int n = height;
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                for (int j = 0; j < n; j++) {
                    C[width*i+j]+=A[width*i+k] * B[width*k+j];
                }
            }
        }
        return C;
    }
}

Here is a link to a 2000x2000 test case: https://mega.nz/#!sglWxZqb!HBts_UlZnR4X9gZR7bG-ej3xf2A5vUv0wTDUW-kqFMA

Here is a link to a 2x2 test case: https://mega.nz/#!QwkV2SII!AtfGuxPV5bQeZtt9eHNNn36rnV4sGq0_sJzitjiFE8s

Any advice explaining what I am doing wrong in C++, or why my C++ implementation is running so much slower than Java here, would be much appreciated!

EDIT: As suggested, I modified the programs so that they do not actually perform a multiplication, but just read the arrays in and print out one number from each. Here are the performance results for that. The C++ program has slower IO. That only accounts for part of the difference however.

$ time ./IOonly.exe < ../Testing/2000.in
7
944
real    0m8.158s
user    0m0.000s
sys     0m0.046s

$ time java IOOnly < ../Testing/2000.in
7
944
real    0m1.461s
user    0m0.000s
sys     0m0.047s

I'm not able to analyze the Java execution, since it creates a temporary executable module that disappears after it has been "used". However, I assume that it executes SSE instructions to get that speed (or that it unrolls the loop, which clang++ does if you disable SSE instructions).

Compiling with g++ (4.9.2) and clang++, however, I can clearly see that clang optimises the loop to use SSE instructions, whereas gcc doesn't. The gcc-compiled code is thus roughly 4 times slower. After changing the code to use a constant value of 2000 for each dimension (so the compiler "knows" the height and width), gcc also generates code that takes around 8s (on my machine!), compared to 27s with the "variable" values. The clang-compiled code is marginally faster here as well, but within the noise, I'd say.
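To illustrate the constant-dimension variant described above, here is a minimal sketch. It is not the exact code that was measured: the name matMultFixed and the template parameter N are hypothetical, and whether a given compiler vectorizes the inner loop depends on its version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: N is a compile-time constant, so the optimizer
// sees fixed trip counts and fixed row strides.
template <std::size_t N>
std::vector<int> matMultFixed(const std::vector<int>& A,
                              const std::vector<int>& B) {
    assert(A.size() == N * N && B.size() == N * N);
    std::vector<int> C(N * N, 0);  // explicitly zeroed accumulator
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t k = 0; k < N; ++k) {
            const int a = A[N * i + k];  // invariant across the inner loop
            for (std::size_t j = 0; j < N; ++j) {
                C[N * i + j] += a * B[N * k + j];
            }
        }
    }
    return C;
}
```

For the 2000x2000 test case the call would be matMultFixed<2000>(A, B). The price is that the size must be known when the program is compiled, so this is only useful as an experiment or when the dimensions really are fixed.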

Overall conclusion: the quality/cleverness of the compiler strongly affects the performance of tight loops. The more complex and varied the code is, the more likely it is that the C++ solution will generate better code, whereas simple, easy-to-compile problems are quite likely to do better in Java (as a rule, but not guaranteed). I expect the Java compiler uses profiling to determine the number of loop iterations, for example.

Edit:

The result of time can be used to determine if the reading of the file is taking a long time, but you need some kind of profiling tool to determine if the actual input is using a lot of CPU-time and such.
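As a first step before reaching for a profiler, the read phase and the multiply phase can be timed separately inside the program. The sketch below assumes a hypothetical helper named elapsedMs; it measures wall-clock time only, so it shows where the time goes but not how much of it is CPU time.

```cpp
#include <chrono>

// Hypothetical helper: runs f() once and returns the elapsed
// wall-clock time in milliseconds.
template <typename F>
double elapsedMs(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In the program from the question, the two input loops could be wrapped in one call to elapsedMs and the call to matMult in another, with both results printed to std::cerr so they do not mix with the program's normal output.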

The Java engine uses a "just-in-time compiler", which uses profiling to determine the number of times a particular piece of code is hit (you can do that for C++ too, and big projects often do!). This allows it to, for example, unroll a loop, or determine at runtime the number of iterations in a loop. This code executes 2000 * 2000 * 2000 inner-loop iterations, and the fact that the C++ compiler actually does a BETTER job when it KNOWS the dimensions tells us that the Java runtime isn't actually doing better (at least not initially), just that it manages to improve the performance over time.

Unfortunately, due to the way that the Java runtime works, it doesn't leave the binary code behind, so I can't really analyze what it does.

The key here is that the operations you are doing are simple and the logic is simple; there are just an awful lot of them, and you are doing them with a trivial implementation. Both Java and C++ will benefit from manually unrolling the loop, for example.
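A minimal sketch of what manual unrolling could look like in C++ follows. The function name matMultUnrolled and the unroll factor of 4 are assumptions chosen for illustration; an optimizing compiler may already do this on its own, so any gain has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: same i-k-j loop order as the question, with the inner loop
// unrolled by 4 and a remainder loop for sizes not divisible by 4.
std::vector<int> matMultUnrolled(const std::vector<int>& A,
                                 const std::vector<int>& B,
                                 std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    std::vector<int> C(n * n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        int* c = C.data() + n * i;            // row i of C
        for (std::size_t k = 0; k < n; ++k) {
            const int a = A[n * i + k];       // invariant in the inner loop
            const int* b = B.data() + n * k;  // row k of B
            std::size_t j = 0;
            for (; j + 4 <= n; j += 4) {      // four columns per iteration
                c[j]     += a * b[j];
                c[j + 1] += a * b[j + 1];
                c[j + 2] += a * b[j + 2];
                c[j + 3] += a * b[j + 3];
            }
            for (; j < n; ++j) {              // leftover columns
                c[j] += a * b[j];
            }
        }
    }
    return C;
}
```

Hoisting A[n*i+k] and the row pointers out of the inner loop also removes the repeated index arithmetic, which is a separate small saving from the unrolling itself.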

C++ is not faster than Java by default

C++ is fast as a language, but as soon as you incorporate libraries into the mix, you are bound to those libraries' speed.

The standard library is hardly built for performance, period. It is written with design and correctness in mind.

C++ gives you the opportunity to optimize!
If you are unhappy with the standard library's performance, you can, and you should, use your own optimized version.

For example, the standard C++ IO objects are beautiful when it comes to design (streams, locales, facets, inner buffers), but that makes them terrible at performance. If you are writing for Windows, you can use ReadFile and WriteConsole as your IO mechanism.
If you switch to these functions instead of the standard library, your program outperforms Java by a few orders of magnitude.
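The same idea, bypassing the stream machinery, can be sketched portably. The example below deliberately swaps in std::fread plus a hand-written integer parser instead of the Win32 ReadFile call mentioned above; readAll and parseInts are hypothetical names, and the parser assumes well-formed decimal input that fits in an int (no overflow checks).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: read an entire stream into memory in large chunks.
std::string readAll(std::FILE* in) {
    std::string data;
    char chunk[1 << 16];
    std::size_t got;
    while ((got = std::fread(chunk, 1, sizeof chunk, in)) > 0) {
        data.append(chunk, got);
    }
    return data;
}

// Hypothetical helper: extract decimal integers from text, treating
// every other character (tabs, newlines, '\r') as a separator.
std::vector<int> parseInts(const std::string& text) {
    std::vector<int> out;
    const std::size_t n = text.size();
    std::size_t i = 0;
    while (i < n) {
        // Skip anything that cannot start a number.
        while (i < n && text[i] != '-' && (text[i] < '0' || text[i] > '9')) ++i;
        if (i >= n) break;
        bool negative = false;
        if (text[i] == '-') { negative = true; ++i; }
        if (i >= n || text[i] < '0' || text[i] > '9') continue;  // stray '-'
        int value = 0;
        while (i < n && text[i] >= '0' && text[i] <= '9') {
            value = value * 10 + (text[i] - '0');
            ++i;
        }
        out.push_back(negative ? -value : value);
    }
    return out;
}
```

With the input format from the question, parseInts(readAll(stdin)) would return height, width, then the entries of A followed by the entries of B. How much this helps compared with cin has to be measured on the machine in question.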
