C++ and Java performance

Question

This question is just speculative.

I have the following implementation in C++:

#include <cstdio>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

void testvector(int x)
{
  vector<string> v;
  char aux[20];
  int a = x * 2000;
  int z = a + 2000;
  string s("X-");
  for (int i = a; i < z; i++)
  {
    sprintf(aux, "%d", i);
    v.push_back(s + aux);
  }
}

int main()
{
  for (int i = 0; i < 10000; i++)
  {
    if (i % 1000 == 0) cout << i << endl;
    testvector(i);
  }
}

On my box, this program runs in approx. 12 seconds. Amazingly, I have a similar implementation in Java [using String and ArrayList] and it runs a lot faster than my C++ application (approx. 2 seconds).

I know that Java HotSpot performs a lot of optimizations when compiling to native code, but I think that if such performance can be achieved in Java, it could be achieved in C++ too...

So, what do you think should be modified in the program above, or perhaps in the libraries or the memory allocator used, to reach similar performance here? (Writing actual code for these things could take very long, so a discussion would be great.)

Thank you.

Answer 1

You have to be careful with performance tests because it's very easy to deceive yourself or not compare like with like.

However, I've seen similar results comparing C# with C++, and there are a number of well-known blog posts about the astonishment of native coders when confronted with this kind of evidence. Basically, a good modern generational compacting GC is much better optimised for lots of small allocations.

C++'s default allocator treats every block the same way, so every block is moderately expensive to allocate and free. In a generational GC, all blocks are very, very cheap to allocate (nearly as cheap as stack allocation), and if they turn out to be short-lived they are also very cheap to clean up.

This is why the "fast performance" of C++ compared with more modern languages is, for the most part, mythical. You have to hand-tune your C++ program out of all recognition before it can compete with the performance of an equivalent, naively written C# or Java program.
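To make the "nearly as cheap as stack allocation" point concrete, here is a minimal C++ sketch of bump-pointer allocation, which is roughly how a generational collector's young generation hands out memory. It is only an illustration under that assumption: `BumpArena` is a hypothetical name, and there is no collection, compaction or tracing here, just the allocation path (a pointer increment plus a bounds check) and a reset that releases every block at once, much as a minor collection reclaims a nursery full of dead objects without visiting them individually.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical illustration of bump-pointer allocation (not a real GC).
// Allocating is a pointer increment plus a bounds check; "freeing" is
// resetting the offset, which releases every block in one step.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    // Returns memory aligned to `align` (must be a power of two),
    // or throws std::bad_alloc when the arena is exhausted.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.data());
        std::uintptr_t current = base + offset_;
        std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        std::uintptr_t aligned = (current + mask) & ~mask;  // round up to alignment
        std::size_t newOffset = static_cast<std::size_t>(aligned - base) + size;
        if (newOffset > buffer_.size()) throw std::bad_alloc();
        offset_ = newOffset;
        return reinterpret_cast<void*>(aligned);
    }

    // Discards every allocation in O(1); there is no per-block bookkeeping.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;  // one big block obtained up front
    std::size_t offset_;                 // everything before this is "allocated"
};
```

A real GC must also find and move the survivors, so this sketch captures only the cheap part. A general-purpose allocator, by contrast, typically keeps per-block bookkeeping so that blocks can be freed individually and in any order.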

Answer 2

All your program does is print the numbers 0..9000 in steps of 1000. The calls to testvector() do nothing and can be eliminated. I suspect that your JVM notices this, and is essentially optimising the whole function away.

You can achieve a similar effect in your C++ version by just commenting out the call to testvector() !
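A minimal sketch of keeping the C++ benchmark honest in the same sense: `testvector_last` (a hypothetical variant of the question's `testvector`) returns the last string it built, and `run_benchmark` (also hypothetical) folds one character of every result into a total while timing the loop with `std::chrono::steady_clock`. Printing that total means the calls can no longer be discarded as dead code, although it does not rule out every other optimisation.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical variant of testvector() that returns its last element,
// so the work feeds a value the caller can observe.
std::string testvector_last(int x) {
    std::vector<std::string> v;
    char aux[20];
    int a = x * 2000;
    int z = a + 2000;
    std::string s("X-");
    for (int i = a; i < z; i++) {
        std::snprintf(aux, sizeof aux, "%d", i);
        v.push_back(s + aux);
    }
    return v.back();
}

// Times `iterations` calls and stores the elapsed milliseconds in `ms`.
// Returns a total that depends on every call's formatted output.
std::size_t run_benchmark(int iterations, double& ms) {
    auto start = std::chrono::steady_clock::now();
    std::size_t total = 0;
    for (int i = 0; i < iterations; i++) {
        std::string last = testvector_last(i);
        total += static_cast<unsigned char>(last.back());  // last formatted digit
    }
    auto stop = std::chrono::steady_clock::now();
    ms = std::chrono::duration<double, std::milli>(stop - start).count();
    return total;
}
```

From `main()`, something like `double ms; std::size_t t = run_benchmark(10000, ms); std::cout << t << ' ' << ms << " ms\n";` reports both the result and the elapsed time.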

Answer 3

Well, this is a pretty useless test that only measures allocation of small objects. That said, a few simple changes brought the running time down from about 15 secs to about 4 secs. New version:

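#include <boost/pool/pool_alloc.hpp>
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

// Only the vector's own buffer comes from Boost's pool allocator;
// each std::string still allocates through std::allocator.
typedef vector<string, boost::pool_allocator<string> > str_vector;

void testvector(int x, str_vector::iterator it, str_vector::iterator end)
{
    char aux[25] = "X-";
    int a = x * 2000;
    for (; it != end; ++a)
    {
        sprintf(aux+2, "%d", a);
        *it++ = aux;   // assigns into an existing string instead of creating a new one
    }
}

int main(int argc, char** argv)
{
    str_vector v(2000);   // allocated once and reused by every call
    for (int i = 0; i < 10000; i++)
    {
        if (i % 1000 == 0) cout << i << endl;
        testvector(i, v.begin(), v.begin()+2000);
    }
    return 0;
}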

real    0m4.089s
user    0m3.686s
sys     0m0.000s

Java version has the times:

real    0m2.923s
user    0m2.490s
sys     0m0.063s

(This is my direct Java port of your original program, except that it passes the ArrayList as a parameter to cut down on useless allocations.)

So, to sum up: small allocations are faster in Java, and memory management is a bit more hassle in C++. But we knew that already :)
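The reuse idea (one container passed around instead of a fresh one per call) can be separated from the Boost pool allocator. This sketch uses only the standard allocator and a hypothetical `fill_labels` function: the caller owns one vector, `resize` is a no-op after the first call, and `assign` typically reuses each string's existing buffer when the new text fits. How much of the speed-up above this accounts for on its own is something to measure, not assume.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: refills a caller-owned vector in place with
// "X-<n>" labels for n in [x * 2000, x * 2000 + 2000).
void fill_labels(int x, std::vector<std::string>& v) {
    const std::size_t count = 2000;
    v.resize(count);               // allocates only on the first call
    char aux[25] = "X-";
    int a = x * 2000;
    for (std::size_t k = 0; k < count; ++k) {
        std::snprintf(aux + 2, sizeof aux - 2, "%d", a + static_cast<int>(k));
        v[k].assign(aux);          // typically reuses v[k]'s capacity when it fits
    }
}
```

Usage: declare `std::vector<std::string> v;` once in `main()` and call `fill_labels(i, v)` inside the loop.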

Answer 4

HotSpot optimises hot spots in code: typically, it tries to optimise anything that gets executed 10,000 times.

For this code, after 5 iterations it will try to optimise the inner loop that adds the strings to the vector. The optimisation will more than likely include escape analysis of the variables in the method. As the vector is a local variable and never escapes the local context, it is very likely that HotSpot will remove all of the code in the method and turn it into a no-op. To test this, try returning the results from the method. Even then, be careful to do something meaningful with the result: just getting its length, for example, can be optimised away, as HotSpot can see that the result is always the same as the number of iterations in the loop.

All of this points to the key benefit of a dynamic compiler like HotSpot: using runtime analysis, it can optimise what is actually being done at runtime and get rid of redundant code. After all, it doesn't matter how efficient your custom C++ memory allocator is; not executing any code is always going to be faster.
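The advice above about doing something meaningful with the result applies equally to a C++ benchmark. One option, sketched here under the assumption that any content-dependent hash will do, is to checksum every character of every string. FNV-1a and the name `checksum` are arbitrary illustrative choices. Unlike `v.size()`, which a compiler can derive from the loop bounds, this value depends on what was actually formatted.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a (64-bit) over every character of every string, in order.
// The result depends on the actual contents, not just on the element count.
std::uint64_t checksum(const std::vector<std::string>& v) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (const std::string& s : v) {
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ULL;              // FNV-1a 64-bit prime
        }
    }
    return h;
}
```

In a benchmark, `testvector` would return `checksum(v)`, and `main()` would combine and print those values.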

Answer 5

> In my box, this program gets executed in approx. 12 seconds; amazingly, I have a similar implementation in Java [using String and ArrayList] and it runs lot faster than my C++ application (approx. 2 seconds).

I cannot reproduce that result.

To account for the optimization mentioned by Alex, I've modified both programs so that the Java and the C++ code each print the last element of the v vector at the end of the testvector method.

Now the C++ code (compiled with -O3) runs about as fast as yours (12 sec). The Java code (a straightforward port that uses ArrayList instead of Vector, although thanks to escape analysis I doubt that this affects performance) takes about twice as long.

I did not do a lot of testing so this result is by no means significant. It just shows how easy it is to get these tests completely wrong, and how little single tests can say about real performance.

Just for the record, the tests were run on the following configuration:

$ uname -ms
Darwin i386
$ java -version
java version "1.6.0_15"
Java(TM) SE Runtime Environment (build 1.6.0_15-b03-226)
Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-92, mixed mode)
$ g++ --version
i686-apple-darwin9-g++-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5490)

Answer 6

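If you use vector::reserve to reserve space for the z - a (that is, 2000) elements of v before the loop, it should help (though the same change should also speed up the Java equivalent of this code).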
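A minimal sketch of the suggestion, assuming the same 2000 labels per call as the question. `make_labels_reserved` is a hypothetical name. `reserve` only removes the repeated reallocation (and element moves) of the vector's own buffer. Each string may still allocate, although short strings such as "X-1999" often fit in the small-string buffer of modern standard libraries.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Builds the same 2000 labels as the question's testvector(), but reserves
// capacity first so push_back never has to reallocate the vector's buffer.
std::vector<std::string> make_labels_reserved(int x) {
    int a = x * 2000;
    int z = a + 2000;
    std::vector<std::string> v;
    v.reserve(z - a);                   // one allocation for all 2000 elements
    char aux[20];
    for (int i = a; i < z; i++) {
        std::snprintf(aux, sizeof aux, "%d", i);
        v.push_back(std::string("X-") + aux);
    }
    return v;
}
```

The Java counterpart would be passing an initial capacity to the `ArrayList` constructor.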

Answer 7

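To explain why C++ and Java perform differently, it is important to see the source of both. I can see a few performance issues in the C++ code, and for some of them it would be useful to check whether you do the same thing in Java. For example, you flush the output stream with std::endl: in Java, do you call System.out.flush(), or just append a '\n'? If the latter, you have just given Java a distinct advantage.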
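A minimal sketch of the '\n' alternative in C++, mirroring the progress output in the question's `main()`. `print_progress` and `progress_text` are hypothetical helpers. `std::endl` writes a newline and then flushes, while `'\n'` only writes the newline, so the sketch flushes once at the end. With only ten progress lines in this benchmark the difference is probably small. The point is to make both versions do the same amount of I/O.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Prints "0", "1000", "2000", ... like the question's main(), ending lines
// with '\n' (no flush) and flushing once after the loop.
void print_progress(std::ostream& out, int iterations) {
    for (int i = 0; i < iterations; i++)
        if (i % 1000 == 0) out << i << '\n';
    out.flush();
}

// Convenience wrapper that captures the same output in a string.
std::string progress_text(int iterations) {
    std::ostringstream os;
    print_progress(os, iterations);
    return os.str();
}
```

In the benchmark itself, the call would be `print_progress(std::cout, 10000)`.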

Answer 8

What are you actually trying to measure here? Putting ints into a vector?

You can start by pre-allocating space in the vector, since its final size is known:

instead of:

void testvector(int x)
{
  vector<string> v;
  char aux[20];
  int a = x * 2000;
  int z = a + 2000;
  string s("X-");
  for (int i = a; i < z; i++)
  {
    sprintf(aux, "%d", i);
    v.push_back(s + aux);  // v grows as needed, reallocating several times
  }
}

try:

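void testvector(int x)
{
  char aux[20];
  int a = x * 2000;
  int z = a + 2000;
  string s("X-");
  vector<string> v;
  v.reserve(z - a);  // allocate room for all 2000 elements up front
  for (int i = a; i < z; i++)
  {
    sprintf(aux, "%d", i);
    v.push_back(s + aux);
  }
}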

Answer 9

In your inner loop, you are formatting ints into strings and pushing them into a string vector. If you just single-step that at the machine-code level, I'll bet you find that a lot of the time goes into allocating and formatting the strings, and then some time goes into the push_back (not to mention deallocation when the vector is released).

This could easily vary between runtime-library implementations, depending on what each library's developers considered a reasonable thing for people to want to do.
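One way to test this hypothesis is to swap the formatting step for a cheaper one and re-time. The sketch below assumes C++17 (well after this question was asked) and a hypothetical `make_label` helper. It uses `std::to_chars`, which does no locale handling or format-string parsing, and builds each string once from a stack buffer. Whether it actually beats `sprintf` on a given library is exactly what the measurement should show.

```cpp
#include <charconv>
#include <limits>
#include <string>

// Hypothetical helper: builds "X-<i>" without sprintf (requires C++17).
std::string make_label(int i) {
    // "X-" plus room for a sign and every decimal digit of an int.
    char buf[2 + std::numeric_limits<int>::digits10 + 2] = {'X', '-'};
    auto result = std::to_chars(buf + 2, buf + sizeof buf, i);
    // The buffer is always large enough, so result.ptr marks the end of the text.
    return std::string(buf, result.ptr);
}
```

In the question's loop, `v.push_back(make_label(i));` would replace the `sprintf` call and the `s + aux` concatenation.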

9 answers (score, accepted flag, posting time):

solution1  12  accepted  2009-10-11 15:14:58
solution2  6  2009-10-11 17:05:37
solution3  5  2009-10-11 17:54:32
solution4  4  2009-10-11 17:05:23
solution5  3  2009-10-11 17:48:22
solution6  1  2009-10-11 15:14:40
solution7  1  2009-10-11 16:29:13
solution8  0  2009-10-11 15:18:31
solution9  0  2009-10-11 15:40:42
