简体   繁体   中英

Java Arrays.sort performance for primitive types and objects

I read a few threads here about Arrays.sort using "tuned quick-sort" for primitive types and merge-sort for objects. I did a small test just to prove that but I found is quiet the opposite.

            int a[] = new int[50000];
    //Integer a[] = new Integer[50000];

    for(int i=0; i<50000; i++) {
        //a[i] = new Integer(new Random().nextInt(5000)); 
        a[i] = new Random().nextInt(5000);
    }
    System.out.println(System.currentTimeMillis());

    Arrays.sort(a);

    System.out.println(System.currentTimeMillis());

For primitive type array it took 22ms where as for array with objects it took 98ms. My laptop it i7 with 8 cores and 8GB of RAM. Have I run it incorrectly?

Many Thanks!

This is not surprising to me at all.

First, you have primitives vs. the indirection of needing to chase references down, the comparisons between two primitives will be faster, etc.

Second, a primitive array will play extremely nicely with the CPU cache. A non-primitive array will not necessarily because there is no guarantee that the referenced objects are contiguous in memory (unlikely) and, additionally, the referrent objects are larger which means that less of them can fit in cache at any one time.

See, in both cases, the values in the arrays will fit in the cache, but the problem with the Integer[] is that you still have to leave the cache and hit the memory bus to chase down the references and find them in main memory; those references could be pointing all over the place on the heap. This is going to make the poor CPU just wait and wait as now cache misses become much more likely.

That is, you have this array of primitives like this

  _   _   _   _       _
 |5| |7| |2| |1| ... |4|

and these all sit next to each other in memory. When one value is pulled into cache from memory, the neighbors get pulled into the cache too. Quicksort and mergesort operate on contiguous sections of the array, so they benefit very much from the CPU cache being nice here (this is locality of reference )

But when you have an array of Integer like this

           _               _
     |--->|7|     ______> |1| 
 _   |   _       |   _
| | |_| | | ... |_| | |         _
 |     _ |_____      |________>|4|
 |___>|5|      |    _           
               |__>|2|

the storage locations for the references are contiguous in memory, so they play nicely with the cache. The problem is the *indirection, the possibility of the referrent Integer objects being fragmented in memory and the fact that less of them will fit in the cache. This extra indirection, the fragmentation, and the size issue is what will not play nicely with the cache.

Again, for something like quicksort or mergesort which plays on contiguous sections of the array, this is huge, Huge, HUGE and almost surely accounts for the vast majority of the performance difference.

Have I run it incorrectly?

Yes, please use System.nanoTime the next time that you need to do a benchmark. System.currentTimeMillis has terrible resolution and is not good for benchmarking.

Your int[] fits in your L2 cache. It is about 4 B * 50K which is 200 KB and your L2 cache is 256 KB. This will run much faster than your Object[] which will be in your L3 cache as it is about 28 B * 50K or 1400 KB in size.

The L2 cache (~11 clock cycles) is about 4-6x faster than your L3 cache (~45 - 75 clock cycles)

I bet if you ran this more than once you would get a better result as the code warms up.

public static void test_int_array() {
    int a[] = new int[50000];
    //Integer a[] = new Integer[50000];

    Random random = new Random();
    for (int i = 0; i < 50000; i++) {
        //a[i] = new Integer(new Random().nextInt(5000));
        a[i] = random.nextInt(5000);
    }
    long start = System.nanoTime();
    Arrays.sort(a);
    long time = System.nanoTime() - start;
    System.out.printf("int[] sort took %.1f ms%n", time / 1e6);
}

public static void test_Integer_array() {
    Integer a[] = new Integer[50000];

    Random random = new Random();
    for (int i = 0; i < 50000; i++) {
        a[i] = random.nextInt(5000);
    }
    long start = System.nanoTime();
    Arrays.sort(a);
    long time = System.nanoTime() - start;
    System.out.printf("Integer[] sort took %.1f ms%n", time / 1e6);
}

public static void main(String... ignored) {
    for (int i = 0; i < 10; i++) {
        if (test_int_array()[0] > 0) throw new AssertionError();
        if (test_Integer_array()[0] > 0) throw new AssertionError();
    }
}

prints

int[] sort took 32.1 ms
Integer[] sort took 104.1 ms
int[] sort took 4.0 ms
Integer[] sort took 83.8 ms
int[] sort took 33.4 ms
Integer[] sort took 76.7 ms
int[] sort took 4.4 ms
Integer[] sort took 40.5 ms
int[] sort took 3.8 ms
Integer[] sort took 17.4 ms
int[] sort took 4.7 ms
Integer[] sort took 22.4 ms
int[] sort took 4.4 ms
Integer[] sort took 12.1 ms
int[] sort took 3.7 ms
Integer[] sort took 11.2 ms
int[] sort took 3.9 ms
Integer[] sort took 10.7 ms
int[] sort took 3.6 ms
Integer[] sort took 11.9 ms

You can see how much difference warming up the code can make.

Have I run it incorrectly?

Your benchmarking is quite primitive and it doesn't really establish anything. How does sorting time grow with array size for each case? How much of the difference between primitive sorting and object sorting can be attributed to the different costs of comparing primitives vs. comparing objects? (This would be independent of the performance of the sorting algorithm, but would be attributed to the sorting algorithm by your test.)

As others have noted, if you are timing things that take on the order of tens of milliseconds, you should be using System.nanoTime ; System.currentTimeMillis often has a resolution no better than 10 ms. Simply switching your timing technology, however, won't solve the more serious problems with your tests.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM