
Sorting Network SWAPs for 64 elements

I am trying to use a sorting network in a C program to sort a small list A of n elements. A sorting network consists of SWAP(x, y) macros, each of which compares two elements A[x] and A[y] and swaps them if necessary. This website generates the sequence of SWAP(x, y) macros for sorting n <= 32 elements.
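For readers who have not seen one, here is a minimal sketch of what such a macro sequence can look like for n = 4. The macro body and the name sort4 are my own illustration, not the output of the generator mentioned above, and the element type is assumed to be int.

```c
/* Compare-exchange: after SWAP(x, y), A[x] <= A[y].
 * Assumes an int array named A is in scope (illustrative definition). */
#define SWAP(x, y)                      \
    do {                                \
        if (A[(x)] > A[(y)]) {          \
            int tmp_ = A[(x)];          \
            A[(x)] = A[(y)];            \
            A[(y)] = tmp_;              \
        }                               \
    } while (0)

/* Sort exactly 4 ints with a fixed 5-comparator network.
 * The sequence of SWAPs never depends on the data. */
static void sort4(int *A)
{
    SWAP(0, 1); SWAP(2, 3);   /* sort the two pairs             */
    SWAP(0, 2); SWAP(1, 3);   /* smallest to A[0], largest A[3] */
    SWAP(1, 2);               /* order the two middle elements  */
}
```

Calling sort4(v) on an int v[4] leaves v in ascending order. A network for n = 64 has the same shape, just with a few hundred SWAP lines.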

Now I am looking for the SWAP(x, y) sequence for sorting n = 64 elements. At this point I am not sure whether a sorting network would be faster than other sorting algorithms for n = 64 elements, but I wish to test it. My question is: is there any website/paper/project that lists this sequence? Or is there an algorithm that generates the network for n = 64 from the sorting networks for n <= 32?

Thanks.

In case anyone is interested (I was) in how appropriate a sorting network is for 64-element sequences of 32-bit integers, I've just had a look myself and found the following:

  • qsort took about 2600 ns per sequence
  • std::sort took about 1100 ns per sequence
  • a Bose-Nelson sorting network took about 1200 ns per sequence
  • a Batcher odd-even network took about 850 ns per sequence
  • a Batcher odd-even network working on 8 sequences concurrently using AVX2 instructions took about 70 ns per sequence

The sequences were generated uniformly at random, so they have maximum entropy, i.e. the worst case for the data-dependent sorts, which favours a sorting network.

You might expect a theoretical 8x speedup from AVX2, so why is there a 12x speedup? Looking at the assembly, Clang performs several swaps of the sorting network in blocks like:

00007FF6DA081374  vpminsd     ymm4,ymm0,ymm1  
00007FF6DA081379  vpmaxsd     ymm0,ymm0,ymm1  
00007FF6DA08137E  vpminsd     ymm1,ymm2,ymm3  
00007FF6DA081383  vpmaxsd     ymm2,ymm2,ymm3  
00007FF6DA081388  vpminsd     ymm3,ymm4,ymm1  
00007FF6DA08138D  vpmaxsd     ymm1,ymm4,ymm1  
00007FF6DA081392  vpminsd     ymm4,ymm0,ymm2  
00007FF6DA081397  vpmaxsd     ymm0,ymm0,ymm2  
00007FF6DA08139C  vpminsd     ymm2,ymm4,ymm1  
00007FF6DA0813A1  vpmaxsd     ymm1,ymm4,ymm1 

whereas the scalar code uses cmp, cmovgt and cmovlt instructions intermingled with movs to and from memory. Make of that what you will.
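To make the min/max formulation concrete, here is a rough sketch in plain C of a compare-exchange applied to 8 sequences at once, one per 32-bit lane. The names LANES, cswap_lanes and sort4_lanes are hypothetical and this is not the code that was benchmarked. Whether a compiler actually turns the inner loop into vpminsd/vpmaxsd depends on the compiler and flags (something like -O2 -mavx2 is my assumption), so check the generated assembly.

```c
#include <stdint.h>

#define LANES 8  /* eight independent sequences, one per 32-bit lane */

/* Compare-exchange rows x and y in every lane, written as min/max.
 * There is no data-dependent branch, so each lane does the same work. */
static void cswap_lanes(int32_t a[][LANES], int x, int y)
{
    for (int l = 0; l < LANES; ++l) {
        int32_t p = a[x][l];
        int32_t q = a[y][l];
        a[x][l] = p < q ? p : q;   /* min goes to the lower row  */
        a[y][l] = p < q ? q : p;   /* max goes to the higher row */
    }
}

/* 4-input network applied to 8 sequences concurrently: a[r][l] is
 * element r of sequence l, so each column ends up sorted. */
static void sort4_lanes(int32_t a[4][LANES])
{
    cswap_lanes(a, 0, 1); cswap_lanes(a, 2, 3);
    cswap_lanes(a, 0, 2); cswap_lanes(a, 1, 3);
    cswap_lanes(a, 1, 2);
}
```

The data layout matters here: the 8 sequences are interleaved so that the same element index of all sequences sits in one contiguous row, which is what lets a single vector min/max handle a comparator for all of them.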

I used my own implementation and benchmarking code for the Batcher odd/even network available at https://github.com/jamesthomasgriffin/sorting_networks and, for the Bose-Nelson network, https://github.com/Vectorized/Static-Sort .
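If you just want the comparator list for n = 64, Batcher's odd-even merge sort can be generated with a few nested loops. The following is a sketch of the usual iterative formulation for power-of-two n, not the code from the repositories above. The names comparator, batcher_network and apply_network are made up for this example.

```c
/* One comparator: compare A[lo] with A[hi] (lo < hi), swap if out of order. */
struct comparator { int lo, hi; };

/* Write the comparators of Batcher's odd-even merge sort for n inputs
 * into out[] and return how many there are. n is assumed to be a power
 * of two. Returns -1 if the buffer (cap entries) is too small. */
static int batcher_network(int n, struct comparator *out, int cap)
{
    int count = 0;
    for (int p = 1; p < n; p *= 2)                 /* length of sorted runs */
        for (int k = p; k >= 1; k /= 2)            /* comparator distance   */
            for (int j = k % p; j <= n - 1 - k; j += 2 * k)
                for (int i = 0; i <= k - 1 && i <= n - j - k - 1; ++i)
                    /* only compare inside the same block of 2p elements */
                    if ((i + j) / (2 * p) == (i + j + k) / (2 * p)) {
                        if (count == cap)
                            return -1;
                        out[count].lo = i + j;
                        out[count].hi = i + j + k;
                        ++count;
                    }
    return count;
}

/* Run a generated network over an int array. */
static void apply_network(int *A, const struct comparator *net, int count)
{
    for (int c = 0; c < count; ++c) {
        int x = net[c].lo, y = net[c].hi;
        if (A[x] > A[y]) {
            int t = A[x];
            A[x] = A[y];
            A[y] = t;
        }
    }
}
```

Printing each pair as SWAP(lo, hi) gives a macro sequence you can paste into a C file. For real use you would generate the list once and hard-code it rather than interpret it at run time as apply_network does.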

The method that gives the best results for 64 inputs is probably the one described by David C. Van Voorhis. See the link below for a similar network:

https://bertdobbelaere.github.io/sorting.networks_extended.html#N64L521D22

This is related to shifting a circular array (Approach #3 in https://leetcode.com/articles/rotate-array/# )

There are algorithms to determine the sequence, e.g. the Bose-Nelson algorithm ( https://metacpan.org/pod/Algorithm::Networksort ). A C implementation is at https://github.com/atinm/bose-nelson/blob/master/bose-nelson.c
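For a rough idea of how the Bose-Nelson recursion goes, here is a compact sketch in C written from the commonly circulated recursive formulation. The names pair, network, emit, merge_runs, sort_run and bose_nelson are my own, and I have not checked this line by line against the linked file, so treat it as an illustration. It collects 0-based comparator pairs in a buffer instead of printing them.

```c
/* One comparator on 0-based indices: swap if A[lo] > A[hi]. */
struct pair { int lo, hi; };

/* Output buffer; count keeps growing past cap so overflow is detectable. */
struct network { struct pair *pairs; int count, cap; };

static void emit(struct network *net, int i, int j)   /* i, j are 1-based */
{
    if (net->count < net->cap) {
        net->pairs[net->count].lo = i - 1;
        net->pairs[net->count].hi = j - 1;
    }
    net->count++;
}

/* Merge the sorted run of length x starting at i with the sorted run
 * of length y starting at j. */
static void merge_runs(struct network *net, int i, int x, int j, int y)
{
    if (x == 1 && y == 1) {
        emit(net, i, j);
    } else if (x == 1 && y == 2) {
        emit(net, i, j + 1);
        emit(net, i, j);
    } else if (x == 2 && y == 1) {
        emit(net, i, j);
        emit(net, i + 1, j);
    } else {
        int a = x / 2;
        int b = (x & 1) ? (y / 2) : ((y + 1) / 2);
        merge_runs(net, i, a, j, b);
        merge_runs(net, i + a, x - a, j + b, y - b);
        merge_runs(net, i + a, x - a, j, b);
    }
}

/* Sort the run of length m starting at i: sort both halves, then merge. */
static void sort_run(struct network *net, int i, int m)
{
    if (m > 1) {
        int a = m / 2;
        sort_run(net, i, a);
        sort_run(net, i + a, m - a);
        merge_runs(net, i, a, i + a, m - a);
    }
}

/* Returns the number of comparators for n inputs; if the result is
 * larger than cap, out[] was too small and holds only the first cap. */
static int bose_nelson(int n, struct pair *out, int cap)
{
    struct network net = { out, 0, cap };
    sort_run(&net, 1, n);
    return net.count;
}
```

Unlike the usual loop form of Batcher's construction, this recursion does not need n to be a power of two, which is handy if you later want sizes between 32 and 64.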
