简体   繁体   中英

Parallelized Small Network sort

I was working on network sort (for arrays smaller than 8) and noticed that all the algorithms focus on its ability to allow parallel operations. Here is one such set for an array of size 5.

 #define SWAP(x,y) if (data[y] < data[x]) { int tmp = data[x]; data[x] = data[y]; data[y] = tmp; }

    //Parallelizable
    SWAP(1, 2);
    SWAP(4, 5);

    //Parallelizable
    SWAP(0, 2);
    SWAP(3, 5);

    //Parallelizable
    SWAP(0, 1);
    SWAP(3, 4);
    SWAP(2, 5);

    //Parallelizable
    SWAP(0, 3);
    SWAP(1, 4);

    //Parallelizable
    SWAP(2, 4);
    SWAP(1, 3);

    //Parallelizable
    SWAP(2, 3);

I was working with long int arrays (So each element is 8 bytes in size). So is there any easy way to parallelize these operations in C ? Is there any hardware specific commands I can use to achieve this (SIMD, ASM(x86) etc.)

As explained by this answer to a question about sorting small collections, you can actually make your swap code more performant by changing its definition to the following one:

#define SWAP(x, y) {                        \
    int dx = data[x];                       \
    data[x] = dx < data[y] ? dx : data[y];  \
    data[y] ^= dx ^ data[x];                \
}

According to the research paper Applying Sorting Networks to Synthesize Optimized Sorting Libraries , this version of SWAP is branchless and compiles down to a mere 5 instructions on GCC or Clang with a decent optimization level. The article also hints at the fact that the low number of instructions may actually make the code benefit from instruction-level parallelism.

If xor does not work for the types to be sort, you can use an alternative version of SWAP that uses two conditionals instead of one, which should be almost as fast as the xor version. Actually, I use this trick in a sorting library of mine and sorting a small fixed-size collection of integers with sorting networks went from « not really better than insertion sort » to « several times faster than insertion sort » when I introduced the trick. Sorting a collection of 8 integers is ~5 times faster with sorting networks than with an insertion sort on my computer.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM