Quicksort - why is my Dutch Flag implementation slower than my Hoare partition implementation?

As a learning exercise I'm implementing the Quicksort algorithm in C. The pivot is the median of 3 values, and for partitions with 4 or fewer elements I switch to insertion sort.

Now I have been testing two variants: one uses Hoare's partition scheme, the other uses the Dutch Flag scheme.

UPDATE: Included the whole file for both variants.

Hoare's:

#include <stdlib.h>
#include "quicksort.h"

#define THRESHOLD 4
#define SWAP(a, b)          \
{                           \
    char *a_swap = (a);     \
    char *b_swap = (b);     \
    int size_swap = size_q; \
    char tmp;               \
    while(size_swap-- > 0) {\
        tmp = *a_swap;      \
        *a_swap++ = *b_swap;\
        *b_swap++ = tmp;    \
    }                       \
}

#define MEDIAN_OF_3(left, mid, right)       \
{                                           \
    char *l = (left);                       \
    char *m = (mid);                        \
    char *r = (right);                      \
    if((*cmp_q)((void *)m, (void *)l) < 0) {\
        SWAP(m, l);                         \
    }                                       \
    if((*cmp_q)((void *)r, (void *)m) < 0) {\
        SWAP(r, m);                         \
    } else {                                \
        goto jump;                          \
    }                                       \
    if((*cmp_q)((void *)m, (void *)l) < 0) {\
        SWAP(m, l);                         \
    }                                       \
    jump:;                                  \
}

#define COPY(dest, src)             \
{                                   \
    char *src_copy = (src);         \
    char *dest_copy = (dest);       \
    size_t size_copy = size_q;      \
    while(size_copy-- > 0) {        \
        *dest_copy++ = *src_copy++; \
    }                               \
}

static size_t size_q = 0;
static char *e = NULL;
static int (*cmp_q)(const void *, const void *) = NULL;

void sort(char *left, char *right) {

    int elements = (right+size_q-left)/size_q;
    //========== QUICKSORT ==========
    if(elements > THRESHOLD) {

        //========== PIVOT = MEDIAN OF THREE ==========
        char *mid = left+size_q*((right-left)/size_q>>1);
        MEDIAN_OF_3(left, mid, right);
        char *pivot = mid;

        //========== PARTITIONING ==========
        char *left_part = left+size_q;
        char *right_part = right-size_q;
        while(left_part < right_part) {

            while((*cmp_q)((void *)left_part, (void *)pivot) < 0) {
                left_part += size_q;
            }

            while((*cmp_q)((void *)right_part, (void *)pivot) > 0) {
                right_part -= size_q;
            }

            if(left_part < right_part) {

                SWAP(left_part, right_part);

                if(pivot == left_part) {
                    pivot = right_part;
                } else if(pivot == right_part) {
                    pivot = left_part;
                }

                left_part += size_q;
                right_part -= size_q;
            }
        }

        //========== RECURSIVE CALLS ==========
        sort(left, right_part);
        sort(left_part, right);

    } else if(elements > 1) {

        //========== INSERTION SORT ==========
        char *i, *j;
        for(i = left+size_q; i <= right; i += size_q) {

            if((*cmp_q)((void *)i, (void *)(i-size_q)) < 0) {

                COPY(e, i);
                for(j = i-size_q; j >= left && (*cmp_q)((void *)e, (void *)j) < 0; j -= size_q) {
                    COPY(j+size_q, j);
                }
                COPY(j+size_q, e);
            }
        }
    }
}

void quicksort(void *array, size_t num, size_t size, int (*cmp)(const void *a, const void *b)) {
    char *array_q = (char *)array;
    size_q = size;
    cmp_q = cmp;
    e = malloc(size_q);
    sort(array_q, array_q+size_q*(num-1));
    free(e);
}

Dutch Flag:

#include <stdlib.h>
#include "quicksort.h"

#define THRESHOLD 4
#define SWAP(a, b)          \
{                           \
    char *a_q = (a);        \
    char *b_q = (b);        \
    int size_swap = size_q; \
    char tmp;               \
    while(size_swap-- > 0) {\
        tmp = *a_q;         \
        *a_q++ = *b_q;      \
        *b_q++ = tmp;       \
    }                       \
                            \
}

#define MEDIAN_OF_3(left, mid, right)       \
{                                           \
    char *l = (left);                       \
    char *m = (mid);                        \
    char *r = (right);                      \
    if((*cmp_q)((void *)m, (void *)l) < 0) {\
        SWAP(m, l);                         \
    }                                       \
    if((*cmp_q)((void *)r, (void *)m) < 0) {\
        SWAP(r, m);                         \
    } else {                                \
        goto jump;                          \
    }                                       \
    if((*cmp_q)((void *)m, (void *)l) < 0) {\
        SWAP(m, l);                         \
    }                                       \
    jump:;                                  \
}

#define COPY(dest, src)             \
{                                   \
    char *src_copy = (src);         \
    char *dest_copy = (dest);       \
    size_t size_copy = size_q;      \
    while(size_copy-- > 0) {        \
        *dest_copy++ = *src_copy++; \
    }                               \
}

static size_t size_q = 0;
static char *pivot = NULL;
static char *e = NULL;
static int (*cmp_q)(const void *, const void *) = NULL;

void sort(char *left, char *right) {

    int elements = (right+size_q-left)/size_q;
    //========== QUICKSORT ==========
    if(elements > THRESHOLD) {

        //========== PIVOT = MEDIAN OF THREE ==========
        char *mid = left+size_q*((right-left)/size_q>>1);
        MEDIAN_OF_3(left, mid, right);
        COPY(pivot, mid);

        //========== 3-WAY PARTITIONING (DUTCH FLAG PROBLEM) ==========
        char *less = left;
        char *equal = left;
        char *greater = right;
        int value;
        while(equal <= greater) {
            value = (*cmp_q)((void *)equal, (void *)pivot);
            if(value < 0) {
                SWAP(less, equal);
                less += size_q;
                equal += size_q;
            } else if(value > 0) {
                SWAP(equal, greater);
                greater -= size_q;
            } else {
                equal += size_q;
            }
        }

        //========== RECURSIVE CALLS ==========
        sort(left, less-size_q);
        sort(greater+size_q, right);

    } else if(elements > 1) {

        //========== INSERTION SORT ==========
        char *i, *j;
        for(i = left+size_q; i <= right; i += size_q) {
            if((*cmp_q)((void *)i, (void *)(i-size_q)) < 0) {

                COPY(e, i);

                for(j = i-size_q; j >= left && (*cmp_q)((void *)e, (void *)j) < 0; j -= size_q) {
                    COPY(j+size_q, j);
                }

                COPY(j+size_q, e);
            }

        }
    }
}

void quicksort(void *array, size_t num, size_t size, int (*cmp)(const void *a, const void *b)) {
    char *array_q = (char *)array;
    size_q = size;
    cmp_q = cmp;
    pivot = malloc(size_q);
    e = malloc(size_q);
    sort(array_q, array_q+size_q*(num-1));
    free(pivot);
    free(e);
}

Both get the same input: a series of files, each of which contains 10^n random integer values in the range [0:(10^n)+1]. n ranges from 1 to 7 (10 to 10 million elements). I expected the Dutch Flag implementation to be at least as fast as Hoare's, but that was not the case.
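The question reads its test data from files and does not show how they were produced. As a rough sketch only, an equivalent input could be built in memory as follows. The helper names `rand_below` and `fill_random` are hypothetical, the range is assumed to be inclusive, and the simple modulo reduction introduces a small bias that should not matter for a timing test.

```c
#include <stdlib.h>

/* Hypothetical helper: pseudo-random value in [0, bound), bound > 0.
   Two rand() calls are combined because RAND_MAX may be as small as 32767. */
static unsigned long rand_below(unsigned long bound) {
    unsigned long r = ((unsigned long)rand() << 15) ^ (unsigned long)rand();
    return r % bound;
}

/* Hypothetical helper: fills a[0..n-1] with values in [0, max_value].
   Assumes max_value >= 0. */
static void fill_random(int *a, size_t n, int max_value) {
    for (size_t i = 0; i < n; i++) {
        a[i] = (int)rand_below((unsigned long)max_value + 1UL);
    }
}
```

Calling `fill_random` with a smaller `max_value` for the same `n` gives an input with many repeated values.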

Flags: -O3

Implementation    Size    Runs   Time
Hoare's           10^7    10     avg=2.148s
Dutch Flag        10^7    10     avg=3.312s

Then I changed the input: same size, 10^n, but with values in [0:10^(n-1)], which guarantees lots of repeated values.

Result:

Implementation    Size    Runs   Time
Hoare's           10^7    10     avg=0.170s
Dutch Flag        10^7    10     avg=0.260s

Even for repeated values Dutch Flag is slower than Hoare's. Why? It does not seem likely that the chosen pivot is unique.
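One way to investigate is to count how many swaps each partition loop performs on the same data. The sketch below is not the code from the question: it uses plain `int` arrays, takes the middle element as pivot instead of median of 3, and the names `counted_swap`, `hoare_swaps` and `dutch_flag_swaps` are hypothetical. The Dutch Flag loop has the same shape as the one posted above, so it swaps once for every element that differs from the pivot, even when `less == equal`. Whether this accounts for the measured times is an assumption that would need profiling to confirm.

```c
#include <stddef.h>

/* Exchanges two ints and bumps the caller's swap counter. */
static void counted_swap(int *x, int *y, size_t *swaps) {
    int tmp = *x;
    *x = *y;
    *y = tmp;
    (*swaps)++;
}

/* One Hoare-style partition pass over a[0..n-1], n >= 1.
   Simplified variant: the pivot is a copy of the middle element.
   Returns the number of swaps performed. */
static size_t hoare_swaps(int *a, size_t n) {
    int pivot = a[n / 2];
    size_t swaps = 0;
    long i = 0;
    long j = (long)n - 1;
    while (i <= j) {
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) {
            if (i < j) counted_swap(&a[i], &a[j], &swaps);
            i++;
            j--;
        }
    }
    return swaps;
}

/* One Dutch Flag (3-way) partition pass over a[0..n-1], n >= 1.
   Mirrors the loop structure in the question, including the swap
   that happens even when less == equal.
   Returns the number of swaps performed. */
static size_t dutch_flag_swaps(int *a, size_t n) {
    int pivot = a[n / 2];
    size_t swaps = 0;
    long less = 0;
    long equal = 0;
    long greater = (long)n - 1;
    while (equal <= greater) {
        if (a[equal] < pivot) {
            counted_swap(&a[less], &a[equal], &swaps);
            less++;
            equal++;
        } else if (a[equal] > pivot) {
            counted_swap(&a[equal], &a[greater], &swaps);
            greater--;
        } else {
            equal++;
        }
    }
    return swaps;
}
```

Running both functions on two copies of the same array gives a direct comparison. In the generic implementation each swap is a byte-by-byte loop over `size_q` bytes, so the swap count is a plausible place to start looking.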

My environment, if it matters:

CPU=Intel(R) Core(TM) i7-6820HK @ 2.70GHz
VM OS=Linux version 4.4.0-36-generic, Ubuntu 16.04.2, gcc version 5.4.0
Host=Microsoft Windows 10 Home
IDE=Eclipse CDT Neon

  1. Do not call malloc and free inside the recursion. If they run in every recursive call (N times in total), they take a lot of time.

  2. The comparison will be more useful if you enable optimization (-O3).

  3. Is SWAP a macro or a function? If it is a function, try to make it inline.
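
For the last point, a function-based swap could look roughly like the sketch below. In the files posted above SWAP is a macro, and quicksort() already allocates its scratch buffers once per call and not once per recursive call. The name `swap_bytes` is hypothetical, and whether the compiler actually inlines it depends on the optimization level.

```c
#include <stddef.h>

/* Hypothetical replacement for a SWAP macro: exchanges two elements
   of `size` bytes each, one byte at a time. */
static inline void swap_bytes(void *a, void *b, size_t size) {
    unsigned char *pa = a;
    unsigned char *pb = b;
    while (size-- > 0) {
        unsigned char tmp = *pa;
        *pa++ = *pb;
        *pb++ = tmp;
    }
}
```

Unlike the macro, this takes the element size as a parameter, so it does not depend on a file-level variable such as `size_q`.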
