Number of Comparisons for Heap Sort

I wrote some C code to analyze the number of comparisons and the runtime of building a heap and running heapsort. However, I'm not sure whether the output of my code makes sense. Heapsort should run in O(n log n), but the number of comparisons I'm seeing doesn't seem to be very close to that. For example, for an input of size n = 100, I'm seeing ~200 comparisons to build the heap and ~800 comparisons in heap sort. Am I just analyzing the data wrong, or is there something wrong with the way I'm collecting comparisons in my code?

I can provide a link to GitHub if it would make a difference for anyone.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

void bottom_up_heap_sort(int*, int);
void heap_sort(int*, int);
void sift_up(int*, int);
void sift_down(int*, int);
void build_max_heap(int*, int); 
void bottom_up_build_max_heap(int*, int);
void randomize_in_place(int*, int);
int* generate_array(int);
void swap(int*, int*);
int cmp(int, int);
void print_array(int*, int);

int heapsize;
unsigned long comparison_counter;
clock_t begin, end;
double time_spent;

int main() {
    int k, N;
    int* A;
    int* B;
    int i;

    printf("Testing Sift_Down Heap Sort\n");
    for(k = 2; k <= 5; k++) {
        comparison_counter = 0;
        N = (int)pow((double)10, k);

        begin = clock();
        A = generate_array(N);
        end = clock();
        time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
        printf("Time Spent Generating Array: %f\n", time_spent);

        // print the first unsorted array
        //printf("Unsorted Array:\n");
        //print_array(A, N);

        begin = clock();
        // call heap_sort on the first unsorted array
        heap_sort(A, N);
        end = clock();
        time_spent = (double)(end - begin) / CLOCKS_PER_SEC;

        // show that the array is now sorted
        //printf("Sorted array: \n");
        //print_array(A, N);
        printf("Done with k = %d\n", k);
        printf("Comparisons for Heap Sort: %lu\n", comparison_counter);
        printf("Time Spent on Heap Sort: %f\n", time_spent);
        printf("\n");
        free(A); // release the array before the next iteration
    }

    printf("----------------------------------\n");
    printf("Testing Sift_Up Heap Sort\n");
    for(k = 2; k <= 5; k++) {
        comparison_counter = 0;
        N = (int)pow((double)10, k);

        begin = clock();
        B = generate_array(N);
        end = clock();
        time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
        printf("Time Spent Generating Array: %f\n", time_spent);

        // print the unsorted array
        //printf("Unsorted Array:\n");
        //print_array(B, N);

        begin = clock();
        // call heap_sort on the unsorted array
        bottom_up_heap_sort(B, N);
        end = clock();
        time_spent = (double)(end - begin) / CLOCKS_PER_SEC;

        // show that the array is now sorted
        //printf("Sorted array: \n");
        //print_array(B, N);
        printf("Done with k = %d\n", k);
        printf("Comparisons for Heap Sort: %lu\n", comparison_counter);
        printf("Time Spent on Heap Sort: %f\n", time_spent);
        printf("\n");
        free(B); // release the array before the next iteration
    }

    printf("----------------------------------\n");

    return 0;
}

void bottom_up_heap_sort(int* arr, int len) {
    int i;

    // build a max heap from the bottom up using sift up
    bottom_up_build_max_heap(arr, len);
    printf("Comparisons for heap construction: %lu\n", comparison_counter);
    comparison_counter = 0; 
    for(i = len-1; i >= 0; i--) {
        // swap the last leaf and the root
        swap(&arr[i], &arr[0]);
        // remove the already sorted values
        len--;
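        // repair the heap; note that this rebuilds the entire heap on every
        // iteration, so this phase needs at least on the order of n^2
        // comparisons overall rather than n log n
        bottom_up_build_max_heap(arr, len);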
    }
}

void heap_sort(int* arr, int len) {
    int i;

    // build a max heap from the array
    build_max_heap(arr, len);
    printf("Comparisons for heap construction: %lu\n", comparison_counter);
    comparison_counter = 0;
    for(i = len-1; i >= 1; i--) {
        swap(&arr[0], &arr[i]); // move arr[0] to its sorted place
        // remove the already sorted values
        heapsize--;
        sift_down(arr, 0);  // repair the heap
    }
}

void sift_down(int* arr, int i) {
    int c = 2*i+1;
    int largest;

    if(c >= heapsize) return;

    // locate largest child of i
    if((c+1 < heapsize) && cmp(arr[c+1], arr[c]) > 0) {
        c++;
    }

    // if child is larger than i, swap them
    if(cmp(arr[c], arr[i]) > 0) {
        swap(&arr[c], &arr[i]);
        sift_down(arr, c);
    }
}

void sift_up(int* arr, int i) {
    if(i == 0) return; // at the root

    // if the current node is larger than its parent, swap them
    if(cmp(arr[i], arr[(i-1)/2]) > 0) {
        swap(&arr[i], &arr[(i-1)/2]);
        // sift up to repair the heap
        sift_up(arr, (i-1)/2);
    }
}

void bottom_up_build_max_heap(int* arr, int len) {
    int i;
    for(i = 0; i < len; i++) {
        sift_up(arr, i);
    }
}

void build_max_heap(int* arr, int len) {
    heapsize = len;
    int i;
    for(i = len/2; i >= 0; i--) {
        // invariant: arr[k], i < k <= n are roots of proper heaps
        sift_down(arr, i);
    }
}

void randomize_in_place(int* arr, int n) {
    int j, k;
    double val;

    // the random number generator is seeded by the caller (generate_array)

    // randomization code from class notes
    for(j = 0; j < n-1; j++) {
        // rand() matches the srand() seeding; dividing by RAND_MAX + 1.0
        // keeps val in [0, 1), so k always stays within j..n-1
        val = (double)rand() / ((double)RAND_MAX + 1.0);
        k = j + (int)(val*(n-j));
        swap(&arr[k], &arr[j]);
    }
}

// this function is responsible for creating and populating an array 
// of size k, and randomizing the locations of its elements
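int* generate_array(int k) {
    // allocate room for exactly k ints (sizeof(int)*k-1 was one byte short)
    int* arr = (int*) malloc(sizeof(int)*k);
    int i, N;
    time_t t;

    if(arr == NULL) {
        fprintf(stderr, "malloc failed\n");
        exit(EXIT_FAILURE);
    }

    // init the random number generator
    srand((unsigned)time(&t));

    // fill the array with values from 1..k
    for(i = 0; i <= k-1; i++) {
        arr[i] = i+1;
    }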

    N = (int)pow((double)10, 5);
    // randomize the elements of the array for 10^5 iterations
    for(i = 0; i < N; i++) {
        randomize_in_place(arr, k);
    }

    return arr;
}

// swap two elements
void swap(int* a, int* b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}

int cmp(int a, int b) {
    comparison_counter++;

    if(a > b) return 1;
    else if(a < b) return -1;
    else return 0;
}

// print out an array by iterating through
void print_array(int* arr, int size) {
    int i;
    for(i = 0; i < size; i++) {
        printf("%d ", arr[i]);
    }
}

The actual number for such small values of n doesn't really matter, as constant factors are omitted in the complexity. What matters is the growth of your algorithm: if you measure for increasingly larger values of n and plot the results, you should get roughly the same curve as your theoretical complexity.

I tried your code for a couple of values of n, and the number of comparisons grew approximately as O(n log n).

O(n log n) (or in general O(f(x))) does not give you any idea about the expected value at a single point.

That's because big-O notation ignores constant factors. In other words, all of n * log(n), 0.000001 * n * log(n) and 1000000 * n * log(n) are in O(n log n). So the result for a particular value of n is completely undetermined.
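As a rough illustration using only the figures quoted in the question (n = 100, about 200 comparisons to build the heap, about 800 for the sort phase), the implied constant factors can be worked out directly. This is a back-of-the-envelope sketch and assumes the logarithm is taken base 2:

```latex
\begin{align*}
n \log_2 n &= 100 \cdot \log_2 100 \approx 100 \cdot 6.644 \approx 664 \\
\frac{\text{sort-phase comparisons}}{n \log_2 n} &\approx \frac{800}{664} \approx 1.2 \\
\frac{\text{build-heap comparisons}}{n} &\approx \frac{200}{100} = 2
\end{align*}
```

A constant of about 1.2 in front of n log2 n is entirely compatible with O(n log n), since each level of a sift-down can cost up to two comparisons. Likewise, about 2n comparisons for construction is in line with the known O(n) bound for building a heap by sifting down.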

What you can deduce from big-O notation is the effect of modifying the control variable. If a function involves O(n) operations, then doubling n is expected to double the number of operations. If a function involves O(n^2) operations, then doubling n is expected to quadruple the number of operations. And so on.
