
Understanding the boundary (sentinel) code in the CLRS merge sort in C

#include <stdlib.h>

void merge(int A[], int p, int q, int r) {
    int *tmpL, *tmpR;
    int boundary;
    int n1, n2;
    int i, j, k;

    n1 = q - p + 1;
    n2 = r - q;

    tmpL = (int *)malloc(sizeof(int) * (n1 + 1));
    tmpR = (int *)malloc(sizeof(int) * (n2 + 1));

    for (i = 0; i < n1; i++)
        tmpL[i] = A[p + i];
    for (j = 0; j < n2; j++)
        tmpR[j] = A[q + j + 1];

    boundary = tmpL[n1 - 1] > tmpR[n2 - 1] ? tmpL[n1 - 1] + 1 : tmpR[n2 - 1] + 1;
    tmpL[n1] = boundary;
    tmpR[n2] = boundary;

    i = 0;
    j = 0;

    for (k = p; k <= r; k++) {
        if (tmpL[i] <= tmpR[j]) {
            A[k] = tmpL[i];
            i++;
        } else {
            A[k] = tmpR[j];
            j++;
        }
    }

    free(tmpL);
    free(tmpR);
}
void merge_sort(int A[], int p, int r) {
    int q;

    if (p < r) {
        q = (p + r) / 2;
        merge_sort(A, p, q);
        merge_sort(A, q + 1, r);
        merge(A, p, q, r);
    }
}

I could not understand exactly what this "infinite boundary" code does: boundary = tmpL[n1 - 1] > tmpR[n2 - 1] ? tmpL[n1 - 1] + 1 : tmpR[n2 - 1] + 1;

Thanks https://i.stack.imgur.com/UmyUg.png (circled in blue)

This is a conditional expression, A > B ? C : D. If A > B is true, C is evaluated; otherwise D is evaluated. But I still do not understand the boundary part. Is this the same as adding two while loops that handle the case where one half still has remaining elements and append them to the end of the merged array?

If I don't set the last elements to this "infinite" boundary value, I may get a segmentation fault.

merge() is supposed to merge two already sorted runs in A, from A[p] to A[q] and from A[q+1] to A[r] (inclusive). tmpL and tmpR are created, each with space for 1 extra element at the end, which is used as a sentinel value greater than any value in either tmpL or tmpR. The ternary expression sets boundary to the greater of the last values in tmpL and tmpR (the largest values, since both runs are sorted) plus 1, and this sentinel is stored at the end of both tmpL and tmpR. This eliminates the need to check the indexes i and j to see whether the end of tmpL or tmpR has been reached, in which case the rest of tmpR or tmpL would have to be copied back to A[].

In most programming languages, instead of using the ternary expression, the code could simply set boundary to INT_MAX or one of the other maximum values from the include file limits.h (or, for C++, climits):

http://www.cplusplus.com/reference/climits
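
As a minimal sketch of that idea, the merge below stores INT_MAX in the extra slot of each temporary array instead of computing "largest value plus 1". The function name merge_with_int_max and the int return code are illustrative choices, not part of the original code, and the sketch assumes that no element of A equals INT_MAX and that p <= q < r.

```c
#include <limits.h>
#include <stdlib.h>

/* Sketch: merge the sorted runs A[p..q] and A[q+1..r] using INT_MAX as
 * the sentinel. Assumes no element of A equals INT_MAX (otherwise the
 * loop may read past the end of a temporary array).
 * Returns 0 on success, -1 if memory allocation fails. */
int merge_with_int_max(int A[], int p, int q, int r) {
    int n1 = q - p + 1;   /* length of the left run */
    int n2 = r - q;       /* length of the right run */
    int i, j, k;
    int *tmpL = malloc(sizeof *tmpL * (n1 + 1));
    int *tmpR = malloc(sizeof *tmpR * (n2 + 1));

    if (tmpL == NULL || tmpR == NULL) {
        free(tmpL);
        free(tmpR);
        return -1;
    }

    for (i = 0; i < n1; i++)
        tmpL[i] = A[p + i];
    for (j = 0; j < n2; j++)
        tmpR[j] = A[q + 1 + j];

    /* Sentinels: greater than every real element by assumption. */
    tmpL[n1] = INT_MAX;
    tmpR[n2] = INT_MAX;

    i = 0;
    j = 0;
    for (k = p; k <= r; k++) {
        if (tmpL[i] <= tmpR[j])
            A[k] = tmpL[i++];
        else
            A[k] = tmpR[j++];
    }

    free(tmpL);
    free(tmpR);
    return 0;
}
```

Called as merge_with_int_max(a, 0, 2, 5) on an array whose halves a[0..2] and a[3..5] are each sorted, it leaves a[0..5] sorted.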

If sorting floats or doubles, boundary can be set to infinity.
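
For doubles, the same pattern might look like the sketch below, which uses the INFINITY macro from math.h (C99) as the sentinel. The name merge_doubles is hypothetical, and the sketch assumes that the data contains neither +infinity nor NaN and that p <= q < r.

```c
#include <math.h>
#include <stdlib.h>

/* Sketch: merge the sorted runs A[p..q] and A[q+1..r] of doubles using
 * +infinity as the sentinel. Assumes no element is +infinity or NaN.
 * Returns 0 on success, -1 if memory allocation fails. */
int merge_doubles(double A[], int p, int q, int r) {
    int n1 = q - p + 1;   /* length of the left run */
    int n2 = r - q;       /* length of the right run */
    int i, j, k;
    double *tmpL = malloc(sizeof *tmpL * (n1 + 1));
    double *tmpR = malloc(sizeof *tmpR * (n2 + 1));

    if (tmpL == NULL || tmpR == NULL) {
        free(tmpL);
        free(tmpR);
        return -1;
    }

    for (i = 0; i < n1; i++)
        tmpL[i] = A[p + i];
    for (j = 0; j < n2; j++)
        tmpR[j] = A[q + 1 + j];

    /* +infinity compares greater than every finite value. */
    tmpL[n1] = INFINITY;
    tmpR[n2] = INFINITY;

    i = 0;
    j = 0;
    for (k = p; k <= r; k++) {
        if (tmpL[i] <= tmpR[j])
            A[k] = tmpL[i++];
        else
            A[k] = tmpR[j++];
    }

    free(tmpL);
    free(tmpR);
    return 0;
}
```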

The reason for the segmentation fault is that without the sentinel value, the code may run beyond the end of either TmpL or TmpR causing the fault.

A problem with this method of sorting is that A[] may already contain the maximum possible value, in which case this approach fails. For C's signed integers, adding 1 to the maximum value is an overflow: the behavior is undefined, and in practice the result often wraps around to the smallest value.

The code uses a common approach for mergesort where copies are made of both subarrays with an extra element at the end, set to a value greater than the maximum value of both arrays.

The statement boundary = tmpL[n1 - 1] > tmpR[n2 - 1] ? tmpL[n1 - 1] + 1 : tmpR[n2 - 1] + 1; attempts to compute boundary as 1 plus the larger of the last (and therefore largest) elements of tmpL and tmpR. It uses a ternary expression, which is roughly equivalent to writing:

    if (tmpL[n1 - 1] > tmpR[n2 - 1])
        boundary = tmpL[n1 - 1] + 1;
    else
        boundary = tmpR[n2 - 1] + 1;

The merging loop can then use a single test, k <= r, to stop: when k reaches r + 1, i will equal n1 and j will equal n2.

This approach is broken in many respects:

  • if either subarray contains the maximum value INT_MAX , the computation of boundary will overflow and cause undefined behavior. Even if the overflow does not cause a fatal side-effect, the value of boundary will be meaningless, causing incorrect results or other undefined behavior.
  • testing for array boundaries is simple, much simpler than this incomplete work-around.
  • this method requires both arrays to be allocated and copied, whereas the right half would not require saving because merge would not overwrite values that have not already been copied.

In my opinion, this method should not be taught at all.

Here is an alternative implementation without these shortcomings:

void merge(int A[], int p, int q, int r) {
    int *tmpL;
    int n1, n2;
    int i, j, k;

    // It is much simpler to consider q to point to the first element of
    // the second subarray and r to point after the last element of that array.
    q++;
    r++;

    n1 = q - p;  // number of elements in the left sorted subarray
    n2 = r - q;  // number of elements in the right sorted subarray

    tmpL = (int *)malloc(sizeof(int) * n1);
    if (tmpL == NULL) {
        // Report this fatal error or fall back to a different 
        // algorithm that does not require allocation, such as
        // heapsort or insertion sort.
        return;
    }
    // Make a copy of the left subarray as elements may be overwritten in the loop.
    for (i = 0; i < n1; i++) {
        tmpL[i] = A[p + i];
    }

    // Merge the elements of the subarrays:
    // - if all elements of the left array have been merged, 
    //   the remaining elements of the right subarray are already in place
    // - if k has reached r, all elements have been sorted
    for (i = j = 0, k = p; i < n1 && k < r; k++) {
        if (j >= n2 || tmpL[i] <= A[q + j]) {
            // Take the element from tmpL if the right subarray is empty
            //    or if it is no greater than the next one from the right subarray.
            A[k] = tmpL[i];
            i++;
        } else {
            // Otherwise take the element from the right subarray.
            A[k] = A[q + j];
            j++;
        }
    }
    free(tmpL);
}

