Understanding the boundary code in the CLRS merge sort implementation in C

Question

#include <stdlib.h>  /* malloc, free */

void merge(int A[], int p, int q, int r) {
    int *tmpL, *tmpR;
    int boundary;
    int n1, n2;
    int i, j, k;

    n1 = q - p + 1;
    n2 = r - q;

    tmpL = (int *)malloc(sizeof(int) * (n1 + 1));
    tmpR = (int *)malloc(sizeof(int) * (n2 + 1));

    for (i = 0; i < n1; i++)
        tmpL[i] = A[p + i];
    for (j = 0; j < n2; j++)
        tmpR[j] = A[q + j + 1];

    boundary = tmpL[n1 - 1] > tmpR[n2 - 1] ? tmpL[n1 - 1] + 1 : tmpR[n2 - 1] + 1;
    tmpL[n1] = boundary;
    tmpR[n2] = boundary;

    i = 0;
    j = 0;

    for (k = p; k <= r; k++) {
        if (tmpL[i] <= tmpR[j]) {
            A[k] = tmpL[i];
            i++;
        } else {
            A[k] = tmpR[j];
            j++;
        }
    }

    free(tmpL);
    free(tmpR);
}

void merge_sort(int A[], int p, int r) {
    int q;

    if (p < r) {
        q = (p + r) / 2;
        merge_sort(A, p, q);
        merge_sort(A, q + 1, r);
        merge(A, p, q, r);
    }
}

I could not understand this "infinite boundary" code, specifically this line:

    boundary = tmpL[n1 - 1] > tmpR[n2 - 1] ? tmpL[n1 - 1] + 1 : tmpR[n2 - 1] + 1;

Thanks. https://i.stack.imgur.com/UmyUg.png (circled in blue)

This is a conditional expression, A > B ? C : D. If A > B is true then C is evaluated, else D is evaluated. But I still do not understand the boundary part. Is this the same as adding two while loops to handle the case where one of the halves still has remaining elements and append them to the end of the new array?

If I don't initialize them with the infinite boundary, I may get a segmentation fault.

Answer 1

merge() is supposed to merge two already sorted runs in A, from A[p] to A[q] and from A[q+1] to A[r] (inclusive). tmpL and tmpR are created, each with space for 1 extra element at the end, to be used as a sentinel value that is greater than any value in either tmpL or tmpR. The ternary expression takes the greater of the last values in tmpL and tmpR and adds 1 to it, and the result is stored as the sentinel at the end of both tmpL and tmpR. This eliminates the need to check the indexes i and j to see whether the end of tmpL or tmpR has been reached, in which case the rest of tmpR or tmpL would be copied back to A[].

For most programming languages, instead of using the ternary expression, the code could have just set boundary to INT_MAX or one of the other maximum values from the include file limits.h (or for C++, climits):

http://www.cplusplus.com/reference/climits
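
As a rough sketch of that suggestion, the merge from the question could store INT_MAX directly as the sentinel. The function name merge_int_max and its return value are assumptions for illustration, and the sketch assumes that no element of A equals INT_MAX; if one does, an index can step past the sentinel slot.

```c
#include <limits.h>  /* INT_MAX */
#include <stdlib.h>  /* malloc, free */

/* Merge the sorted runs A[p..q] and A[q+1..r] (inclusive bounds) with
 * INT_MAX as the sentinel. Assumes no element of A equals INT_MAX.
 * Hypothetical name; returns 0 on success, -1 if allocation fails. */
int merge_int_max(int A[], int p, int q, int r) {
    int n1 = q - p + 1;
    int n2 = r - q;
    int i, j, k;
    int *tmpL = malloc(sizeof *tmpL * (n1 + 1));
    int *tmpR = malloc(sizeof *tmpR * (n2 + 1));

    if (tmpL == NULL || tmpR == NULL) {
        free(tmpL);
        free(tmpR);
        return -1;
    }
    for (i = 0; i < n1; i++)
        tmpL[i] = A[p + i];
    for (j = 0; j < n2; j++)
        tmpR[j] = A[q + 1 + j];

    /* No arithmetic is needed to build the sentinel, so nothing can overflow. */
    tmpL[n1] = INT_MAX;
    tmpR[n2] = INT_MAX;

    i = j = 0;
    for (k = p; k <= r; k++) {
        if (tmpL[i] <= tmpR[j])
            A[k] = tmpL[i++];
        else
            A[k] = tmpR[j++];
    }
    free(tmpL);
    free(tmpR);
    return 0;
}
```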

If sorting floats or doubles, boundary can be set to infinity.
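
For doubles, a sketch under the same structure might use the C99 INFINITY macro from math.h as the sentinel. The name merge_double is an assumption, and the sketch assumes the data contains neither NaN nor positive infinity.

```c
#include <math.h>    /* INFINITY (C99) */
#include <stdlib.h>  /* malloc, free */

/* Merge the sorted runs A[p..q] and A[q+1..r] (inclusive bounds) of
 * doubles, with +infinity as the sentinel. Assumes no element is NaN
 * or +infinity. Hypothetical name; returns 0 on success, -1 on
 * allocation failure. */
int merge_double(double A[], int p, int q, int r) {
    int n1 = q - p + 1;
    int n2 = r - q;
    int i, j, k;
    double *tmpL = malloc(sizeof *tmpL * (n1 + 1));
    double *tmpR = malloc(sizeof *tmpR * (n2 + 1));

    if (tmpL == NULL || tmpR == NULL) {
        free(tmpL);
        free(tmpR);
        return -1;
    }
    for (i = 0; i < n1; i++)
        tmpL[i] = A[p + i];
    for (j = 0; j < n2; j++)
        tmpR[j] = A[q + 1 + j];

    /* +infinity compares greater than every finite double. */
    tmpL[n1] = INFINITY;
    tmpR[n2] = INFINITY;

    i = j = 0;
    for (k = p; k <= r; k++) {
        if (tmpL[i] <= tmpR[j])
            A[k] = tmpL[i++];
        else
            A[k] = tmpR[j++];
    }
    free(tmpL);
    free(tmpR);
    return 0;
}
```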

The reason for the segmentation fault is that without the sentinel value, the code may run beyond the end of either tmpL or tmpR, causing the fault.

A problem with this method for sorting is that A[] may already contain the maximum possible value, in which case this approach will fail. In the case of integers, adding 1 to the maximum value overflows. Signed overflow is undefined behavior in C, and in practice the result often wraps around to the smallest value.
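
One way to detect that failure case before it happens is to check whether the larger last element is already INT_MAX. The helper below is a hypothetical sketch, not part of the original code; what the caller does when no sentinel exists (for example, falling back to explicit index tests) is left open.

```c
#include <limits.h>  /* INT_MAX */

/* Hypothetical helper: computes max(last_left, last_right) + 1 only
 * when that value is representable as an int.
 * Returns 1 and stores the sentinel in *out on success;
 * returns 0 and leaves *out untouched if the sum would overflow. */
int make_sentinel(int last_left, int last_right, int *out) {
    int max = last_left > last_right ? last_left : last_right;

    if (max == INT_MAX)
        return 0;  /* max + 1 would be signed overflow */
    *out = max + 1;
    return 1;
}
```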

Answer 2

The code uses a common approach for mergesort where copies are made of both subarrays with an extra element at the end, set to a value greater than the maximum value of both arrays.

The statement

    boundary = tmpL[n1 - 1] > tmpR[n2 - 1] ? tmpL[n1 - 1] + 1 : tmpR[n2 - 1] + 1;

attempts to compute the value boundary as 1 plus the maximum value of tmpL or tmpR, whichever is greater. It uses a ternary expression, which is roughly equivalent to writing:

    if (tmpL[n1 - 1] > tmpR[n2 - 1])
        boundary = tmpL[n1 - 1] + 1;
    else
        boundary = tmpR[n2 - 1] + 1;

The merging loop can then use a single test k <= r to stop the loop, and i will equal n1 and j will equal n2 when k reaches r + 1.
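
The claim about the final values of i and j can be checked with a small sketch. The function below is hypothetical: it takes two runs that already carry a sentinel in their last slot, merges them into a separate output array, and reports where the two indexes stopped.

```c
/* Merge two sorted runs that already end with a sentinel:
 * L holds n1 real elements followed by L[n1], and R holds n2 real
 * elements followed by R[n2]. The sentinel must be strictly greater
 * than every real element. Writes n1 + n2 values to out and reports
 * the final indexes through final_i and final_j.
 * Hypothetical helper, for illustration only. */
void merge_with_sentinels(const int L[], int n1, const int R[], int n2,
                          int out[], int *final_i, int *final_j) {
    int i = 0, j = 0, k;

    /* A single test on k is enough: the sentinel keeps an exhausted
     * run from being chosen while the other run has elements left. */
    for (k = 0; k < n1 + n2; k++) {
        if (L[i] <= R[j])
            out[k] = L[i++];
        else
            out[k] = R[j++];
    }
    *final_i = i;
    *final_j = j;
}
```

Each iteration advances exactly one of i and j, and neither can pass its sentinel before the other run is exhausted, so after n1 + n2 iterations both indexes sit on their sentinel slot.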

This approach is broken in many respects:

If either subarray contains the maximum value INT_MAX, the computation of boundary will overflow and cause undefined behavior. Even if the overflow does not cause a fatal side effect, the value of boundary will be meaningless, causing incorrect results or other undefined behavior.
Testing for array boundaries is simple, much simpler than this incomplete work-around.
This method requires both arrays to be allocated and copied, whereas the right half would not require saving because merge would not overwrite values that have not already been copied.

In my opinion, this method should not be taught at all.

Here is an alternative implementation without these shortcomings:

#include <stdlib.h>  // malloc, free

void merge(int A[], int p, int q, int r) {
    int *tmpL;
    int n1, n2;
    int i, j, k;

    // It is much simpler to consider q to point to the first element of
    // the second subarray and r to point after the last element of that array.
    q++;
    r++;

    n1 = q - p;  // number of elements in the left sorted subarray
    n2 = r - q;  // number of elements in the right sorted subarray

    tmpL = (int *)malloc(sizeof(int) * n1);
    if (tmpL == NULL) {
        // Report this fatal error or fall back to a different 
        // algorithm that does not require allocation, such as
        // heapsort or insertion sort.
        return;
    }
    // Make a copy of the left subarray as elements may be overwritten in the loop.
    for (i = 0; i < n1; i++) {
        tmpL[i] = A[p + i];
    }

    // Merge the elements of the subarrays:
    // - if all elements of the left array have been merged, 
    //   the remaining elements of the right subarray are already in place
    // - if k has reached r, all elements have been sorted
    for (i = j = 0, k = p; i < n1 && k < r; k++) {
        if (j >= n2 || tmpL[i] <= A[q + j]) {
            // Take the element from tmpL if the right subarray is empty
            //    or if it is no greater than the next one from the right subarray.
            A[k] = tmpL[i];
            i++;
        } else {
            // Otherwise take the element from the right subarray.
            A[k] = A[q + j];
            j++;
        }
    }
    free(tmpL);
}

Answer 1
0 2019-04-03 23:25:33

Answer 2
0 accepted 2019-04-06 09:13:43
