Segmentation fault in a C merge sort algorithm

Question

I am trying to merge sort, on a key, a fairly large doubly linked list in C, which has about 100,000 elements. Here is the structure for the DLL elements:

struct Pore {
    int ns;    /* voxel number */
    int radius;  /* effective radius of porosity surrounding a pore */
    struct Pore *next;
    struct Pore *prev;
};

After searching around for algorithms, the one I found most often used comprises three functions: mergeSort, merge, and split. I am including them here. Please excuse the multiple printfs in the merge function; I have been trying to debug a segmentation fault that happens upon the 4097592nd recursive entry into the merge function. Recur01 and Recur02 are global variables that I defined to help with the debugging.


void mergeSort(struct Pore **head)
{
    Recur01++;

    /* Base case: 0 or 1 pore */
    if ((*head) == NULL) {
        printf("\nEnter mergeSort %ld, list head is NULL ",Recur01);
        fflush(stdout);
        return;
    }
    if ((*head)->next == NULL) {
        printf("\nEnter mergeSort %ld, list head next is NULL ",Recur01);
        fflush(stdout);
        return;
    }

    printf("\nEnter mergeSort %ld",Recur01);
    fflush(stdout);
    /* Split head into 'a' and 'b' sublists */
    struct Pore *a = *head;
    struct Pore *b = NULL;
    split(*head, &a, &b);

    /* Recursively sort the sublists */
    mergeSort(&a);
    mergeSort(&b);

    /* Merge the two sorted halves */
    *head = merge(a,b);

    printf("\nExit mergeSort %ld",Recur01);
    fflush(stdout);
    return;
}

void split(struct Pore *head, struct Pore **a, struct Pore **b)
{
    int count = 0;
    int lngth = 1;
    struct Pore *slow = head;
    struct Pore *fast = head->next;
    struct Pore *temp;

    temp = head;
    while (temp->next != NULL) {
        lngth++;
        /*
        printf("\n    Length = %d",lngth);
        fflush(stdout);
        */
        if (temp->next) {
            temp = temp->next;
        }
    }

    while (fast != NULL) {
        printf("\nCount = %d",count);
        fflush(stdout);
        fast = fast->next;
        if (fast != NULL) {
            slow = slow->next;
            fast = fast->next;
        }
        count++;
    }

    printf("\nDone with while loop, final count = %d",count);
    fflush(stdout);

    *b = slow->next;
    slow->next = NULL;
    printf("\nExit split");
    fflush(stdout);
    return;
}

struct Pore *merge(struct Pore *a, struct Pore *b)
{
    Recur02++;

    if (Recur02 >= 4097591) {
        printf("\nEnter merge %ld",Recur02);
        fflush(stdout);
    }

    /** If first linked list is empty, return the second list */

    /* Base cases */
    if (a == NULL) return b;

    if (b == NULL) return a;

    if (Recur02 >= 4097591) {
        printf("\n    Made it 01");
        fflush(stdout);
    }

    /* Pick the larger key */

    if (a->radius > b->radius) {
        if (Recur02 >= 4097591) {
            printf("\n    Made it 02 a is bigger, Recur02 = %ld",Recur02);
            fflush(stdout);
            printf("      a->next->ns = %d",a->next->ns);
            fflush(stdout);
            printf("      b->ns = %d",b->ns);
            fflush(stdout);
        }
        a->next = merge(a->next,b);
        a->next->prev = a;
        a->prev = NULL;
        if (Recur02 >= 4097591) {
            printf("\nExit merge a %ld",Recur02);
            fflush(stdout);
        }
        return a;
    } else {
        if (Recur02 >= 4097591) {
            printf("\n    Made it 02 b is bigger, Recur02 = %ld",Recur02);
            fflush(stdout);
            printf("      b->next->ns = %d",b->next->ns);
            fflush(stdout);
            printf("      a->ns = %d",a->ns);
            fflush(stdout);
        }
        b->next = merge(a,b->next);
        b->next->prev = b;
        b->prev = NULL;
        if (Recur02 >= 4097591) {
            printf("\nExit merge b %ld",Recur02);
            fflush(stdout);
        }
        return b;
    }
}

Running the code works, as I said, until I get to the 4097592nd entry into merge. I put a printf right before the function call and another one immediately upon entry into the function. I also printf the keys of the elements in the function arguments, and they seem okay too. I'm not sure what else to try to get to the bottom of this. Below are the last lines of the output:

Exit mergeSort 529095
Exit mergeSort 529095
Enter merge 4097591
    Made it 01
    Made it 02 a is bigger, Recur02 = 4097591      a->next->ns = 156692      b->ns = 20
Enter merge 4097591
Enter merge 4097592
    Made it 01
    Made it 02 a is bigger, Recur02 = 4097592      a->next->ns = 156693      b->ns = 20

That is the last line that gets flushed from the buffer before the segmentation fault. I have run out of ideas for how to debug this, so I will be grateful for any advice.

Answer 1

The segmentation fault is due to using a recursive merge that calls itself for every node merged, so its recursion depth can grow to the total number of nodes being merged (about 100,000 for the final merge here) and overflow the stack. It's OK for the main code to be top-down, since its recursion depth, and therefore its stack usage, is O(log2(n)), but the merge function needs to be iterative.
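As a minimal sketch of what an iterative merge could look like for the question's struct Pore (not the code the asker ended up using): it keeps the question's ordering (larger radius first, ties taken from b) and repairs the prev links as it goes. The dummy head node and the check_list debugging helper are assumptions added for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct Pore {
    int ns;      /* voxel number */
    int radius;  /* effective radius of porosity surrounding a pore */
    struct Pore *next;
    struct Pore *prev;
};

/* Iteratively merge two lists that are each sorted by descending radius.
 * Stack usage is constant, no matter how long the lists are. */
struct Pore *merge(struct Pore *a, struct Pore *b)
{
    struct Pore dummy;            /* placeholder head, lives on the stack */
    struct Pore *tail = &dummy;

    dummy.next = NULL;
    while (a != NULL && b != NULL) {
        if (a->radius > b->radius) {   /* same test as the recursive version */
            tail->next = a;
            a->prev = tail;
            tail = a;
            a = a->next;
        } else {
            tail->next = b;
            b->prev = tail;
            tail = b;
            b = b->next;
        }
    }

    /* Append whichever list still has nodes left. */
    tail->next = (a != NULL) ? a : b;
    if (tail->next != NULL)
        tail->next->prev = tail;

    /* The real head must not point back at the dummy node. */
    if (dummy.next != NULL)
        dummy.next->prev = NULL;
    return dummy.next;
}

/* Hypothetical debugging helper: checks that the prev/next links agree
 * and that the keys are in non-increasing order. */
void check_list(const struct Pore *head)
{
    const struct Pore *prev = NULL;
    const struct Pore *p;

    for (p = head; p != NULL; p = p->next) {
        assert(p->prev == prev);
        assert(prev == NULL || prev->radius >= p->radius);
        prev = p;
    }
}
```

Because the signature is unchanged, this should drop into the existing top-down mergeSort in place of the recursive merge; the recursion that remains is only as deep as the number of times the list can be halved.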

Regarding "most often used":

The original implementation of std::list::sort() is a bottom-up merge sort for linked lists that uses a small array (25 to 32 entries) of lists (or of pointers or iterators to the first node of each list).

https://en.wikipedia.org/wiki/Merge_sort#Bottom-up_implementation_using_lists
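A rough sketch of that bottom-up scheme, adapted to the question's struct Pore, might look like the following. It is not the std::list::sort source; the names mergeSortBottomUp, merge_next, and NUM_BINS are made up for illustration. Slot i of the array holds either nothing or a sorted run of 2^i nodes (the last slot absorbs any overflow), the merging uses only the next pointers, and the prev pointers are rebuilt in a single pass at the end. The order is descending by radius, as in the question.

```c
#include <assert.h>
#include <stddef.h>

struct Pore {
    int ns;      /* voxel number */
    int radius;  /* effective radius of porosity surrounding a pore */
    struct Pore *next;
    struct Pore *prev;
};

#define NUM_BINS 32   /* assumed size; slot i holds a run of 2^i nodes */

/* Iterative merge on the next pointers only (descending radius).
 * Ties are taken from a, which keeps the sort stable. */
static struct Pore *merge_next(struct Pore *a, struct Pore *b)
{
    struct Pore *head = NULL;
    struct Pore **tail = &head;   /* address of the link to fill in next */

    while (a != NULL && b != NULL) {
        if (b->radius > a->radius) {
            *tail = b;
            tail = &b->next;
            b = b->next;
        } else {
            *tail = a;
            tail = &a->next;
            a = a->next;
        }
    }
    *tail = (a != NULL) ? a : b;
    return head;
}

void mergeSortBottomUp(struct Pore **head)
{
    struct Pore *bins[NUM_BINS] = { NULL };
    struct Pore *node, *next, *result, *prev;
    size_t i;

    assert(head != NULL);

    /* Feed nodes in one at a time, merging equal-sized runs upward,
     * much like carrying in binary addition. */
    for (node = *head; node != NULL; node = next) {
        next = node->next;
        node->next = NULL;
        result = node;
        for (i = 0; i < NUM_BINS && bins[i] != NULL; i++) {
            result = merge_next(bins[i], result);
            bins[i] = NULL;
        }
        if (i == NUM_BINS)
            i--;                  /* last slot takes runs of any size */
        bins[i] = result;
    }

    /* Merge the remaining runs, smallest first. */
    result = NULL;
    for (i = 0; i < NUM_BINS; i++)
        result = merge_next(bins[i], result);

    /* Rebuild the prev links in one pass. */
    prev = NULL;
    for (node = result; node != NULL; node = node->next) {
        node->prev = prev;
        prev = node;
    }
    *head = result;
}
```

There is no recursion anywhere in this version, so the stack depth does not depend on the list length, and no split function or length scan is needed.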

Most implementations of std::list::sort were probably bottom-up until Visual Studio 2015, which switched from using an array of lists to using iterators (to avoid issues such as a missing default allocator, and to provide exception safety). This came up in a prior thread, and initially I just accepted the change, assuming that the switch to iterators required the switch to top-down. The question came up again later, so I looked into it and determined there was no need to switch to top-down merge sort. My main regret is not looking into this when the original question was asked. I did update my answer to show a standalone iterator-based bottom-up merge sort, as well as a replacement for std::list::sort in the VS2019 include file.

`std::list<>::sort()` - why the sudden switch to top-down strategy?

In most cases, provided there is enough memory, it's faster to copy the list to an array (or vector), sort the array, and create a new sorted list. If the nodes of a large linked list are randomly scattered in memory, nearly every node access can be a cache miss. Once the list is in an array, merge sort's sequential access over runs in the array is more cache friendly. This is how Java's native sort for linked lists is implemented, although part of that is due to using a common Collections.sort() for multiple container types, including linked lists, while the C++ standard library's std::list is an independent container type with its own list-specific member functions.
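For the question's list, the copy-to-array idea could be sketched as below. This is only an illustration under a few assumptions: the names sortViaArray, struct KeyedPore, and cmp_radius_desc are hypothetical, the existing nodes are relinked rather than copied into a new list, and the C library's qsort stands in for the array sort (it is not guaranteed to be a merge sort or to be stable). The key is copied next to each node pointer so that the comparisons touch only the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct Pore {
    int ns;      /* voxel number */
    int radius;  /* effective radius of porosity surrounding a pore */
    struct Pore *next;
    struct Pore *prev;
};

/* Array element: the sort key plus a pointer back to its node. */
struct KeyedPore {
    int radius;
    struct Pore *node;
};

/* qsort comparator: larger radius first. */
static int cmp_radius_desc(const void *pa, const void *pb)
{
    const struct KeyedPore *a = pa;
    const struct KeyedPore *b = pb;

    return (a->radius < b->radius) - (a->radius > b->radius);
}

/* Sorts the list in descending radius order.
 * Returns 0 on success, -1 if the temporary array cannot be allocated
 * (the list is left unchanged in that case). */
int sortViaArray(struct Pore **head)
{
    struct KeyedPore *arr;
    struct Pore *p;
    size_t n = 0, i;

    assert(head != NULL);

    for (p = *head; p != NULL; p = p->next)
        n++;
    if (n < 2)
        return 0;                 /* nothing to sort */

    arr = malloc(n * sizeof *arr);
    if (arr == NULL)
        return -1;

    /* Copy keys and node pointers into contiguous memory. */
    for (i = 0, p = *head; p != NULL; p = p->next, i++) {
        arr[i].radius = p->radius;
        arr[i].node = p;
    }

    qsort(arr, n, sizeof *arr, cmp_radius_desc);

    /* Relink the nodes in sorted order. */
    for (i = 0; i < n; i++) {
        arr[i].node->prev = (i > 0) ? arr[i - 1].node : NULL;
        arr[i].node->next = (i + 1 < n) ? arr[i + 1].node : NULL;
    }
    *head = arr[0].node;

    free(arr);
    return 0;
}
```

For 100,000 nodes the temporary array is small (a key and a pointer per node), so the memory condition mentioned above is easily met in this case.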

Answer 2

@VladfromMoscow suggested using a non-recursive sorting algorithm, because recursive ones are not good for long lists. Therefore, I adapted for my doubly linked list the iterative version of merge sort found here. Works like a charm. At least in this case, it seems that the recursion really was too deep for a list this long.
