Merge Sort Segmentation Fault Only On Large Arrays

I'm implementing a few different sorting methods, and for some reason my merge sort algorithm does not work on large data sets. The sort works for 115,000 words but stops working when it reaches 135,000 words, at which point I get a segmentation fault. I do not understand where the seg fault is coming from. The sort works successfully for text files containing 5K to 125K strings.

The readFile array is initialized with the number of words in the text file. When debugging, the last values passed into the mergeSort() function appear to be the following:

#0  0x0000000000402a87 in merge (inputString=0x7fffffbde790, from=0, mid=67499, to=134999) at mergeSort.cpp:102
    n1 = 67500
    n2 = 67500
    i = 0
    j = 0
    k = 32767
    L = <error reading variable L (value requires 2160000 bytes, which is more than max-value-size)>
    R = <error reading variable R (value requires 2160000 bytes, which is more than max-value-size)>
#1  0x0000000000402921 in mergeSort (inputString=0x7fffffbde790, from=0, to=134999) at mergeSort.cpp:88
    mid = 67499

void mergeSort(string readFile[], int from, int to) {
    if (from < to) {
        int mid = from + (to - from) / 2;
        mergeSort(readFile, from, mid);
        mergeSort(readFile, mid + 1, to);
        merge(readFile, from, mid, to);
    }
}

void merge(string readFile[], int from, int mid, int to) {
    int n1 = mid - from + 1;
    int n2 = to - mid;

    string L[n1];
    string R[n2];

    for (int i = 0; i < n1; i++) {
        L[i] = readFile[from + i];
    }
    for (int i = 0; i < n2; i++) {
        R[i] = readFile[mid + i + 1];
    }

    int i = 0;
    int j = 0;
    int k = from;

    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            readFile[k] = L[i];
            i++;
        } else {
            readFile[k] = R[j];
            j++;
        }
        k++;
    }
    while (i < n1) {
        readFile[k] = L[i];
        i++;
        k++;
    }
    while (j < n2) {
        readFile[k] = R[j];
        j++;
        k++;
    }
}

You allocate the temporary arrays as automatic variables in the merge function. When these arrays become too large, there is not enough stack space to hold them and you get undefined behavior (e.g. a stack overflow). In your backtrace, L and R each require 2,160,000 bytes for 67,500 strings. Note also that variable-length arrays such as string L[n1] are a compiler extension, not standard C++.
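To make the arithmetic concrete, here is a minimal sketch that computes how much automatic storage the two temporary arrays need for a given call to merge. The helper name tempArrayBytes is hypothetical, and the result depends on sizeof(std::string) for the build in question (the backtrace implies 32 bytes per string on the asker's platform). Stack limits vary by system; a default of around 8 MiB is common on Linux, but treat that as an assumption to verify rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: bytes of stack that merge() would need for L and R.
// Counts only the string objects themselves; any heap buffers owned by
// long strings are separate and do not live on the stack.
std::size_t tempArrayBytes(int from, int mid, int to) {
    assert(from <= mid && mid <= to);
    std::size_t n1 = static_cast<std::size_t>(mid - from + 1);  // size of L
    std::size_t n2 = static_cast<std::size_t>(to - mid);        // size of R
    return (n1 + n2) * sizeof(std::string);
}
```

For the outermost call in the backtrace, tempArrayBytes(0, 67499, 134999) covers 135,000 string objects, which at 32 bytes each is 4,320,000 bytes on top of whatever the caller's frames already occupy.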

To handle arbitrarily large arrays, you should allocate the temporary arrays on the heap with new[] and release them with delete[]. (malloc is not suitable for arrays of std::string, because it does not run the constructors.) To limit the number of allocations, you could allocate a single temporary array in a wrapper and pass it down recursively through the mergeSort function.

Here is a simple fix that allocates the temporary arrays on the heap in the merge function:

void merge(string readFile[], int from, int mid, int to) {
    int n1 = mid - from + 1;
    int n2 = to - mid;

    string *L = new string[n1];
    string *R = new string[n2];

    for (int i = 0; i < n1; i++) {
        L[i] = readFile[from + i];
    }
    for (int i = 0; i < n2; i++) {
        R[i] = readFile[mid + i + 1];
    }

    int i = 0;
    int j = 0;
    int k = from;

    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            readFile[k] = L[i];
            i++;
        } else {
            readFile[k] = R[j];
            j++;
        }
        k++;
    }
    while (i < n1) {
        readFile[k] = L[i];
        i++;
        k++;
    }
    while (j < n2) {
        readFile[k] = R[j];
        j++;
        k++;
    }
    delete[] L;
    delete[] R;
}

Here is a more elaborate version, possibly more efficient, that allocates a single temporary array. Note that it uses half-open ranges: to is one past the last element to sort.

void merge(string readFile[], size_t from, size_t mid, size_t to, string aux[]) {
    size_t i, j, k;

    for (i = from; i < to; i++) {
        aux[i] = readFile[i];
    }

    i = from;
    j = mid;
    k = from;

    while (i < mid && j < to) {
        if (aux[i] <= aux[j]) {
            readFile[k++] = aux[i++];
        } else {
            readFile[k++] = aux[j++];
        }
    }
    while (i < mid) {
        readFile[k++] = aux[i++];
    }
    while (j < to) {
        readFile[k++] = aux[j++];
    }
}

void mergeSort(string readFile[], size_t from, size_t to, string aux[]) {
    if (to - from > 1) {
        size_t mid = from + (to - from) / 2;
        mergeSort(readFile, from, mid, aux);
        mergeSort(readFile, mid, to, aux);
        merge(readFile, from, mid, to, aux);
    }
}

void mergeSort(string readFile[], size_t n) {
    string *aux = new string[n];
    mergeSort(readFile, 0, n, aux);
    delete[] aux;
}
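As a variation on the single-buffer version above, the same idea can be sketched with std::vector, so the scratch buffer lives on the heap and is released automatically, even if an exception is thrown. The names mergeVec, mergeSortVec and sortStrings are hypothetical, and this assumes the caller can hold the words in a std::vector<std::string> rather than a raw array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Merge the sorted half-open ranges [from, mid) and [mid, to) of data,
// using aux as scratch space (aux must be at least as large as data).
void mergeVec(std::vector<std::string>& data, std::size_t from, std::size_t mid,
              std::size_t to, std::vector<std::string>& aux) {
    for (std::size_t i = from; i < to; i++) {
        aux[i] = std::move(data[i]);  // move instead of copy to avoid reallocating strings
    }
    std::size_t i = from, j = mid, k = from;
    while (i < mid && j < to) {
        // Take from the right half only when it is strictly smaller,
        // which keeps equal elements in their original order (stable).
        if (aux[j] < aux[i]) {
            data[k++] = std::move(aux[j++]);
        } else {
            data[k++] = std::move(aux[i++]);
        }
    }
    while (i < mid) data[k++] = std::move(aux[i++]);
    while (j < to)  data[k++] = std::move(aux[j++]);
}

// Recursively sort the half-open range [from, to).
void mergeSortVec(std::vector<std::string>& data, std::size_t from,
                  std::size_t to, std::vector<std::string>& aux) {
    if (to - from > 1) {
        std::size_t mid = from + (to - from) / 2;
        mergeSortVec(data, from, mid, aux);
        mergeSortVec(data, mid, to, aux);
        mergeVec(data, from, mid, to, aux);
    }
}

// Wrapper: allocates the scratch buffer once, on the heap, for the whole sort.
void sortStrings(std::vector<std::string>& data) {
    std::vector<std::string> aux(data.size());
    assert(aux.size() == data.size());
    mergeSortVec(data, 0, data.size(), aux);
}
```

With this shape, a caller would fill a std::vector<std::string> with the words and call sortStrings(words). The recursion depth stays logarithmic in the number of words, so stack usage no longer grows with the size of the input.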
