For a class, I've been given the task of writing a parallel radix sort using pthreads, OpenMP, and MPI. My language of choice in this case is C; I don't know C++ too well.
Anyway, the way I'm reading a text file causes a segmentation fault at around 500 MB file size. The files contain line-separated 32-bit numbers:
12351
1235234
12
53421
1234
I know C, but I don't know it well; I use things I know, and in this case the things I know are terribly inefficient. My code for reading the text file is as follows:
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>
int main(int argc, char **argv){
    if(argc != 4) {
        printf("rs_pthreads requires three arguments to run\n");
        return -1;
    }
    char *fileName=argv[1];
    uint32_t radixBits=atoi(argv[2]);
    uint32_t numThreads=atoi(argv[3]);
    if(radixBits > 32){
        printf("radixBitx cannot be greater than 32\n");
        return -1;
    }
    FILE *fileForReading = fopen( fileName, "r" );
    if(fileForReading == NULL){
        perror("Failed to open the file\n");
        return -1;
    }
    char* charBuff = malloc(1024);
    if(charBuff == NULL){
        perror("Error with malloc for charBuff");
        return -1;
    }
    uint32_t numNumbers = 0;
    while(fgetc(fileForReading) != EOF){
        numNumbers++;
        fgets(charBuff, 1024, fileForReading);
    }
    uint32_t numbersToSort[numNumbers];
    rewind(fileForReading);
    int location;
    for(location = 0; location < numNumbers; location++){
        fgets(charBuff, 1024, fileForReading);
        numbersToSort[location] = atoi(charBuff);
    }
At a file of 50 million numbers (~500MB), I'm getting a segmentation fault at rewind of all places. My knowledge of how file streams work is almost non-existent. My guess is it's trying to malloc without enough memory or something, but I don't know.
So, I've got a two-parter here: How is rewind causing a segmentation fault? Am I just doing a poor job before rewind and not checking some system call I should be?
And, what is a more efficient way to read in an arbitrary amount of numbers from a text file?
Any help is appreciated.
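I think the most likely cause here is (ironically enough) a stack overflow. Your numbersToSort array is allocated on the stack, and the stack has a limited size (it varies by compiler and operating system, but defaults between 1 MB and 8 MB are common). With 50 million uint32_t values, the array needs about 200 MB, far more than the stack can hold. The crash shows up at rewind() because that call is the first thing to touch the stack after the array is declared. You should dynamically allocate numbersToSort on the heap (which has much more available space) using malloc():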
uint32_t *numbersToSort = malloc(sizeof(uint32_t) * numNumbers);
Don't forget to deallocate it later:
free(numbersToSort);
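To make the size mismatch concrete, here is a minimal sketch that compares the array's size with the current stack limit. It assumes a POSIX system (getrlimit() is not part of standard C), and the helper names array_bytes and fits_on_stack are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>   /* POSIX only: getrlimit, RLIMIT_STACK */

/* Bytes needed to hold `count` 32-bit numbers. */
size_t array_bytes(size_t count)
{
    return count * sizeof(uint32_t);
}

/* Returns 1 if `bytes` is below the soft stack limit, 0 if it is not,
 * and -1 if the limit could not be queried. This ignores stack space
 * already in use, so it is only a rough upper bound. */
int fits_on_stack(size_t bytes)
{
    struct rlimit limit;

    if (getrlimit(RLIMIT_STACK, &limit) != 0)
        return -1;
    if (limit.rlim_cur == RLIM_INFINITY)
        return 1;
    return (rlim_t)bytes < limit.rlim_cur ? 1 : 0;
}
```

For the 50 million numbers in the question, array_bytes(50000000) is 200,000,000 bytes, so fits_on_stack() would be expected to return 0 under a typical default limit.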
I would also point out that your first-pass loop, which is intended to count the number of lines, will fail if there are any blank lines. This is because on a blank line the first character is '\n', and fgetc() will consume it; the next call to fgets() will then read the following line, so the blank line and the line after it are counted as one and the total comes out too low.
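One way to avoid the blank-line problem is to count with fgets() alone and skip lines that contain only whitespace. This is a sketch, not a drop-in replacement: the name count_number_lines is hypothetical, and it assumes every line fits in the 64-byte buffer (true for 32-bit decimal numbers, which have at most 10 digits).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Counts the lines in `fp` that contain something other than whitespace.
 * Assumption: no line is longer than 63 characters; a longer line would
 * be read in several pieces and counted more than once. */
size_t count_number_lines(FILE *fp)
{
    char line[64];
    size_t count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        const char *p = line;

        /* Skip leading whitespace, including the trailing newline. */
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;
        if (*p != '\0')
            count++;
    }
    return count;
}
```

After counting, call rewind() on the stream before the second pass, as the original code does.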
The problem is in this line:
uint32_t numbersToSort[numNumbers];
You are attempting to allocate a huge array on the stack, and the stack is typically limited to a few megabytes at most. (Moreover, variable-length arrays like this are not allowed in C89; they were introduced in C99.) Allocate the array on the heap instead:
uint32_t *numbersToSort; /* Declare it with other declarations */
/* Remove uint32_t numbersToSort[numNumbers]; */
/* Add the code below */
numbersToSort = malloc(sizeof(uint32_t) * numNumbers);
if (!numbersToSort) {
    /* No memory; do cleanup and bail out */
    return 1;
}
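On the second part of the question (a more efficient way to read an arbitrary number of values), one common approach is to read the file once and grow a heap buffer with realloc() as needed, which removes the counting pass entirely. The sketch below is only one possible way to do it; read_numbers, the starting capacity of 1024, and the 64-byte line buffer are all illustrative choices, not anything from the original code.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads one decimal number per line from `fp` into a heap array that is
 * doubled whenever it fills up. Blank or non-numeric lines are skipped.
 * Returns the array (caller must free it) and stores the element count
 * in *out_count; returns NULL if memory runs out.
 * Assumption: values fit in 32 bits; larger input is silently truncated. */
uint32_t *read_numbers(FILE *fp, size_t *out_count)
{
    size_t capacity = 1024;
    size_t count = 0;
    char line[64];
    uint32_t *values = malloc(capacity * sizeof *values);

    if (values == NULL)
        return NULL;

    while (fgets(line, sizeof line, fp) != NULL) {
        char *end;
        unsigned long parsed = strtoul(line, &end, 10);

        if (end == line)
            continue;               /* no digits on this line */

        if (count == capacity) {
            size_t new_capacity = capacity * 2;
            uint32_t *grown = realloc(values, new_capacity * sizeof *grown);

            if (grown == NULL) {
                free(values);
                return NULL;
            }
            values = grown;
            capacity = new_capacity;
        }
        values[count++] = (uint32_t)parsed;
    }

    *out_count = count;
    return values;
}
```

Doubling the capacity keeps the number of realloc() calls logarithmic in the input size, and strtoul() reports whether any digits were parsed, which atoi() cannot do.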