Parsing a Large File in C

For a class, I've been given the task of writing radix sort in parallel using pthreads, OpenMP, and MPI. My language of choice in this case is C -- I don't know C++ too well.

Anyway, the way I'm going about reading the text file causes a segmentation fault at around a 500 MB file size. The files are line-separated 32-bit numbers:

12351
1235234
12
53421
1234

I know C, but I don't know it well; I use things I know, and in this case the things I know are terribly inefficient. My code for reading the text file is as follows:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

int main(int argc, char **argv){

 if(argc != 4) {
   printf("rs_pthreads requires three arguments to run\n");
   return -1;
 }

 char *fileName=argv[1];
 uint32_t radixBits=atoi(argv[2]);
 uint32_t numThreads=atoi(argv[3]);

 if(radixBits > 32){
   printf("radixBits cannot be greater than 32\n");
   return -1;
 }

 FILE *fileForReading = fopen( fileName, "r" );
 if(fileForReading == NULL){
   perror("Failed to open the file\n");
   return -1;
 }
 char* charBuff = malloc(1024);

 if(charBuff == NULL){
   perror("Error with malloc for charBuff");
   return -1;
 }

 uint32_t numNumbers = 0;
 while(fgetc(fileForReading) != EOF){
   numNumbers++;
   fgets(charBuff, 1024, fileForReading);
 }

 uint32_t numbersToSort[numNumbers];

 rewind(fileForReading);
 int location;
 for(location = 0; location < numNumbers; location++){
   fgets(charBuff, 1024, fileForReading);
   numbersToSort[location] = atoi(charBuff);
 }

At a file of 50 million numbers (~500 MB), I'm getting a segmentation fault at rewind, of all places. My knowledge of how file streams work is almost non-existent. My guess is it's trying to malloc without enough memory or something, but I don't know.

So, I've got a two-parter here: How is rewind segmentation faulting? Am I just doing a poor job before rewind and not checking some system call I should be?

And, what is a more efficient way to read in an arbitrary amount of numbers from a text file?

Any help is appreciated.

I think the most likely cause here is (ironically enough) a stack overflow. Your numbersToSort array is allocated on the stack, and the stack has a fixed size (it varies by compiler and operating system, but 1 MB is a typical number). You should dynamically allocate numbersToSort on the heap (which has much more available space) using malloc():

uint32_t *numbersToSort = malloc(sizeof(uint32_t) * numNumbers);

Don't forget to deallocate it later:

free(numbersToSort);

I would also point out that your first-pass loop, which is intended to count the number of lines, will fail if there are any blank lines. This is because on a blank line, the first character is '\n', and fgetc() will consume it; the next call to fgets() will then read the following line, and you'll have skipped the blank one in your count.

The problem is in this line

uint32_t numbersToSort[numNumbers];

You are attempting to allocate a huge array on the stack, and the stack is typically limited to just a few megabytes (moreover, variable-length arrays like this were only added in C99). So you can try this:

uint32_t *numbersToSort; /* Declare it with other declarations */


/* Remove uint32_t numbersToSort[numNumbers]; */
/* Add the code below */
numbersToSort = malloc(sizeof(uint32_t) * numNumbers);
if (!numbersToSort) {
     /* No memory; do cleanup and bail out */
     return 1;
}
