
Read a file either with mmap or fscanf in C

I'm asking for both advice and an opinion.

I have a file made of pairs of integers, for example:

1 2
1 3
4 7
2 5
3 10

Now, I want to read it, but every method I can think of has its own problems.

The mmap() function gives me the file as a buffer of characters, but extracting the numbers from it seems very painful: I don't know their lengths, so atoi() or the classic digit - '0' trick doesn't seem to be enough.

On the other hand, fscanf gives me the numbers directly as integers, but I always have trouble with its termination. How do you know when it has finished reading? Does it return '\0', or EOF, or something else? In my experience it seems to behave randomly. A function that counts the number of lines in a file would be useful, but does that even exist?

Now, over to you. Which method would you prefer? And how would you fix the problems above?

You can use strtol to convert the numbers, both from your mmap()ped buffer and from a line you've read via traditional I/O. I find it very convenient (in fact, even more convenient than fscanf under most circumstances). If you want to find the next newline in your buffer, carefully(!) using memchr is a very efficient way of doing so. It then gives you the next pointer to pass to strtol.
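As a rough sketch of that idea, the function below walks a buffer line by line with memchr and converts each line with strtol. The name parse_pairs, the output-array interface, and the 64-byte line limit are assumptions made for this illustration. Because strtol expects a NUL-terminated string and a mapped file is not guaranteed to end in one, each line is first copied into a small local buffer; that copy is one cautious reading of "carefully", not the only option.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse "a b" lines from buf[0..len) into out[i][0..1].
   Returns the number of pairs stored, or -1 on a malformed or over-long
   line, an out-of-range number, or more than max pairs. */
long parse_pairs(const char *buf, size_t len, long (*out)[2], size_t max)
{
    const char *p = buf, *end = buf + len;
    size_t count = 0;

    while (p < end) {
        /* Bounded search: never looks past the end of the buffer. */
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        const char *eol = nl ? nl : end;
        size_t n = (size_t)(eol - p);
        char line[64], *s, *q;

        p = nl ? nl + 1 : end;            /* start of the next line */
        if (n >= sizeof line) return -1;  /* line too long for this sketch */
        memcpy(line, eol - n, n);
        line[n] = '\0';                   /* strtol needs a terminator */

        s = line + strspn(line, " \t\r");
        if (*s == '\0') continue;         /* skip blank lines */
        if (count == max) return -1;

        errno = 0;
        out[count][0] = strtol(s, &q, 10);
        if (q == s || errno == ERANGE) return -1;
        s = q;
        out[count][1] = strtol(s, &q, 10);
        if (q == s || errno == ERANGE) return -1;

        q += strspn(q, " \t\r");          /* tolerate trailing whitespace */
        if (*q != '\0') return -1;        /* unexpected extra characters */
        count++;
    }
    return (long)count;
}
```

The same function works on a line or a whole file read with traditional I/O, since it only needs a pointer and a length.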

If you want generality, keep in mind that not every file can be mmap()ped (e.g. pipes). A robust program should therefore try to mmap() the file and, if that fails, fall back to traditional I/O.
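A minimal sketch of such a fallback on a POSIX system might look like the following. The struct filebuf type and the filebuf_load / filebuf_free names are invented for this example. It tries mmap() for non-empty regular files and otherwise reads the descriptor into a growing heap buffer. Retrying read() after EINTR is left out for brevity.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical container for the file contents. */
struct filebuf {
    char  *data;
    size_t len;
    int    mapped;   /* 1 if data came from mmap(), 0 if from malloc() */
};

/* Returns 0 on success, -1 on failure. */
int filebuf_load(struct filebuf *fb, const char *path)
{
    struct stat st;
    size_t len = 0, cap = 4096;
    char *buf;
    int fd = open(path, O_RDONLY);

    fb->data = NULL; fb->len = 0; fb->mapped = 0;
    if (fd < 0) return -1;

    /* First choice: map non-empty regular files. */
    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
        void *m = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (m != MAP_FAILED) {
            fb->data = m; fb->len = (size_t)st.st_size; fb->mapped = 1;
            close(fd);   /* the mapping stays valid after close() */
            return 0;
        }
    }

    /* Fallback (pipes, empty files, failed mmap): plain read() loop. */
    buf = malloc(cap);
    if (!buf) { close(fd); return -1; }
    for (;;) {
        ssize_t r;
        if (len == cap) {
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) { free(buf); close(fd); return -1; }
            buf = tmp; cap *= 2;
        }
        r = read(fd, buf + len, cap - len);
        if (r < 0) { free(buf); close(fd); return -1; }
        if (r == 0) break;   /* end of input */
        len += (size_t)r;
    }
    close(fd);
    fb->data = buf; fb->len = len;
    return 0;
}

void filebuf_free(struct filebuf *fb)
{
    if (fb->mapped) munmap(fb->data, fb->len);
    else free(fb->data);
}
```

Either way the caller ends up with a pointer and a length, so the parsing code does not need to know which path was taken.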

mmap() with an ad-hoc parser is about as fast as it gets, even if it isn't very flexible. The code below will parse files of the form you gave and perhaps no other, but if the file is mechanically generated that might be OK:

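char *p, *e, *x;
int m, n;
x = mmap(...);  /* e must be set to x + the file's length (end of buffer) */
/* Assumes every line is exactly "<digits> <digits>\n": no signs, no extra
   or trailing spaces, and no overflow checks. */
for (m = n = 0, p = x; p < e; ++p) {
  if (*p == ' ') { m = n; n = 0; }                  /* first number done */
  else if (*p == '\n') { emit(m, n); m = n = 0; }   /* pair complete */
  else { n = n * 10 + (*p - '0'); }                 /* accumulate a digit */
}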

A better file format (binary) is faster still.
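To illustrate what a binary format could look like, here is a small sketch that stores each pair as two fixed-width integers and moves whole arrays with fwrite and fread. The struct pair layout and the write_pairs / read_pairs names are assumptions for this example. The file uses the machine's native byte order, so it is only suitable when the writer and the reader run on compatible machines.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk record: two 32-bit integers, native byte order. */
struct pair { int32_t a, b; };

/* Write n records; returns how many were actually written. */
size_t write_pairs(FILE *fp, const struct pair *v, size_t n)
{
    return fwrite(v, sizeof *v, n, fp);
}

/* Read up to max records; returns how many were actually read.
   A short count means end of file or an error (check ferror(fp)). */
size_t read_pairs(FILE *fp, struct pair *v, size_t max)
{
    return fread(v, sizeof *v, max, fp);
}
```

No text parsing is involved at all, and the number of records follows directly from the file size divided by sizeof(struct pair).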


Regarding your second question (how do I know when fscanf() is at EOF?): feof(fp) tells you, but it only becomes true after a read has already hit the end of the file. You want something like:

while(!feof(fp)&&2==fscanf(fp,"%d %d\n",&m,&n))emit(m,n);

But beware: this is much slower than the mmap() parser above, and not much more robust. How much slower? On my mid-2012 MBA I get around 600mbps with the mmap() parser, while with fscanf I'm lucky to get 10mbps.

Using fscanf is very easy. fscanf returns the number of items successfully scanned, or EOF if the input ends before the first item could be read. So in your case you can use:

while (fscanf(fp, "%d %d", &int1, &int2) == 2)
{
    // successfully scanned 2 integers
}

Here fp is a file pointer, and int1 and int2 are variables of type int.
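The return value also lets you tell a clean end of file apart from bad input, which addresses the "behaves randomly" impression from the question. The sketch below is one way to do that; the scan_pairs name and the output-array interface are made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: read "a b" pairs from fp into out[i][0..1].
   Returns the number of pairs on a clean end of file, or -1 on malformed
   input, a read error, or more than max pairs. */
long scan_pairs(FILE *fp, int (*out)[2], size_t max)
{
    size_t n = 0;

    for (;;) {
        int a, b;
        int rc = fscanf(fp, "%d %d", &a, &b);

        if (rc == 2) {                  /* both integers converted */
            if (n == max) return -1;
            out[n][0] = a;
            out[n][1] = b;
            n++;
            continue;
        }
        if (rc == EOF && !ferror(fp))   /* input ended before a new pair */
            return (long)n;
        return -1;                      /* rc is 0 or 1, or a read error */
    }
}
```

A return value of 0 or 1 from fscanf means the input did not match the format (for example a letter where a digit was expected, or a line with only one number), so the loop stops there instead of spinning on the same bad character.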
