C: reading a multi-line file with fgets or sscanf?

Let's say, for example, I have a text file:

person1
25
500
male
person2
..
..
..
person3
..

with any number of people, and I want to read each group of 4 lines into a struct, one struct per person in the file.

How can I do this? I have tried just using multiple fgets calls, but I don't know how to loop until the end of the file while reading four lines at a time.

Thanks

Some example lines. I'll leave you to supply the rest of the program.

#define MAX 1000

  ...

  FILE *f;
  char line1[MAX], line2[MAX], line3[MAX], line4[MAX];

  ...

  while(fgets(line1, MAX, f) != NULL)
    {
      if (fgets(line2, MAX, f) == NULL ||
          fgets(line3, MAX, f) == NULL ||
          fgets(line4, MAX, f) == NULL)
        {
          /* insert code here to handle end of file in unexpected place */
          break;
        }
      /* insert code here to do your sscanf and anything else you want */
    }

    ....

Continuing from the comment above: when you have a fixed, repeating set of lines in a data file that you need to read into a struct, this is one of the few exceptions where you should consider scanf()/fscanf() over the generally recommended fgets()/sscanf() on each line.

Why?

scanf() is a formatted input function (compared to fgets(), which is a line-oriented input function). If you have formatted input that spans multiple lines, scanf()/fscanf() treat the '\n' character as ordinary whitespace, which most conversions skip, and so allow you to consume multiple lines as a single input (with a properly crafted format string).

When using scanf()/fscanf() to read data into a string (or array), you must use the field-width modifier to limit the number of characters read into your array. Otherwise an input longer than the array writes beyond its end and invokes Undefined Behavior. This applies whenever you use scanf()/fscanf()/sscanf() (the entire family). Reading array data without a field-width modifier is no better than using gets().

So how do you craft your format string? Let's look at an example struct with 4 members, similar to what you show in your question, e.g.

...
#define MAXG 8      /* if you need a constant, #define one (or more) */
#define MAXP 32
#define MAXN 128

typedef struct {    /* struct with typedef */
    char name[MAXN], gender[MAXG];
    int iq, weight;
} person;
...

With your data as shown, and with the declaration for name being 128 characters and that for gender being 8 characters and the remaining two members being int types, you can do something similar to the following:

    int rtn;                                /* fscanf return */
    size_t n = 0;                           /* number of struct filled */
    person ppl[MAXP] = {{ .name = "" }};    /* array of person */
    ...
    while (n < MAXP &&  /* protect struct array bound, and each array bound below */
            (rtn = fscanf (fp, " %127[^\n]%d%d %7[^\n]", /* validate each read */
                    ppl[n].name, &ppl[n].iq, &ppl[n].weight, ppl[n].gender)) == 4)
        n++;            /* increment array index */

Looking specifically at the format string, you have:

    " %127[^\n]%d%d %7[^\n]"

where " %127[^\n]", by virtue of the leading ' ', consumes any leading whitespace, then reads at most 127 characters (the field width must be part of the format string itself; unlike printf(), scanf() cannot take it from a separate argument). The characters read are any characters in the line that are NOT the '\n' character (allowing you to read whitespace as part of the name, e.g. "Mickey Mouse").

Note that "%[...]" is a string conversion and will read any character in the list of characters [...] as a string. Using the circumflex '^' as the first character of the list negates the match, so "%[^\n]" reads all characters up to, but not including, the '\n' into a string.

The space before " %[^\n]" is required because "%[...]", "%c" and "%n" are the only conversion specifiers that do not consume leading whitespace, so you provide for that by including a space before the conversion in your format string. The other two conversion specifiers, the "%d" conversions for the int members, consume leading whitespace on their own, resulting in the total conversion:

    " %127[^\n]%d%d %7[^\n]"

That, in summary, will:

  • consume any leading whitespace (the '\n' left in the input stream from the prior read, i.e. after gender for the previous struct in the array);
  • read a line of up to 127 characters into the name member with %127[^\n];
  • read the line containing the first integer value into iq with %d (which consumes leading whitespace);
  • read the line containing the second integer value into weight with %d (ditto);
  • consume the '\n' left from the read of weight with the ' ' in the format string; and finally
  • read a line of up to 7 characters into the gender member with %7[^\n] (adjust as necessary to hold your longest gender string).

With that approach you can consume 4-lines of input into each struct in an array with a single call to fscanf() . You should check rtn on loop exit to ensure the loop exited on EOF after reading all values from the file. A simple check will cover the minimum validation needed, eg

    if (rtn != EOF) /* if loop exited on other than EOF, issue warning */
        fputs ("warning: error in file format or array full.\n", stderr);

(note: you can also check whether n == MAXP separately to see if the loop exited because the array was full).

Putting it all together, you could do:

#include <stdio.h>

#define MAXG 8      /* if you need a constant, #define one (or more) */
#define MAXP 32
#define MAXN 128

typedef struct {    /* struct with typedef */
    char name[MAXN], gender[MAXG];
    int iq, weight;
} person;

int main (int argc, char **argv) {

    int rtn;                                /* fscanf return */
    size_t n = 0;                           /* number of struct filled */
    person ppl[MAXP] = {{ .name = "" }};    /* array of person */
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    while (n < MAXP &&  /* protect struct array bound, and each array bound below */
            (rtn = fscanf (fp, " %127[^\n]%d%d %7[^\n]", /* validate each read */
                    ppl[n].name, &ppl[n].iq, &ppl[n].weight, ppl[n].gender)) == 4)
        n++;            /* increment array index */

    if (rtn != EOF) /* if loop exited on other than EOF, issue warning */
        fputs ("warning: error in file format or array full.\n", stderr);

    for (size_t i = 0; i < n; i++)  /* output results */
        printf ("\nname   : %s\niq     : %d\nweight : %d\ngender : %s\n",
                ppl[i].name, ppl[i].iq, ppl[i].weight, ppl[i].gender);

    if (fp != stdin)   /* close file if not stdin */
        fclose (fp);
}

(note: you can also use a global enum to define your constants)

Example Input File

$ cat dat/ppl.txt
person1
25
500
male
person2
128
128
female
Mickey Mouse
56
2
male
Minnie Mouse
96
1
female

Example Use/Output

$ ./bin/readppl dat/ppl.txt

name   : person1
iq     : 25
weight : 500
gender : male

name   : person2
iq     : 128
weight : 128
gender : female

name   : Mickey Mouse
iq     : 56
weight : 2
gender : male

name   : Minnie Mouse
iq     : 96
weight : 1
gender : female

You can also read each line with fgets() using either a line-counter or a multi-buffer approach, but this is more about choosing the proper tool for the job. There is nothing wrong with using fgets() and then two calls to sscanf() or strtol() to convert the integer values. With large input files, though, one call to fscanf() per record, compared to four calls to fgets() plus two calls to sscanf() or strtol() plus the additional logic for handling your line counter or multiple buffers, will start to add up.

Look things over and let me know if you have further questions.
