
Use fscanf to read strings and empty lines

I have a text file containing keywords and integers and have access to the file stream in order to parse this file.

I am able to parse it with while( fscanf(stream, "%s", word) != -1 ), which gets each word and int in the file for me to parse, but the problem I'm having is that I cannot detect an empty line ("\n"), which I need to detect. I can see that \n is whitespace and so is skipped by %s. What can I do to modify fscanf to also get EOL characters?

You can do exactly what you wish to do with fscanf, but the number of checks and validations required to do it properly and completely is painful compared to using a proper line-oriented input function like fgets.

With fgets (or POSIX getline), detecting an empty line requires nothing beyond what reading a normal line requires. For example, to read a line of text with fgets, you simply provide a buffer of sufficient size and make a single call, which reads up to and including the '\n' into buf:

while (fgets (buf, BUFSZ, fp)) {        /* read each line in file */

To check whether the line was empty, you simply check whether the first character in buf is the '\n' character, e.g.

    if (*buf == '\n')
        /* handle blank line */

Or, in the normal course of things, you will be removing the trailing '\n' by obtaining the length and overwriting the '\n' with the nul-terminating character. In that case, you can simply check whether the length is 0 (after removal), e.g.

    size_t len = strlen (buf);          /* get buf length */
    if (len && buf[len-1] == '\n')      /* check last char is '\n' */
        buf[--len] = 0;                 /* overwrite with nul-character */

(note: if the last character was not '\n', then either the line was longer than the buffer and the remaining characters will be read on the next call to fgets, or you have reached the end of the file and the last line has a non-POSIX line ending, i.e. no trailing '\n')
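As an aside, the trim step above is often written with strcspn instead of strlen. The following is only a sketch of that alternative; trim_newline is a hypothetical helper name, not something from the code in this answer.

```c
#include <assert.h>
#include <string.h>

/* trim_newline is a hypothetical helper name used for illustration.
 * It overwrites the first '\n' in s (if any) with the nul-character
 * and returns the resulting length. */
size_t trim_newline (char *s)
{
    assert (s != NULL);

    size_t len = strcspn (s, "\n");     /* chars before first '\n' (or all) */
    s[len] = 0;                         /* no-op if no '\n' was present */

    return len;
}
```

A returned length of 0 means the line was empty. Note that this form does not tell you whether a '\n' was actually found, so if you need to detect lines longer than the buffer, the strlen check shown above is the better fit.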

Putting it all together, an example using fgets that identifies empty lines, and prints complete lines even when a line exceeds the buffer length, could look something like the following:

#include <stdio.h>
#include <string.h>

#define BUFSZ 4096

int main (int argc, char **argv) {

    size_t n = 1;
    char buf[BUFSZ] = "";
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    while (fgets (buf, BUFSZ, fp)) {        /* read each line in file */
        size_t len = strlen (buf);          /* get buf length */
        if (len && buf[len-1] == '\n')      /* check last char is '\n' */
            buf[--len] = 0;                 /* overwrite with nul-character */
        else {   /* line too long or non-POSIX file end, handle as required */
            printf ("line[%2zu] : %s\n", n, buf);
            continue;
        }   /* output line (or "empty" if line was empty) */
        printf ("line[%2zu] : %s\n", n++, len ? buf : "empty");
    }
    if (fp != stdin) fclose (fp);           /* close file if not stdin */

    return 0;
}

Example Input File

$ cat ../dat/captnjack2.txt
This is a tale

Of Captain Jack Sparrow

A Pirate So Brave

On the Seven Seas.

Example Use/Output

$ ./bin/fgetsblankln ../dat/captnjack2.txt
line[ 1] : This is a tale
line[ 2] : empty
line[ 3] : Of Captain Jack Sparrow
line[ 4] : empty
line[ 5] : A Pirate So Brave
line[ 6] : empty
line[ 7] : On the Seven Seas.
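POSIX getline was mentioned above as an alternative to fgets. Here is a rough sketch of the same empty-line check with getline, assuming a POSIX system. count_blank_lines is a hypothetical helper name used only for illustration. Since getline allocates the buffer for you, the "line too long" case goes away.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* expose getline() in <stdio.h> */
#endif

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* count_blank_lines is a hypothetical helper name used for illustration.
 * It reads fp to EOF with POSIX getline() and returns how many lines
 * contained nothing but the '\n'. */
size_t count_blank_lines (FILE *fp)
{
    assert (fp != NULL);

    char *line = NULL;      /* getline allocates and grows this buffer */
    size_t cap = 0;         /* current allocation size, managed by getline */
    ssize_t nchr;           /* characters read, including the '\n' */
    size_t blank = 0;

    while ((nchr = getline (&line, &cap, fp)) != -1)
        if (nchr == 1 && *line == '\n')     /* only a newline was read */
            blank++;

    free (line);            /* the caller must free getline's buffer */

    return blank;
}
```

getline returns the number of characters read, so a return of 1 with '\n' as the first character is the empty-line test, with no call to strlen needed.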

So Why Does Everybody Recommend fgets?

Well, let's take a look at doing the same thing with fscanf and I'll let you be the judge. To begin with, fscanf does not read or include the trailing '\n' with the "%s" format specifier (which stops at whitespace) or with the character class "%[^\n]" (which specifically excludes it). So you cannot read both (1) a line with characters and (2) a line without characters using the same format string. Either characters are read and fscanf succeeds, or none are and you experience a matching failure.
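To see the matching failure for yourself, here is a minimal sketch. scan_line is a hypothetical helper name and the 63-character limit is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stdio.h>

/* scan_line is a hypothetical helper name used for illustration.
 * buf must have room for at least 64 chars. Returns fscanf's result:
 * 1 if characters were stored, 0 on a matching failure (the next
 * character is '\n'), EOF at end of file. */
int scan_line (FILE *fp, char *buf)
{
    assert (fp != NULL && buf != NULL);

    return fscanf (fp, "%63[^\n]", buf);
}
```

If the next line is empty, scan_line returns 0 and leaves the '\n' unread, so calling it again in a loop returns 0 forever unless you remove the '\n' yourself.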

So, as alluded to in the comments, you have to pre-check whether the next character in the input buffer is a '\n' using fgetc (or getc), and then put it back in the input buffer with ungetc if it isn't.
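The pre-check by itself can be sketched as a small function. next_is_newline is a hypothetical helper name, and treating an ungetc failure the same as EOF is a simplification made for this example.

```c
#include <assert.h>
#include <stdio.h>

/* next_is_newline is a hypothetical helper name used for illustration.
 * Returns 1 if the next character is '\n' (the '\n' is consumed),
 * 0 if it is any other character (which is put back), or EOF at end
 * of file or if ungetc fails. */
int next_is_newline (FILE *fp)
{
    assert (fp != NULL);

    int c = fgetc (fp);             /* int, not char, so EOF is distinct */

    if (c == '\n')
        return 1;
    if (c == EOF)
        return EOF;
    if (ungetc (c, fp) == EOF)      /* put back, validate */
        return EOF;

    return 0;
}
```

Only one character of put-back is guaranteed by the C standard, which is all this needs.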

Further adding to your fscanf task, you must independently validate every check, put-back, and read along the way. This results in quite a number of checks to handle all cases and avoid undefined behavior.

As part of those checks, you will need to limit the number of characters you read to one less than the size of the buffer, while capturing the next character to determine whether the line was too long to fit. Additional checks are required to handle (without failure) a file whose final line has a non-POSIX line ending, something fgets handles without issue.
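One way to keep the field width in sync with the buffer size, rather than hard-coding it in the format string, is the usual preprocessor stringizing trick. This is only a sketch; MAXC, STR, STR_ and read_chunk are hypothetical names, and MAXC is kept small so the too-long case is easy to trigger.

```c
#include <assert.h>
#include <stdio.h>

#define MAXC 15             /* max chars stored per read (assumed value) */
#define STR_(x) #x
#define STR(x)  STR_(x)     /* expands MAXC before stringizing it */

/* read_chunk is a hypothetical helper name used for illustration.
 * buf must hold MAXC + 1 chars. Stores up to MAXC non-newline chars
 * in buf and the character that follows them in *next. Returns
 * fscanf's result (2 when both conversions succeeded). */
int read_chunk (FILE *fp, char *buf, char *next)
{
    assert (fp != NULL && buf != NULL && next != NULL);

    /* adjacent literals concatenate to "%15[^\n]%c" */
    return fscanf (fp, "%" STR(MAXC) "[^\n]%c", buf, next);
}
```

When read_chunk returns 2 and *next is not '\n', the line did not fit in buf and the rest of it is still waiting to be read.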

Below is an implementation similar to the fgets code above. Go through it and understand why each step is necessary and what each validation protects against. You may be able to rearrange it slightly, but it has been whittled down to close to the bare minimum. After going through it, it should become clear why fgets is the preferred method for handling checks for empty lines (as well as for line-oriented input generally).

#include <stdio.h>

#define BUFSZ 4096

int main (int argc, char **argv) {

    int c = 0, r = 0;
    size_t n = 1;
    char buf[BUFSZ] = "", nl = 0;
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    for (;;) {  /* loop until EOF */
        if ((c = fgetc (fp)) == '\n')   /* check next char is '\n' */
            *buf = 0;                   /* make buf empty-string */
        else {
            if (c == EOF)               /* check if EOF */
                break;
            if (ungetc (c, fp) == EOF) {    /* ungetc/validate */
                fprintf (stderr, "error: ungetc failed.\n");
                break;
            }
            /* read line into buf and '\n' into nl, handle failure */
            if ((r = fscanf (fp, "%4095[^\n]%c", buf, &nl)) != 2) {
                if (r == EOF) {         /* EOF (input failure) */
                    break;
                } /* check next char, if not EOF, non-POSIX eol */
                else if ((c = fgetc (fp)) != EOF) {
                    if (ungetc (c, fp) == EOF) {    /* unget it */
                        fprintf (stderr, "error: ungetc failed.\n");
                        break;
                    } /* read line again handling non-POSIX eol */
                    if (fscanf (fp, "%4095[^\n]", buf) != 1) {
                        fprintf (stderr, "error: fscanf failed.\n");
                        break;
                    }
                }
            } /* good fscanf, validate nl = '\n' or line too long */
            else if (nl != '\n') {
                fprintf (stderr, "error: line %zu too long.\n", n);
                break;
            }
        } /* output line (or "empty" for empty line) */
        printf ("line[%2zu] : %s\n", n++, *buf ? buf : "empty");
    }

    if (fp != stdin) fclose (fp);     /* close file if not stdin */

    return 0;
}

The Use/Output is identical to above. Look things over and let me know if you have any further questions.


 