
C99: Is it standard that fscanf() sets eof earlier than fgetc()?

I tried with VS2017 (32-bit version) on a 64-bit Windows PC, and it seems to me that fscanf() sets the eof flag immediately after successfully reading the last item in a file. This loop terminates immediately after fscanf() has read the last item in the file associated with stream:

while(!feof(stream))
{
    fscanf(stream,"%s",buffer);
    printf("%s",buffer);
}

I know this is insecure code... I just want to understand the behaviour. Please forgive me ;-)

Here, stream is related to an ordinary text file containing strings like "Hello World!". The last character in that file is not a newline character.

However, fgetc(), having processed the last character, tries to read yet another one in this loop, which leads to c=0xff (EOF):

while (!feof(stream))
{
    c = fgetc(stream);
    printf("%c", c);
}

Is this behaviour of fscanf() and fgetc() standardized, implementation dependent or something else? I am not asking why the loop terminates or why it does not terminate. I am asking whether this is standard behaviour.

In my experience, when working with <stdio.h> the precise semantics of the "eof" and "error" bits are very, very subtle, so much so that it's not usually worth it (it may not even be possible) to try to understand exactly how they work. (The first question I ever asked on SO was about this, although it involved C++, not C.)

I think you know this, but the first thing to understand is that the intent of feof() is very much not to predict whether the next attempt at input will reach the end of the file. The intent is not even to say that the input stream is "at" the end of the file. The right way to think about feof() (and the related ferror()) is that they're for error recovery, to tell you a bit more about why a previous input call failed.

And that's why writing a loop involving while(!feof(fp)) is always wrong.

But you're asking about precisely when fscanf hits end-of-file and sets the eof bit, versus getc/fgetc. With getc and fgetc, it's easy: they try to read one character, and they either get one or they don't (and if they don't, it's either because they hit end-of-file or encountered an i/o error).

But with fscanf it's trickier, because depending on the input specifier being parsed, characters are accepted only as long as they're appropriate for the input specifier. The %s specifier, for example, stops not only if it hits end-of-file or gets an error, but also when it hits a whitespace character. (And that's why people were asking in the comments whether your input file ended with a newline or not.)

I've experimented with the program

#include <stdio.h>

int main()
{
    char buffer[100];
    FILE *stream = stdin;

    while(!feof(stream)) {
        fscanf(stream,"%s",buffer);
        printf("%s\n",buffer);
    }
}

which is pretty close to what you posted. (I added a \n in the printf so that the output was easier to see, and better matched the input.) I then ran the program on the input

This
is
a
test.

and, specifically, where all four of those lines ended in a newline. And the output was, not surprisingly,

This
is
a
test.
test.

The last line is repeated because that's what (usually) happens when you write while(!feof(stream)).

But then I tried it on the input

This\n
is\n
a\n
test.

where the last line did not have a newline. This time, the output was

This
is
a
test.

This time, the last line was not repeated. (The output was still not identical to the input, because the output contained four newlines while the input contained three.)

I think the difference between these two cases is that in the first case, when the input ends with a newline, fscanf reads the last line, reads the last \n, notices that it's whitespace, and returns, but it has not hit EOF and so does not set the eof bit. In the second case, without the trailing newline, fscanf hits end-of-file while reading the last line, and so does set the eof bit. The feof() test in the while() condition is then true, the code does not make an extra trip through the loop, and the last line is not repeated.

We can see a bit more clearly what's going on if we look at fscanf's return value. I modified the loop like this:

while(!feof(stream)) {
    int r = fscanf(stream,"%s",buffer);
    printf("fscanf returned %2d: %5s (eof: %d)\n", r, buffer, feof(stream));
}

Now, when I run it on a file that ends with a newline, the output is:

fscanf returned  1:  This (eof: 0)
fscanf returned  1:    is (eof: 0)
fscanf returned  1:     a (eof: 0)
fscanf returned  1: test. (eof: 0)
fscanf returned -1: test. (eof: 1)

We can clearly see that after the fourth call, feof(stream) is not true yet, meaning that we'll make that last, extra, unnecessary, fifth trip through the loop. But we can see that during the fifth trip, fscanf returns -1, indicating (a) that it did not read a string as expected and (b) it reached EOF.

If I run it on input not containing the trailing newline, on the other hand, the output is like this:

fscanf returned  1:  This (eof: 0)
fscanf returned  1:    is (eof: 0)
fscanf returned  1:     a (eof: 0)
fscanf returned  1: test. (eof: 1)

Now, feof is true immediately after the fourth call to fscanf, and the extra trip is not made.

Bottom line: the moral is (the morals are):

  1. Don't write while(!feof(stream)).
  2. Do use feof() and ferror() only to test why a previous input call failed.
  3. Do check the return value of scanf and fscanf.

And we might also note: Do beware of files not ending in newline! They can behave surprisingly differently.


Addendum: Here's a better way to write the loop:

while((r = fscanf(stream,"%s",buffer)) == 1) {
    printf("%s\n", buffer);
}

When you run this, it always prints exactly the strings it sees in the input. It doesn't repeat anything; it doesn't do anything significantly differently depending on whether the last line does or doesn't end in a newline. And -- significantly -- it doesn't (need to) call feof() at all!


Footnote: In all of this I've ignored the fact that %s with *scanf reads strings, not lines. I've also ignored the fact that %s tends to behave very badly (it overflows the buffer) if it encounters a string that's larger than the buffer that's to receive it.

Both of your loops are incorrect: the end-of-file indicator tested by feof(f) is set only once an input function has actually tried to read past the end of the file, never in advance. In your code, you do not test whether fgetc() returns EOF, nor whether fscanf() returns 0 or EOF.

Indeed fscanf() can set the end of file condition of a stream if it reaches the end of file, which it does for %s if the file does not contain a trailing newline, whereas fgets() would not set this condition if the file ends with a newline. fgetc() sets the condition only when it returns EOF .

Here is a modified version of your code that illustrates this behavior:

#include <stdio.h>

int main() {
    FILE *fp = stdin;
    char buf[100];
    char *p;
    int c, n, eof;

    for (;;) {
       c = fgetc(fp);
       eof = feof(fp);
       if (c == EOF) {
           printf("c=EOF, feof()=%d\n", eof);
           break;
       } else {
           printf("c=%d, feof()=%d\n", c, eof);
       }
    }

    rewind(fp); /* clears end-of-file and error indicators */
    for (;;) {
        n = fscanf(fp, "%99s", buf);
        eof = feof(fp);
        if (n == 1) {
            printf("fscanf() returned 1, buf=\"%s\", feof()=%d\n", buf, eof);
        } else {
            printf("fscanf() returned %d, feof()=%d\n", n, eof);
            break;
        }
    }

    rewind(fp); /* clears end-of-file and error indicators */
    for (;;) {
        p = fgets(buf, sizeof buf, fp);
        eof = feof(fp);
        if (p == buf) {
            printf("fgets() returned buf, buf=\"%s\", feof()=%d\n", buf, eof);
        } else
        if (p == NULL) {
            printf("fgets() returned NULL, feof()=%d\n", eof);
            break;
        } else {
            printf("fgets() returned %p, buf=%p, feof()=%d\n", (void*)p, (void*)buf, eof);
            break;
        }
    }
    return 0;
}

When run with standard input redirected from a file containing Hello world without a trailing newline, here is the output:

c=72, feof()=0
c=101, feof()=0
c=108, feof()=0
c=108, feof()=0
c=111, feof()=0
c=32, feof()=0
c=119, feof()=0
c=111, feof()=0
c=114, feof()=0
c=108, feof()=0
c=100, feof()=0
c=EOF, feof()=1
fscanf() returned 1, buf="Hello", feof()=0
fscanf() returned 1, buf="world", feof()=1
fscanf() returned -1, feof()=1
fgets() returned buf, buf="Hello world", feof()=1
fgets() returned NULL, feof()=1

The C Standard specifies the behavior of the stream input functions as if they read characters through successive calls to fgetc. fgetc sets the end-of-file indicator when it cannot read a byte because the stream is at end of file.

The behavior illustrated above conforms to the Standard and shows why testing feof() is not a good way to validate input operations. feof() can return non-zero after successful operations and can return 0 before unsuccessful operations. feof() should only be used to distinguish end of file from input error after an unsuccessful input operation. Very few programs make this distinction, hence feof() is almost never used on purpose and almost always indicates a programming error. For extra explanations, read this: Why is “while ( !feof (file) )” always wrong?

If I might offer a tl;dr to both the comprehensive answers here, formatted input reads characters until it has reason to stop. Since you say

The last character in that file is not a newline character

and the %s directive reads a string of non-whitespace characters, after it reads the ! in World! it has to read another character. There isn't one, which lights eof.

Put whitespace (space, newline, whatever) at the end of the phrase, and your printf will print the last word twice: once because it read it, and again because the scanf failed to find a string to read before hitting eof, so the %s conversion never happened, leaving the buffer untouched.
