
C: fscanf - infinite loop when first character matches

I am attempting to parse a text (CSS) file using fscanf and pull out all statements that match this pattern:

@import "some/file/somewhere.css";

To do this, I have the following loop set up:

FILE *file = fopen(pathToSomeFile, "r");
char *buffer = (char *)malloc(sizeof(char) * 9000);

while(!feof(file))
{
    // %*[^@] : Read and discard all characters up to a '@'
    // %8999[^;] : Read up to 8999 characters starting at '@' to a ';'.
    if(fscanf(file, "%*[^@] %8999[^;]", buffer) == 1)
    {
        // Do stuff with the matching characters here.
        // This code is long and not relevant to the question.
    }
}

This works perfectly SO LONG AS the VERY FIRST character in the file is not a '@'. (Literally, a single space before the first '@' character in the CSS file will make the code run fine.)

But if the very first character in the CSS file is a '@', then what I see in the debugger is an infinite loop -- execution enters the while loop, hits the fscanf statement, but does not enter the 'if' statement (fscanf fails), and then continues through the loop forever.

I believe my fscanf formatters may need some tweaking, but am unsure how to proceed. Any suggestions or explanations for why this is happening?

Thank you.

I'm not an expert on scanf pattern syntax, but my interpretation of yours is:

  • Match a non-empty sequence of non-'@' characters, then
  • Match a non-empty sequence of up to 8999 non-';' characters

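So yes, if your input starts with a '@', the first part fails to match. fscanf then returns 0 without consuming any input, so the stream never reaches end-of-file and the loop repeats the same failed call forever.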
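A quick way to check this is to run the question's format over two strings with sscanf, which follows the same matching rules as fscanf. This is only an illustrative sketch; the helper name try_question_format is made up for the example.

```c
#include <stdio.h>

/* Hypothetical helper (not from the question): applies the question's
 * format string to `input` and returns sscanf's result.
 * `buf` must have room for at least 9000 bytes. */
int try_question_format(const char *input, char *buf)
{
    buf[0] = '\0';  /* keep buf a valid string even when nothing is stored */
    return sscanf(input, "%*[^@] %8999[^;]", buf);
}
```

For the input `@import "a.css";` the call returns 0 and stores nothing, because %*[^@] finds no character it is allowed to match. With one leading space, ` @import "a.css";`, it returns 1 and buf holds `@import "a.css"`.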

I think if you start your format string with some whitespace, then fscanf will eat any leading whitespace in your data, i.e. simply " %8999[^;]". Note that this skips only whitespace, not other text that precedes the '@'.

Oli already said why fscanf failed. But failure is a normal outcome for fscanf, so your busy loop is not a consequence of the fscanf failure itself but of the missing handling for it.

You have to handle an fscanf failure even if your format were correct for this particular case, because you cannot be sure that the input will always match the format. In fact, there is far more possible input that does not match than input that does.
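As a minimal sketch of what such handling could look like, the loop below keeps the question's format unchanged but continues only while fscanf reports exactly one stored item. The function name count_statements and the stalled flag are assumptions made for this example, not part of the original code.

```c
#include <stdio.h>

/* Hypothetical helper: counts the statements matched by the question's
 * format. Sets *stalled to 1 if scanning stopped on input the format
 * could not match, rather than on end of input or a read error. */
int count_statements(FILE *fp, int *stalled)
{
    char buf[9000];
    int count = 0;

    /* Any result other than 1 (that is, 0 or EOF) ends the loop,
     * so a failed match can no longer spin forever. */
    while (fscanf(fp, "%*[^@] %8999[^;]", buf) == 1) {
        count++;  /* the matched statement is available in buf here */
    }

    /* Tell the two ways of stopping apart. */
    *stalled = !feof(fp) && !ferror(fp);
    return count;
}
```

This does not repair the format: a file that starts with '@' still yields no matches. It does terminate, though, and the stalled flag tells the caller that the input did not fit the format.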

Your format string does the following actions:

  • Read (and discard) 1 or more non-@ characters
  • Read (and discard) 0 or more whitespace characters (due to the space in the format string)
  • Read and store 1 to 8999 non-; characters

Unfortunately, there is no format specifier for reading "zero or more" characters from a user-defined set.
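One possible workaround, which is not part of this answer and is only a sketch: issue the skip as its own fscanf call, so that an empty match on it is harmless, and do the real read in a second call. The helper name next_statement_split is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads the next "@...;" statement (without the ';')
 * into buf, which must hold at least 9000 bytes.
 * Returns 1 on success, 0 when no further statement is found. */
int next_statement_split(FILE *fp, char *buf)
{
    /* Step 1: skip any run of non-'@' characters. If the next character
     * is already '@', this call matches nothing and returns 0, which is
     * fine here. Only EOF (no input left at all) ends the search. */
    if (fscanf(fp, "%*[^@]") == EOF)
        return 0;

    /* Step 2: the stream is now at '@' or at end of input. */
    return fscanf(fp, "%8999[^;]", buf) == 1;
}
```

Calling it repeatedly until it returns 0 walks through every statement, including one that starts at the very first character of the file. The ';' left behind by one call is consumed by the skip in the next.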

If you don't need to handle multiple @import statements on one line, you could change your code to read a single line (with fgets) and then extract the @import statement from it: if the first character of the line is not @, use your current format string with sscanf; otherwise, use sscanf(line, "%8999[^;]", buffer).
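A rough sketch of the line-based approach, assuming no line is longer than the 9000-byte buffer (fgets would split longer lines). The function names extract_statement and print_statements_by_line are invented for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: copies the first "@...;" statement on `line`
 * (without the ';') into buf, which must hold at least 9000 bytes.
 * Returns 1 if a statement was found, 0 otherwise. */
int extract_statement(const char *line, char *buf)
{
    if (line[0] == '@')
        return sscanf(line, "%8999[^;]", buf) == 1;
    /* At least one non-'@' character may precede the statement. */
    return sscanf(line, "%*[^@] %8999[^;]", buf) == 1;
}

/* Reads fp line by line, prints each statement found, and returns
 * how many there were. Only the first statement per line is seen. */
int print_statements_by_line(FILE *fp)
{
    char line[9000];
    char buf[9000];
    int count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (extract_statement(line, buf)) {
            puts(buf);
            count++;
        }
    }
    return count;
}
```

Because the loop is driven by fgets returning NULL, it ends at end of file no matter what the individual lines contain.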

If multiple @import statements on one line must be handled correctly, you could inspect the next character to be read with getc and then put it back with ungetc.
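The getc/ungetc idea might look roughly like this: peek at the next character, push it back, and pick the format that fits. This is a sketch under that reading of the suggestion, and the name next_statement_peek is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads the next "@...;" statement (without the ';')
 * into buf, which must hold at least 9000 bytes.
 * Returns 1 on success, 0 when no further statement is found. */
int next_statement_peek(FILE *fp, char *buf)
{
    int c = getc(fp);       /* look at the next character */
    if (c == EOF)
        return 0;
    ungetc(c, fp);          /* put it back so fscanf sees it again */

    if (c == '@')           /* nothing to skip: read the statement directly */
        return fscanf(fp, "%8999[^;]", buf) == 1;

    /* At least one non-'@' character is waiting, so the skip can match. */
    return fscanf(fp, "%*[^@] %8999[^;]", buf) == 1;
}
```

Since it works on the stream rather than on lines, it also finds several statements that share one line.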
