How to use fscanf() to read strings in multiple formats?

Say this is the file I want to read:

07983988 REMOVE String1
13333337 ADD String4 100
34398111 TRANSFER String5 String6 100

Those are the only three valid line formats.

I'm using the following block of code to check the format of each parsed line:

// Read from file.
    while (!feof(fd)) {

        // Check for format.
        if (fscanf(fd, "%d %s %s %s %lf", &timestamp, transaction_type_str, company1, company2, &value)) {
            node_t *transaction = create_node((long int)timestamp, 1, company1, company2, value);
            add_node(transactions, transaction);
        } else if (fscanf(fd, "%d %s %s", &timestamp, transaction_type_str, company1)) {
            node_t *transaction = create_node((long int)timestamp, 1, company1, NULL, 0);
            add_node(transactions, transaction);
        } else if (fscanf(fd, "%d %s %s %lf", &timestamp, transaction_type_str, company1, &value)) {
            node_t *transaction = create_node((long int)timestamp, 1, company1, NULL, value);
            add_node(transactions, transaction);
        }

This, however, gives me an infinite loop. I'm new to file I/O in C, and I'm wondering whether it's better to use a tokenized approach or a line-based format search approach.

In outline:

char buffer[4096];

while (fgets(buffer, sizeof(buffer), fd) != 0)
{
    int offset;
    if (sscanf(buffer, "%d %s %n", &timestamp, transaction_type_str, &offset) == 2)
    {
        char *residue = buffer + offset;
        if (strcmp(transaction_type_str, "REMOVE") == 0)
        {
            if (sscanf(residue, "%s", company1) == 1)
                …process remove…
            else
                …report error, etc…
        }
        else if (strcmp(transaction_type_str, "ADD") == 0)
        {
            if (sscanf(residue, "%s %lf", company1, &value) == 2)
                …process add…
            else
                …report error, etc…
        }
        else if (strcmp(transaction_type_str, "TRANSFER") == 0)
        {
            if (sscanf(residue, "%s %s %lf", company1, company2, &value) == 3)
                …process transfer…
            else
                …report error, etc…
        }
        else
        {
            …report error and continue or break…
        }
    }
}

You can make the analysis more stringent, for example, insisting that there's no unused data after the secondary sscanf() call is complete, etc. That's fiddly, but far from impossible.

This covers the requested code — it identifies which type of transaction is requested before analyzing the remainder of the line.
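As a rough sketch of the stricter check mentioned above, the secondary sscanf() call can end with " %n" so that leftover text on the line is detected. The helper name parse_add_residue and the 32-byte company buffer are assumptions made for illustration; they are not part of the original code.

```c
#include <stdio.h>

/* Hypothetical helper: parses the residue of an ADD line, "<company> <value>",
   and rejects the line if anything other than whitespace follows the value.
   company must point to a buffer of at least 32 bytes (an assumed size).
   Returns 1 on success, 0 on malformed input. */
int parse_add_residue(const char *residue, char *company, double *value)
{
    int end = 0;

    /* %31s limits the token to the assumed buffer size.
       " %n" skips trailing whitespace (including the newline) and records
       how many characters were consumed; %n does not count as a conversion. */
    if (sscanf(residue, "%31s %lf %n", company, value, &end) != 2)
        return 0;

    /* If the scan stopped before the end of the string, there is unused data. */
    return residue[end] == '\0';
}
```

The same pattern could be applied to the REMOVE and TRANSFER branches by changing the format string and the expected conversion count.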

The problem is that you are using the feof() test to control the while loop. These references explain why that does not work:

Why is “while ( !feof (file) )” always wrong?

FAQ > Why it's bad to use feof() to control a loop

The function tests the end-of-file indicator, not the stream itself. This means that another function is actually responsible for setting the indicator to denote EOF has been reached. This would normally be done by the function that performed the read that hit EOF. We can then follow the problem to that function, and we find that most read functions will set EOF once they've read all the data, and then performed a final read resulting in no data, only EOF.

With this in mind, how does it manifest itself into a bug in our snippet of code? Simple... as the program goes through the loop to get the last line of data, fgets() works normally, without setting EOF, and we print out the data. The loop returns to the top, and the call to feof() returns FALSE, and we start to go through the loop again. This time, the fgets() sees and sets EOF, but thanks to our poor logic, we go on to process the buffer anyway, without realising that its content is now undefined (most likely untouched from the last loop).
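One common way to avoid the problem described above is to let the read call itself control the loop. The following is a minimal sketch under that assumption; count_lines is a hypothetical helper written for illustration, and it assumes every line fits in the buffer.

```c
#include <stdio.h>

/* Hypothetical helper, not part of the original code: counts how many
   times fgets() successfully reads from fp. It assumes every line fits
   in the buffer, so one successful read corresponds to one line. */
long count_lines(FILE *fp)
{
    char buffer[4096];
    long lines = 0;

    /* The read itself controls the loop: the body only runs when
       fgets() actually stored data in buffer. */
    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        lines++;
    }

    /* After the loop, ferror() and feof() can tell us why reading stopped. */
    return ferror(fp) ? -1 : lines;
}
```

The same idea applies to the fscanf() calls in the question. fscanf() returns the number of items converted, or EOF (a negative value), so a bare if (fscanf(...)) is also true at end of file. Comparing the result with the expected count avoids that.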

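Use fgets() and then sscanf() with a trailing " %n" specifier to detect whether the scan completed.

"%n" stores the number of characters consumed so far, so n stays 0 unless every preceding specifier matched. By checking that the scan reached the end of the line with no additional text left over, each format gets a clear, symmetric test, much like OP's original code.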

#define BUFSIZE 200
char buf[BUFSIZE];

while (fgets(buf, sizeof buf, fd) != NULL) {
  int n = 0; 
  sscanf(buf, "%d %s %s %s %lf %n",
      &timestamp, transaction_type_str, company1, company2, &value, &n);
  if (n > 0 && buf[n] == '\0') {
    node_t *transaction = create_node((long int)timestamp, 1, company1, company2, value);
    add_node(transactions, transaction);
    continue;
  }

  n = 0; 
  sscanf(buf, "%d %s %s %n",
      &timestamp, transaction_type_str, company1, &n);
  if (n > 0 && buf[n] == '\0') {
    node_t *transaction = create_node((long int)timestamp, 1, company1, NULL, 0);
    add_node(transactions, transaction);
    continue;
  }

  n = 0; 
  sscanf(buf, "%d %s %s %lf %n", 
      &timestamp, transaction_type_str, company1, &value, &n);
  if (n > 0 && buf[n] == '\0') {
    node_t *transaction = create_node((long int)timestamp, 1, company1, NULL, value);
    add_node(transactions, transaction);
    continue;
  }

  Handle_BadLine(buf);  // do not use transaction_type_str, company1, company2, value, n
}

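Note: using an unbounded "%s" is dangerous because a long token can overflow the destination buffer. It is better to limit the field width to one less than the buffer size: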

char transaction_type_str[20];
...
sscanf(buf, "%d %19s ...
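A small self-contained sketch of the width-limited scan, for illustration only. The helper name parse_prefix is hypothetical, and the 20-byte buffer simply mirrors the declaration above.

```c
#include <stdio.h>

/* Hypothetical helper: reads the timestamp and transaction type from a line.
   type must point to a buffer of at least 20 bytes; "%19s" stores at most
   19 characters plus the terminating null, so it cannot overflow.
   Returns 1 if both fields were converted, 0 otherwise. */
int parse_prefix(const char *line, int *timestamp, char *type)
{
    return sscanf(line, "%d %19s", timestamp, type) == 2;
}
```

With this format, an overlong token is truncated to 19 characters rather than written past the end of the buffer. A stricter parser might treat such a token as an error instead.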
