How to use fscanf() to get strings of multiple formats?

Say this is the file I want to read:

07983988 REMOVE String1
13333337 ADD String4 100
34398111 TRANSFER String5 String6 100

Those are the only 3 valid line formats.

I'm using the following block of code to check the format of each parsed line:

// Read from file.
    while (!feof(fd)) {

        // Check for format.
        if (fscanf(fd, "%d %s %s %s %lf", &timestamp, transaction_type_str, company1, company2, &value)) {
            node_t *transaction = create_node((long int)timestamp, 1, company1, company2, value);
            add_node(transactions, transaction);
        } else if (fscanf(fd, "%d %s %s", &timestamp, transaction_type_str, company1)) {
            node_t *transaction = create_node((long int)timestamp, 1, company1, NULL, 0);
            add_node(transactions, transaction);
        } else if (fscanf(fd, "%d %s %s %lf", &timestamp, transaction_type_str, company1, &value)) {
            node_t *transaction = create_node((long int)timestamp, 1, company1, NULL, value);
            add_node(transactions, transaction);
        }

This, however, gives me an infinite loop. I'm new to file I/O in C, and I'm wondering whether it's better to use a tokenized approach or a line-based format search approach.

In outline:

char buffer[4096];

while (fgets(buffer, sizeof(buffer), fd) != 0)
{
    int offset;
    if (sscanf(buffer, "%d %s %n", &timestamp, transaction_type_str, &offset) == 2)
    {
         char *residue = buffer + offset;
         if (strcmp(transaction_type_str, "REMOVE") == 0)
         {
              if (sscanf(residue, "%s", company1) == 1)
                  …process remove…
              else
                  …report error, etc…
         }
         else if (strcmp(transaction_type_str, "ADD") == 0)
         {
             if (sscanf(residue, "%s %lf", company1, &value) == 2)
                 …process add…
             else
                 …report error, etc…
         }
         else if (strcmp(transaction_type_str, "TRANSFER") == 0)
         {
             if (sscanf(residue, "%s %s %lf", company1, company2, &value) == 3)
                 …process transfer…
             else
                 …report error, etc…
         }
         else
         {
             …report error and continue or break…
         }
     }
 }

You can make the analysis more stringent, for example, by insisting that there's no unused data after the secondary sscanf() call is complete, etc. That's fiddly, but far from impossible.

This covers the requested code: it identifies which type of transaction is requested before analyzing the remainder of the line.
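The outline above can be filled in as a small self-contained function. This is only a sketch under stated assumptions: the `txn_t` struct, the `txn_kind_t` enum, the `parse_line` name, and the buffer widths are invented for illustration and are not part of the original code, and the calls to `create_node()` and `add_node()` are left to the caller. It also applies the stricter check mentioned above by using `%n` to reject trailing data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types for illustration; not from the original post. */
typedef enum { TXN_INVALID, TXN_REMOVE, TXN_ADD, TXN_TRANSFER } txn_kind_t;

typedef struct {
    txn_kind_t kind;
    long timestamp;
    char company1[64];
    char company2[64];
    double value;
} txn_t;

/* Parses one line into *out. Returns 1 on success, 0 on any format error. */
int parse_line(const char *line, txn_t *out)
{
    char type[16];
    int offset = 0;   /* where the keyword-specific part starts */
    int n = 0;        /* where the secondary scan stopped */

    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);   /* kind starts as TXN_INVALID */

    /* Timestamp and keyword first; %n records where the rest begins. */
    if (sscanf(line, "%ld %15s %n", &out->timestamp, type, &offset) != 2)
        return 0;

    const char *rest = line + offset;

    if (strcmp(type, "REMOVE") == 0) {
        if (sscanf(rest, "%63s %n", out->company1, &n) == 1 &&
            n > 0 && rest[n] == '\0') {
            out->kind = TXN_REMOVE;
            return 1;
        }
    } else if (strcmp(type, "ADD") == 0) {
        if (sscanf(rest, "%63s %lf %n", out->company1, &out->value, &n) == 2 &&
            n > 0 && rest[n] == '\0') {
            out->kind = TXN_ADD;
            return 1;
        }
    } else if (strcmp(type, "TRANSFER") == 0) {
        if (sscanf(rest, "%63s %63s %lf %n", out->company1, out->company2,
                   &out->value, &n) == 3 &&
            n > 0 && rest[n] == '\0') {
            out->kind = TXN_TRANSFER;
            return 1;
        }
    }
    return 0;   /* unknown keyword, missing field, or trailing junk */
}
```

A caller would typically read each line with fgets(), pass it to `parse_line()`, and build the node only when the function returns 1.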

The problem is that you are using the feof test as the while control. Here are reasons why it is not working.

Why is “while ( !feof (file) )” always wrong?

FAQ > Why it's bad to use feof() to control a loop

The function tests the end-of-file indicator, not the stream itself. This means that another function is actually responsible for setting the indicator to denote EOF has been reached. This would normally be done by the function that performed the read that hit EOF. We can then follow the problem to that function, and we find that most read functions will set EOF once they've read all the data, and then performed a final read resulting in no data, only EOF.

With this in mind, how does it manifest itself into a bug in our snippet of code? Simple... as the program goes through the loop to get the last line of data, fgets() works normally, without setting EOF, and we print out the data. The loop returns to the top, and the call to feof() returns FALSE, and we start to go through the loop again. This time, the fgets() sees and sets EOF, but thanks to our poor logic, we go on to process the buffer anyway, without realising that its content is now undefined (most likely untouched from the last loop).
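The effect described above can be observed directly. The sketch below is illustrative only: `count_with_feof`, `count_with_fgets`, `scan_int_result`, and `make_input` are hypothetical helper names, and tmpfile() stands in for the real input file. It also shows a second effect relevant to the question's loop: a scanf-family call that hits a matching failure returns 0 and leaves the offending token unread, so the end-of-file indicator is never set and a loop controlled by feof() can spin forever. Note too that `if (fscanf(...))` treats any nonzero return as success, including EOF, which is negative.

```c
#include <assert.h>
#include <stdio.h>

/* Counts loop iterations when feof() is the loop condition (the bug). */
int count_with_feof(FILE *fp)
{
    char buf[128];
    int iterations = 0;

    assert(fp != NULL);
    while (!feof(fp)) {
        char *line = fgets(buf, sizeof buf, fp);
        (void)line;               /* result deliberately ignored */
        iterations++;             /* counts the failed final read too */
    }
    return iterations;
}

/* Counts lines when the read call itself controls the loop. */
int count_with_fgets(FILE *fp)
{
    char buf[128];
    int lines = 0;

    assert(fp != NULL);
    while (fgets(buf, sizeof buf, fp) != NULL)
        lines++;
    return lines;
}

/* Tries to read one int; returns fscanf's result unchanged. */
int scan_int_result(FILE *fp)
{
    int value;

    assert(fp != NULL);
    return fscanf(fp, "%d", &value);
}

/* Hypothetical helper: a temporary file holding text, rewound for reading. */
FILE *make_input(const char *text)
{
    FILE *fp = tmpfile();

    if (fp != NULL) {
        fputs(text, fp);
        rewind(fp);
    }
    return fp;
}
```

For an input of two newline-terminated lines, the feof() version runs its body three times while the fgets() version sees two lines. On input that starts with a non-numeric token, `scan_int_result()` returns 0 on every call and feof() stays false.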

Use fgets() and then sscanf() with the " %n" specifier to detect scan completion.

By detecting whether the scan reached the end and no additional text was on the line, we have a clear and symmetric parsing detection, much like OP's original code.

#define BUFSIZE 200
char buf[BUFSIZE];

while (fgets(buf, sizeof buf, fd) != NULL) {
  int n = 0; 
  sscanf(buf, "%d %s %s %s %lf %n", 
      &timestamp, transaction_type_str, company1, company2, &value, &n);
  if (n > 0 && buf[n] == '\0') {
    node_t *transaction = create_node((long int)timestamp, 1, company1, company2, value);
    add_node(transactions, transaction);
    continue;
  }

  n = 0; 
  sscanf(buf, "%d %s %s %n", 
      &timestamp, transaction_type_str, company1, &n);
  if (n > 0 && buf[n] == '\0') {
    node_t *transaction = create_node((long int)timestamp, 1, company1, NULL, 0);
    add_node(transactions, transaction);
    continue;
  }

  n = 0; 
  sscanf(buf, "%d %s %s %lf %n", 
      &timestamp, transaction_type_str, company1, &value, &n);
  if (n > 0 && buf[n] == '\0') {
    node_t *transaction = create_node((long int)timestamp, 1, company1, NULL, value);
    add_node(transactions, transaction);
    continue;
  }

  Handle_BadLine(buf);  // do not use transaction_type_str, company1, company2, value, n
}

Note: using "%s" is dangerous. Better to limit the width:

char transaction_type_str[20];
...
sscanf(buf, "%d %19s ...
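As a small illustration of the width limit, the sketch below reads only the timestamp and keyword. The `scan_header` name and the `TYPE_LEN` macro are assumptions made for this example; the point is that `%19s` stores at most 19 characters plus the terminating null, so a 20-byte buffer cannot overflow even when the token is too long.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TYPE_LEN 20   /* buffer size; the format width below must be TYPE_LEN - 1 */

/* Reads "<int> <word>" from line and returns the number of fields converted.
   The word is capped at 19 characters, so type[] is never overrun. */
int scan_header(const char *line, int *timestamp, char type[TYPE_LEN])
{
    assert(line != NULL && timestamp != NULL && type != NULL);
    return sscanf(line, "%d %19s", timestamp, type);
}
```

An over-long keyword is truncated rather than rejected, so a stricter parser would still need to check what follows, for example with the " %n" technique shown earlier.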
