How can I read a multi-line TXT file with varying formats in C?

I have an input text file which looks something like this:

1(1.230000e+00)
2(1.230000e+00)
(1.230000e+00 1.230000e+00)
3(1.230000e+00)
(1.230000e+00 1.230000e+00)
.
.
.

I want to be able to read each line separately and distinguish between them. For example, for the first line, I want to store 100 in one variable as an int and 1.230000e+00 in another variable as a double. This is what I have tried:

fscanf(fp, "%d(%le)\n", &varInt, &varDouble);

This works for the first line. But how can I loop through all the lines and also read lines like the 3rd one using:

fscanf(fp, "(%le %le)\n", &varDouble1, &varDouble2);

To give some context, after reading each line I will do some processing and then read the next line. Depending on the format of the line, I will do a different type of processing.

Any help is appreciated! Thank you!

fscanf(3) is almost unusable unless the input is strictly controlled. It's hard to distinguish between I/O errors and parsing errors, and when a format stops matching partway through a line, fscanf has already consumed the characters it did match, so retrying the same line with a different format is awkward. That's why it's much more common to read each line with fgets(3) and then scan it with sscanf(3).
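To see the partial-consumption problem concretely, here is a small sketch that runs the question's "%d(%le)" format once against a temporary file and reports where the stream was left. The helper name next_char_after_fscanf is hypothetical, and the sketch assumes tmpfile() can create a scratch file on your system.

```c
#include <assert.h>
#include <stdio.h>

/* Writes `text` to a temporary file, runs one fscanf with the question's
 * "int(double)" format on it, stores fscanf's return value in *scanned,
 * and returns the next unread character (or EOF) so you can see how much
 * input fscanf consumed. Returns -2 if tmpfile() fails. */
static int next_char_after_fscanf(const char *text, int *scanned)
{
    int i;
    double d;
    int next;
    FILE *fp;

    assert(text != NULL && scanned != NULL);
    fp = tmpfile();
    if (fp == NULL)
        return -2;
    fputs(text, fp);
    rewind(fp);

    *scanned = fscanf(fp, "%d(%le)", &i, &d);
    next = fgetc(fp);   /* the position fscanf left the stream at */
    fclose(fp);
    return next;
}
```

For example, on "1(oops)\n" fscanf returns 1 and the next character is 'o': the "1(" prefix is already gone, so a second attempt with "(%le %le)" would start in the middle of the line. With fgets plus sscanf, the whole line stays in the buffer and each format can be tried from its start.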

Because sscanf returns the number of input items it successfully matched and assigned, you can use that to determine whether a scan worked as expected. There's no need to peek at the input: if you got what you expected, you're done; otherwise, try scanning some other way. Here's a working example:

#include <assert.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>

int
main( int argc, char *argv[] ) {
  if( argc < 2 ) {
    errx(EXIT_FAILURE, "syntax: %s filename", argv[0]);
  }

  FILE *input = fopen(argv[1], "r");
  if( !input ) {
    err(EXIT_FAILURE, "could not open '%s'", argv[1]);
  }

  static char line[128];

  while( fgets(line, sizeof(line), input) != NULL ) {
    double d1, d2;
    int quantum;

    if( 2 == sscanf(line, "%d(%lf)", &quantum, &d1) ) {
      printf( "ok: %d\t%7.2f\n", 100 * quantum, d1 );
    } else if( 2 == sscanf(line, "(%lf %lf)", &d1, &d2) ) {
      printf( "ok: %7.2f\t%7.2f\n", d1, d2 );
    } else {
      printf( ">>> %s\n", line );
    }

  }

  if( !feof(input) ) {
    err(EXIT_FAILURE, "error reading %s", argv[1]);
  }

  fclose(input);
  return EXIT_SUCCESS;
}

If you discover other patterns, it's easy to add them. Note that when fgets fails, the program returns success only if it reached end of file. Also note that err() and errx() come from <err.h>, which is a BSD extension (also provided by glibc) rather than part of ISO C.
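One caveat: sscanf's return value counts conversions, not literal characters, so a truncated line such as 1(1.5 (missing the closing parenthesis) still makes "%d(%lf)" return 2 and would be reported as ok above. A stricter sketch appends %n, which stores how many characters were consumed only if everything before it matched, and then checks that nothing but whitespace is left. It also returns a small tagged struct so the caller can switch on the kind of line and do different processing per format, as the question asks. The names enum line_kind, struct parsed_line, parse_line and only_space_left are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical names, for illustration only. */
enum line_kind { LINE_UNKNOWN, LINE_INDEXED, LINE_PAIR };

struct parsed_line {
    enum line_kind kind;
    int index;      /* set for LINE_INDEXED only */
    double a, b;    /* a: both kinds; b: LINE_PAIR only */
};

/* True if only whitespace (e.g. the '\n' kept by fgets) follows s. */
static int only_space_left(const char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    return *s == '\0';
}

/* Classifies one line read by fgets. `end` stays -1 unless the closing
 * ')' was actually matched, which sscanf's return value alone can't tell. */
static struct parsed_line parse_line(const char *line)
{
    struct parsed_line p = { LINE_UNKNOWN, 0, 0.0, 0.0 };
    int end = -1;

    assert(line != NULL);
    if (sscanf(line, "%d(%lf)%n", &p.index, &p.a, &end) == 2
        && end >= 0 && only_space_left(line + end)) {
        p.kind = LINE_INDEXED;
        return p;
    }

    end = -1;
    if (sscanf(line, " (%lf %lf)%n", &p.a, &p.b, &end) == 2
        && end >= 0 && only_space_left(line + end)) {
        p.kind = LINE_PAIR;
        return p;
    }

    return p;   /* kind is still LINE_UNKNOWN */
}
```

Inside the fgets loop, you would call parse_line(line) and switch on p.kind to pick the processing for each format; adding another pattern means adding one more enum value and one more sscanf attempt.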

As mentioned in the comments, you can read a full line, determine its format, and then parse it accordingly. The following code does what you want. However, production-quality code would probably interpret the format of each line more robustly.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

char *GetStringNoBeginWhitespace( char *str )
{
    static const char whitespace[] = " \f\n\r\t\v";

    return(str + strspn(str, whitespace));
}

int main(int argc, char *argv[])
{
    char *line = NULL;
    FILE *fp;
    char buffer[255];

    int i;
    double d,d1;

    fp = fopen("data.txt", "r");
    if( fp == NULL )
    {
        perror("data.txt");   /* report why the file could not be opened */
        return(1);
    }

    while(fgets(buffer, sizeof(buffer), fp))
    {
        buffer[strcspn(buffer, "\n")] = '\0';   /* strip the newline, if there is one */
        line = GetStringNoBeginWhitespace( buffer );
        if( *line != '\0' )   /* skip blank lines */
        {
            fputs(line, stdout);
            if( isdigit((unsigned char)line[0]) )   /* cast avoids UB for negative chars */
            {
                printf("\tFormat is x(.......)\n");
                if( sscanf(line,"%d(%le)\n", &i, &d) == 2 )
                {
                    printf("      %d %le\n", i, d);
                }
                else
                {
                    printf("\tUnknown format....\n");
                }
            }
            else if( line[0] == '(' )
            {
                printf("\tFormat is ( ...... ....... )\n");
                if( sscanf(line, "(%le %le)\n", &d, &d1) == 2 )
                {
                    printf("      %le %le\n", d, d1);
                }
                else
                {
                    printf("\tUnknown format....\n");
                }
            }
            else
            {
                printf("\tUnknown format....\n");
            }
        }
    }

    fclose(fp); 
    return(0);
}

Output:

jnorton@ubuntu:~/source$ ./a.out 
1(1.230000e+00) Format is x(.......)
      1 1.230000e+00
2(1.230000e+00) Format is x(.......)
      2 1.230000e+00
(1.230000e+00 1.230000e+00) Format is ( ...... ....... )
      1.230000e+00 1.230000e+00
3(1.230000e+00) Format is x(.......)
      3 1.230000e+00
(1.230000e+00 1.230000e+00) Format is ( ...... ....... )
      1.230000e+00 1.230000e+00

data.txt file:

1(1.230000e+00)
2(1.230000e+00)
(1.230000e+00 1.230000e+00)
3(1.230000e+00)
(1.230000e+00 1.230000e+00)
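As one example of the more robust interpretation mentioned in this answer, here is a sketch that parses the x(...) format with strtol and strtod instead of sscanf. Checking the end pointer after each number makes every delimiter explicit, and errno lets you reject out-of-range values, which sscanf does not report. The function name parse_indexed is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parses "<int>(<double>)" followed by optional whitespace.
 * Returns 1 on success, 0 on any syntax or range error. */
static int parse_indexed(const char *s, int *index, double *value)
{
    char *end;
    long n;
    double d;

    assert(s != NULL && index != NULL && value != NULL);

    errno = 0;
    n = strtol(s, &end, 10);
    if (end == s || errno == ERANGE || n < INT_MIN || n > INT_MAX)
        return 0;                   /* no digits, or out of int range */
    if (*end != '(')
        return 0;
    s = end + 1;

    errno = 0;
    d = strtod(s, &end);
    if (end == s || errno == ERANGE)
        return 0;                   /* no number, or overflow/underflow */
    if (*end != ')')
        return 0;
    end++;

    while (isspace((unsigned char)*end))   /* allow the '\n' from fgets */
        end++;
    if (*end != '\0')
        return 0;                   /* trailing garbage */

    *index = (int)n;
    *value = d;
    return 1;
}
```

A matching parser for the (x y) format would follow the same pattern: check for '(', call strtod twice, then check for ')'.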

If you generate the file yourself, add a fixed-size prefix holding the length of each line, e.g. 016:1(1.230000e+00)\n. Then read the 4-byte prefix with fread, convert it to an int with strtol, and read the rest of the line (the \n is included in the length). Finally, split the values with strtok(str, "( )").
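A minimal sketch of this length-prefix approach follows, assuming the prefix is always three decimal digits plus a colon, as in the example 016:. Two details are worth noting. Because the record includes its \n, the sketch adds '\n' to the strtok delimiters so the newline does not become a separate token. And since both 1(x) and (x y) produce two tokens, you still need to check whether the record starts with '(' (before calling strtok) to tell the formats apart. The names read_record, split_fields and PREFIX_LEN are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PREFIX_LEN 4   /* assumed layout: "NNN:" (three digits plus ':') */

/* Reads one record "NNN:<payload>", where NNN is the payload length
 * including its trailing '\n'. Stores the payload, NUL-terminated, in buf.
 * Returns the payload length, or -1 on EOF, short read or bad prefix. */
static long read_record(FILE *fp, char *buf, size_t cap)
{
    char prefix[PREFIX_LEN + 1];
    char *end;
    long len;

    if (fread(prefix, 1, PREFIX_LEN, fp) != PREFIX_LEN)
        return -1;
    prefix[PREFIX_LEN] = '\0';

    len = strtol(prefix, &end, 10);
    if (end == prefix || *end != ':' || len <= 0 || (size_t)len >= cap)
        return -1;                  /* malformed prefix or buffer too small */
    if (fread(buf, 1, (size_t)len, fp) != (size_t)len)
        return -1;
    buf[len] = '\0';
    return len;
}

/* Splits a payload into its numeric fields with strtok (modifies rec).
 * Returns the number of field pointers stored in fields[]. */
static int split_fields(char *rec, char *fields[], int max)
{
    int n = 0;
    for (char *tok = strtok(rec, "( )\n"); tok != NULL && n < max;
         tok = strtok(NULL, "( )\n"))
        fields[n++] = tok;
    return n;
}
```

The fields are still strings at this point; convert them with strtol and strtod as needed.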
