Use fscanf to read strings and empty lines

I have a text file containing keywords and integers, and I have access to the file stream in order to parse it.

I am able to parse it with while( fscanf(stream, "%s", word) != -1 ), which gets each word and int in the file for me to parse. The problem is that I cannot detect an empty line ("\n"), which I need to do. I can see that \n is a whitespace character and is therefore never returned by %s. What can I do to modify the fscanf call so that it also picks up EOL characters?

You can do exactly what you want with fscanf, but the number of checks and validations required to do it properly and completely is painful compared to using a proper line-oriented input function like fgets.

With fgets (or POSIX getline), detecting an empty line requires nothing beyond what you already do to read a normal line. For example, to read a line of text with fgets, you simply provide a buffer of sufficient size and make a single call to read up to and including the '\n' into buf:

while (fgets (buf, BUFSZ, fp)) {        /* read each line in file */

To check whether the line was empty, you simply check whether the first character in buf is the '\n' character, e.g.

    if (*buf == '\n')
        /* handle blank line */

or, in the normal course of things, you will be removing the trailing '\n' by obtaining the length and overwriting the '\n' with the nul-terminating character. In that case, you can simply check whether the length is 0 (after removal), e.g.

    size_t len = strlen (buf);          /* get buf length */
    if (len && buf[len-1] == '\n')      /* check last char is '\n' */
        buf[--len] = 0;                 /* overwrite with nul-character */

(note: if the last character was not '\n', either the line was longer than the buffer and the remaining characters will be read by the next call to fgets, or you have reached the end of a file whose last line has a non-POSIX line ending, i.e. no trailing '\n')
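As a compact variant of the trim-and-check step above, here is a minimal sketch that uses strcspn to find the newline. The helper name trim_newline and its complete flag are illustrative assumptions and do not come from the original answer.

```c
#include <string.h>

/* Hypothetical helper: trim a trailing '\n' from a buffer filled by fgets.
 * Returns the resulting length. *complete is set to 1 when a newline was
 * found (the whole line was read) and to 0 otherwise (line longer than the
 * buffer, or a final line with no trailing newline). */
size_t trim_newline (char *buf, int *complete)
{
    size_t len = strcspn (buf, "\n");   /* index of first '\n', else strlen */

    *complete = (buf[len] == '\n');     /* was a newline actually present? */
    buf[len] = 0;                       /* overwrite with nul-character */
    return len;
}
```

A return value of 0 with complete set to 1 then identifies an empty line.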

Putting it all together, an example that uses fgets to identify empty lines, and that prints complete lines even when a line exceeds the buffer length, could look like the following:

#include <stdio.h>
#include <string.h>

#define BUFSZ 4096

int main (int argc, char **argv) {

    size_t n = 1;
    char buf[BUFSZ] = "";
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    while (fgets (buf, BUFSZ, fp)) {        /* read each line in file */
        size_t len = strlen (buf);          /* get buf length */
        if (len && buf[len-1] == '\n')      /* check last char is '\n' */
            buf[--len] = 0;                 /* overwrite with nul-character */
        else {   /* line too long or non-POSIX file end, handle as required */
            printf ("line[%2zu] : %s\n", n, buf);
            continue;
        }   /* output line (or "empty" if line was empty) */
        printf ("line[%2zu] : %s\n", n++, len ? buf : "empty");
    }
    if (fp != stdin) fclose (fp);           /* close file if not stdin */

    return 0;
}

Example Input File

$ cat ../dat/captnjack2.txt
This is a tale

Of Captain Jack Sparrow

A Pirate So Brave

On the Seven Seas.

Example Use/Output

$ ./bin/fgetsblankln ../dat/captnjack2.txt
line[ 1] : This is a tale
line[ 2] : empty
line[ 3] : Of Captain Jack Sparrow
line[ 4] : empty
line[ 5] : A Pirate So Brave
line[ 6] : empty
line[ 7] : On the Seven Seas.
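The answer also mentions POSIX getline as an alternative to fgets. The following is a rough sketch of the same empty-line check with getline, which allocates the buffer itself so no line is ever too long. The function name count_empty_lines is made up for illustration, and the sketch assumes a POSIX.1-2008 system.

```c
#define _POSIX_C_SOURCE 200809L     /* expose getline (POSIX.1-2008) */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: count the empty lines in fp using POSIX getline. */
size_t count_empty_lines (FILE *fp)
{
    char *line = NULL;          /* getline allocates and grows this buffer */
    size_t cap = 0, empty = 0;
    ssize_t nread;

    while ((nread = getline (&line, &cap, fp)) != -1)
        if (nread == 1 && *line == '\n')    /* only a newline was read */
            empty++;

    free (line);                /* caller of getline owns the buffer */
    return empty;
}
```

Because getline returns the number of characters read, an empty line is simply a read of length 1 whose only character is '\n'.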

So Why Does Everybody Recommend fgets?

Well, let's take a look at doing the same thing with fscanf and I'll let you be the judge. To begin with, fscanf does not read or include the trailing '\n' with the "%s" format specifier (by default) or when using the character class "%[^\n]" (because it was specifically excluded). So you cannot read both (1) a line with characters and (2) a line without characters using the same format string. Either you read characters and fscanf succeeds, or you don't and you experience a matching failure.
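To make the matching failure concrete, here is a small sketch that applies the same "%[^\n]" conversion to in-memory strings with sscanf, which follows the same conversion rules as fscanf. The helper name scan_line_result and the 64-byte buffer are assumptions chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: apply "%63[^\n]" to a string and return the
 * conversion count reported by the scanf family. */
int scan_line_result (const char *input, char out[64])
{
    out[0] = 0;                             /* defined content on failure */
    return sscanf (input, "%63[^\n]", out); /* 1, 0 (matching failure) or EOF */
}
```

For an input such as "word 42\n" the call returns 1. For "\n" it returns 0, because the first character is excluded from the class and nothing is stored. For an empty string it returns EOF.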

So, as alluded to in the comments, you have to pre-check whether the next character in the input buffer is a '\n' using fgetc (or getc), and put it back in the input buffer with ungetc if it isn't.
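The pre-check described above amounts to a one-character peek. A minimal sketch follows; the name peek_char is hypothetical, and it relies only on the single character of pushback that ungetc guarantees.

```c
#include <stdio.h>

/* Hypothetical helper: return the next character in fp without consuming
 * it. Returns EOF at end of file, on a read error, or if ungetc fails. */
int peek_char (FILE *fp)
{
    int c = fgetc (fp);         /* read one character */

    if (c == EOF)               /* nothing to put back */
        return EOF;
    return ungetc (c, fp);      /* yields c on success, EOF on failure */
}
```

A caller would compare the result against '\n' and, when it matches, consume the newline with fgetc and treat the line as empty.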

Further adding to your fscanf task, you must independently validate each check, each put-back, and each read along the way. This results in quite a number of checks to handle all cases and to avoid undefined behavior.

As part of those checks, you need to limit the number of characters you read to one less than the size of the buffer, while capturing the next character to determine whether the line was too long to fit. Additional checks are required to handle (without failure) a file with a non-POSIX line end on the final line, something fgets handles without issue.
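One way to keep the field width in step with the buffer size is to build the format string from a macro. The sketch below does that and captures the character after the text to tell a complete line from a truncated one. The names MAXCH, STR, XSTR and scan_line are assumptions for illustration, MAXCH is kept tiny on purpose, and the empty-line case is handled after the matching failure rather than by a pre-check.

```c
#include <stdio.h>

#define MAXCH 7                 /* assumed small limit, for illustration */
#define STR(x)  #x
#define XSTR(x) STR(x)          /* expands MAXCH before stringizing */

/* Hypothetical helper. Returns 1 when a whole line is in buf, 0 when the
 * line was longer than MAXCH (the rest remains unread), EOF at end of input. */
int scan_line (FILE *fp, char buf[MAXCH + 1])
{
    char nl = 0;
    /* format becomes "%7[^\n]%c": at most MAXCH chars, then one more */
    int r = fscanf (fp, "%" XSTR(MAXCH) "[^\n]%c", buf, &nl);

    if (r == EOF)               /* nothing left to read */
        return EOF;
    if (r == 0) {               /* matching failure: next char is '\n' */
        *buf = 0;               /* report an empty line */
        fgetc (fp);             /* consume the newline */
        return 1;
    }
    if (r == 1 || nl == '\n')   /* last line without '\n', or normal line */
        return 1;
    ungetc (nl, fp);            /* line too long: put back the extra char */
    return 0;
}
```

Even in this reduced form there are four distinct outcomes to check per read, which is the bookkeeping that fgets avoids.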

Below is an implementation similar to the fgets code above. Go through it and understand why each step is necessary and what each validation guards against. You may be able to rearrange it slightly, but it has been whittled down close to the bare minimum. After going through it, it should become clear why fgets is the preferred method for handling checks for empty lines (as well as for line-oriented input generally).

#include <stdio.h>

#define BUFSZ 4096

int main (int argc, char **argv) {

    int c = 0, r = 0;
    size_t n = 1;
    char buf[BUFSZ] = "", nl = 0;
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    for (;;) {  /* loop until EOF */
        if ((c = fgetc (fp)) == '\n')   /* check next char is '\n' */
            *buf = 0;                   /* make buf empty-string */
        else {
            if (c == EOF)               /* check if EOF */
                break;
            if (ungetc (c, fp) == EOF) {    /* ungetc/validate */
                fprintf (stderr, "error: ungetc failed.\n");
                break;
            }
            /* read line into buf and '\n' into nl, handle failure */
            if ((r = fscanf (fp, "%4095[^\n]%c", buf, &nl)) != 2) {
                if (r == EOF) {         /* EOF (input failure) */
                    break;
                } /* check next char, if not EOF, non-POSIX eol */
                else if ((c = fgetc (fp)) != EOF) {
                    if (ungetc (c, fp) == EOF) {    /* unget it */
                        fprintf (stderr, "error: ungetc failed.\n");
                        break;
                    } /* read line again handling non-POSIX eol */
                    if (fscanf (fp, "%4095[^\n]", buf) != 1) {
                        fprintf (stderr, "error: fscanf failed.\n");
                        break;
                    }
                }
            } /* good fscanf, validate nl = '\n' or line too long */
            else if (nl != '\n') {
                fprintf (stderr, "error: line %zu too long.\n", n);
                break;
            }
        } /* output line (or "empty" for empty line) */
        printf ("line[%2zu] : %s\n", n++, *buf ? buf : "empty");
    }

    if (fp != stdin) fclose (fp);     /* close file if not stdin */

    return 0;
}

The Use/Output is identical to the above. Look things over and let me know if you have any further questions.

