Read next line in Text file C Programming

I know about fscanf(), fgets() and the other functions that read the next line of a text file. However, if the text file is piped to your program with 'cat msg1.txt | ./anonymizer', would you use the same functions? The code for my program's main is:

int main (void)
{
    char input[1000]= {'\0'}; //the sentence the user will enter
    printf("Enter a sentence:");
    scanf("%[^\n]", input);
    char newSentence[1000]={'\0'};
    sentence=(char *) &newSentence;
    line=getText(input,0);
    divide(input);
    printf("%s\n",sentence);
    return 0;
}

On the command line I enter:

gcc -o anonymizer anonymizer.c
cat msg1.txt | ./anonymizer

My msg1 text file contains:

Hi, my email addresses are h.potter@hogwarts.edu and 1a@2b3c@lkj@
Although it's not an email addresses, I'd hate if@ you saw my secret@word.
Gary.zenkel@nbcuni.com

However, the input variable only contains the first line: 'Hi, my email addresses are h.potter@hogwarts.edu and 1a@2b3c@lkj@'

How can I get the input variable to contain the other two lines?

Almost. While it may not actually be defined that way, scanf(...) is essentially equivalent to fscanf(stdin, ...). The same goes for gets / fgets, although gets() cannot limit how much it reads and was removed in C11, so prefer fgets(buf, size, stdin). You should be able to use either family to read from your standard input stream.
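The question's scanf("%[^\n]", input) stops at the first newline, which is why only the first line arrives. A minimal sketch of one way to collect every line from a stream with fgets() follows; read_all_lines is a hypothetical helper name, not something from the question, and it assumes the whole input fits in the caller's buffer (anything beyond that is left unread).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the question): append every line read
   from fp to buf, which holds cap bytes. Newlines are kept. Reading stops
   at end of input or when buf is full. Returns the bytes stored. */
size_t read_all_lines(FILE *fp, char *buf, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return 0;
    buf[0] = '\0';

    /* fgets stores at most (cap - used - 1) characters plus the '\0',
       so it can never write past the end of buf. */
    while (cap - used > 1 &&
           fgets(buf + used, (int)(cap - used), fp) != NULL)
        used += strlen(buf + used);

    return used;
}
```

In the question's main(), a call such as read_all_lines(stdin, input, sizeof input) could stand in for the scanf line. Because the same stdin is read either way, it works identically whether the text is typed in, piped with cat, or redirected with <.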

To my limited knowledge (I could be wrong), the standard libc offers no efficient way to read a line when you do not know the maximum line length. scanf() (with an unbounded %s or %[ conversion) and gets() can overflow your buffer because they do not check its length. With fgets(), you may waste time on frequent strlen() and realloc() calls. With fgetc(), reading tends to be slow because every character costs a function call.
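For comparison, the fgets() plus realloc() approach mentioned above might look like the sketch below. read_line is a hypothetical helper name and the starting capacity of 128 is an arbitrary assumption. It is simple and safe, but it pays for a strlen() on every chunk, which is the cost described above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from fp into a heap
   buffer, without the trailing newline. Returns NULL at end of input or
   if allocation fails. The caller frees the result. */
char *read_line(FILE *fp)
{
    size_t cap = 128, len = 0;   /* 128 is an arbitrary starting size */
    char *line = malloc(cap);

    if (line == NULL)
        return NULL;

    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';        /* complete line: drop newline */
            return line;
        }
        if (len + 1 == cap) {            /* buffer full: double it */
            char *tmp = realloc(line, cap * 2);
            if (tmp == NULL) {
                free(line);
                return NULL;
            }
            line = tmp;
            cap *= 2;
        }
    }

    if (len == 0) {                      /* nothing read: end of input */
        free(line);
        return NULL;
    }
    return line;                         /* last line had no newline */
}
```

A caller would loop with something like while ((s = read_line(stdin)) != NULL) { ...; free(s); }. On POSIX systems, getline() provides similar behaviour, but it is not part of ISO C.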

For efficient line reading, we have to keep some intermediate state between calls, and that is not easy. I am attaching an implementation. It is quite complicated, but it is very efficient and generic. If you do not care about the details, you can just look at the main() function to see how the routines are used.

To try this program:

gcc -Wall prog.c; ./a.out < input.txt > output.txt

Program:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#ifndef kroundup32
#define kroundup32(x) (--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))
#endif

#define kstype_t FILE* // type of file handler
#define ksread_f(fp, buf, len) fread((buf), 1, (len), (fp)) // function to read a data chunk

typedef struct {
    int l, m; // l: length of string; m: allocated size
    char *s; // string
} kstring_t;

typedef struct {
    kstype_t f; // file handler
    int begin, end, is_eof, bufsize;
    unsigned char *buf; // buffer
} kstream_t;

kstream_t *ks_open(kstype_t fp, int bufsize)
{
    kstream_t *ks;
    ks = (kstream_t*)calloc(1, sizeof(kstream_t));
    ks->bufsize = bufsize;
    ks->buf = (unsigned char*)malloc(bufsize);
    ks->f = fp;
    return ks;
}

void ks_close(kstream_t *ks)
{
    free(ks->buf); free(ks);
}

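// Read up to the next delimiter into str (the delimiter is not stored).
// Returns the line length, or -1 when there is no more input.
int ks_readline(kstream_t *ks, int delimiter, kstring_t *str)
{
    int gotany = 0; // set once any buffered data was scanned for this line
    str->l = 0;
    if (ks->begin >= ks->end && ks->is_eof) return -1;
    for (;;) {
        int i;
        if (ks->begin >= ks->end) { // buffer exhausted: refill it
            if (!ks->is_eof) {
                ks->begin = 0;
                ks->end = (int)ksread_f(ks->f, ks->buf, ks->bufsize);
                if (ks->end < ks->bufsize) ks->is_eof = 1; // short read: end of input
                if (ks->end == 0) break;
            } else break;
        }
        for (i = ks->begin; i < ks->end; ++i) // scan for the delimiter
            if (ks->buf[i] == delimiter) break;
        gotany = 1;
        if (str->m - str->l < i - ks->begin + 1) { // grow, leaving room for '\0'
            str->m = str->l + (i - ks->begin) + 1;
            kroundup32(str->m);
            str->s = (char*)realloc(str->s, str->m);
        }
        memcpy(str->s + str->l, ks->buf + ks->begin, i - ks->begin);
        str->l = str->l + (i - ks->begin);
        ks->begin = i + 1;
        if (i < ks->end) break; // delimiter found: the line is complete
    }
    // Input that is empty, or whose length is an exact multiple of bufsize,
    // ends here with nothing scanned; do not report a spurious empty line.
    if (!gotany) return -1;
    if (str->s == 0) {
        str->m = 1;
        str->s = (char*)calloc(1, 1);
    }
    str->s[str->l] = '\0';
    return str->l;
}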

int main()
{
    kstream_t *ks;
    kstring_t str;
    str.l = str.m = 0; str.s = 0; // initialize the string struct
    ks = ks_open(stdin, 4096); // initialize the file handler
    while (ks_readline(ks, '\n', &str) >= 0) // read each line
        puts(str.s); // print it out
    ks_close(ks); free(str.s); // free
    return 0;
}
