
Using fgets and strtok() to read a text file - C


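I'm trying to use fgets() to read text from stdin line by line and store it in the variable "text". However, when I use strtok() to split the words, it only works for a few lines before terminating. What should I change so that it runs through the entire text?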


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WORD_BUFFER_SIZE 50
#define TEXT_SIZE 200

int main(void) {
    char stopWords[TEXT_SIZE][WORD_BUFFER_SIZE];
    char word[WORD_BUFFER_SIZE];
    int numberOfWords = 0;
  
    while(scanf("%s", word) == 1){
      if (strcmp(word, "====") == 0){
        break;
      }
      strcpy(stopWords[numberOfWords], word);
      numberOfWords++;
    }

  char *buffer = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
  char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
  
  while(fgets(buffer, WORD_BUFFER_SIZE*TEXT_SIZE, stdin) != NULL){  
    strcat(text, buffer);
  }
  
  char *k;
  k = strtok(text, " ");
  while (k != NULL) {
    printf("%s\n", k);
    k = strtok(NULL, " ");
  }
  
}

char *buffer = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);

sizeof(WORD_BUFFER_SIZE) is a constant: it is the size of an int. You probably meant WORD_BUFFER_SIZE * TEXT_SIZE, but you can also find the file size and calculate exactly how much memory you need.

char *text = malloc(...)
strcat(text, buffer);

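text is not initialized and has no null terminator, but strcat needs to know where text ends. You must set text[0] = '\0' before using strcat (strcat is not like strcpy, which ignores the previous contents of the destination).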

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) 
{
    //this only works when stdin is redirected from a (seekable) file
    if (fseek(stdin, 0, SEEK_END) != 0)
    { printf("not using a file!\n"); return 0; }
    long filesize = ftell(stdin);
    rewind(stdin);
    if (filesize <= 0)
    { printf("not using a file!\n"); return 0; }

    char word[1000] = { 0 };

    //while (scanf("%s", word) == 1)
    //    if (strcmp(word, "====") == 0)
    //        break;

    char* text = malloc((size_t)filesize + 1);
    if (!text)
        return 0;
    text[0] = '\0';
    while (fgets(word, sizeof(word), stdin) != NULL)
        strcat(text, word);

    char* k;
    k = strtok(text, " ");
    while (k != NULL) 
    {
        printf("%s\n", k);
        k = strtok(NULL, " ");
    }

    free(text);
    return 0;
}

According to the information you provided in the comments section, the input text is longer than 800 bytes.

However, the line

char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);

is equivalent to the following (on the usual platforms where an int is 4 bytes):

char *text = malloc(800);

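You only allocated 800 bytes of storage for text, which is not enough to store the entire input. Attempting to store more than 800 bytes will cause a buffer overflow, which invokes undefined behavior.

If you want to store the entire input in text, you must make sure that it is large enough.

However, this may not be necessary. Depending on your requirements, processing one line at a time may be sufficient, like this: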

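char buffer[WORD_BUFFER_SIZE * TEXT_SIZE]; //must be an array (not a pointer) for "sizeof buffer" to give its size

while( fgets( buffer, sizeof buffer, stdin ) != NULL )
{
    //"\n" is also a delimiter, so the newline kept by fgets is not printed as part of the last word
    char *k = strtok( buffer, " \n" );

    while ( k != NULL )
    {
        printf( "%s\n", k );
        k = strtok( NULL, " \n" );
    }
}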

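In this case, you don't need the array text; you only need the array buffer to store the contents of the current line.

Since you did not provide any sample input, I was unable to test the code above.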


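Edit: Based on your comment on this answer, it seems that your main problem is how to read all of the input from stdin and store it as a string when you don't know the length of the input in advance.

A common solution is to allocate an initial buffer and double its size every time it becomes full. You can use the function realloc for this: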

#include <stdio.h>
#include <stdlib.h>

int main( void )
{
    char *buffer;
    size_t buffer_size = 1024;
    size_t input_size = 0;

    //allocate initial buffer
    buffer = malloc( buffer_size );
    if ( buffer == NULL )
    {
        fprintf( stderr, "allocation error!\n" );
        exit( EXIT_FAILURE );
    }

    //continuously fill the buffer with input, and
    //grow buffer as necessary
    for (;;) //infinite loop, equivalent to while(1)
    {
        //we must leave room for the terminating null character
        size_t to_read = buffer_size - input_size - 1;
        size_t ret;

        ret = fread( buffer + input_size, 1, to_read, stdin );

        input_size += ret;

        if ( ret != to_read )
        {
            //we have finished reading from input
            break;
        }

        //buffer was filled entirely (except for the space
        //reserved for the terminating null character), so
        //we must grow the buffer
        {
            void *temp;

            buffer_size *= 2;
            temp = realloc( buffer, buffer_size );

            if ( temp == NULL )
            {
                fprintf( stderr, "allocation error!\n" );
                exit( EXIT_FAILURE );
            }

            buffer = temp;
        }
    }

    //make sure that `fread` did not stop because of an
    //error (it should only stop because of end-of-file)
    if ( ferror(stdin) )
    {
        fprintf( stderr, "input error!\n" );
        exit( EXIT_FAILURE );
    }

    //add terminating null character
    buffer[input_size++] = '\0';

    //shrink buffer to required size
    {
        void *temp;

        temp = realloc( buffer, input_size );

        if ( temp == NULL )
        {
            fprintf( stderr, "allocation error!\n" );
            exit( EXIT_FAILURE );
        }

        buffer = temp;
    }

    //the entire contents is now stored in "buffer" as a
    //string, and can be printed
    printf( "contents of buffer:\n%s\n", buffer );

    free( buffer );
}

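The code above assumes that the input will be terminated by an end-of-file condition, which is probably the case if the input is piped in from a file.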

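On second thought, instead of using one large string for the entire file as you do in your code, you may want to use an array of char* for the individual strings, each representing one line. For example, lines[0] would be the string for the first line and lines[1] the string for the second line. That way, you can easily use strstr to find the " ==== " on each line and strchr to find the individual words, while still keeping all lines in memory for further processing.

In that case, I don't recommend using strtok, because that function modifies the string by replacing the delimiters with null characters, so it is destructive. If you need to process the strings further, as you mentioned in the comments section, this is probably not what you want. That is why I recommend using strchr instead.

If a reasonable maximum number of lines is known at compile time, the solution is simple: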

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE_LENGTH 1024
#define MAX_LINES 1024

int main( void )
{
    char *lines[MAX_LINES];
    int num_lines = 0;

    char buffer[MAX_LINE_LENGTH];

    //read one line per loop iteration
    while ( fgets( buffer, sizeof buffer, stdin ) != NULL )
    {
        int line_length = strlen( buffer );

        //verify that entire line was read in
        if ( buffer[line_length-1] != '\n' )
        {
            //treat end-of file as equivalent to newline character
            if ( !feof( stdin ) )
            {
                fprintf( stderr, "input line exceeds maximum line length!\n" );
                exit( EXIT_FAILURE );
            }
        }
        else
        {
            //remove newline character from string
            buffer[--line_length] = '\0';
        }

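        //make sure there is still room in the array "lines"
        if ( num_lines == MAX_LINES )
        {
            fprintf( stderr, "too many input lines!\n" );
            exit( EXIT_FAILURE );
        }

        //allocate memory for new string and add to array
        lines[num_lines] = malloc( line_length + 1 );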

        //verify that "malloc" succeeded
        if ( lines[num_lines] == NULL )
        {
            fprintf( stderr, "allocation error!\n" );
            exit( EXIT_FAILURE );
        }

        //copy line to newly allocated buffer
        strcpy( lines[num_lines], buffer );

        //increment counter
        num_lines++;
    }

    //All input lines have now been successfully read in, so
    //we can now do something with them.

    //handle one line per loop iteration
    for ( int i = 0; i < num_lines; i++ )
    {
        char *p, *q;

        //attempt to find the " ==== " marker
        p = strstr( lines[i], " ==== " );
        if ( p == NULL )
        {
            printf( "Warning: skipping line because unable to find \" ==== \".\n" );
            continue;
        }

        //skip the " ==== " marker
        p += 6;

        //split tokens on remainder of line using "strchr"
        while ( ( q = strchr( p, ' ') ) != NULL )
        {
            printf( "found token: %.*s\n", (int)(q-p), p );
            p = q + 1;
        }

        //output last token
        printf( "found token: %s\n", p );
    }

    //cleanup allocated memory
    for ( int i = 0; i < num_lines; i++ )
    {
        free( lines[i] );
    }
}

When the program above is run with the following input:

first line before deliminator ==== first line after deliminator
second line before deliminator ==== second line after deliminator

it produces the following output:

found token: first
found token: line
found token: after
found token: deliminator
found token: second
found token: line
found token: after
found token: deliminator

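However, if no reasonable maximum number of lines is known at compile time, the array lines must also be designed to grow, in a similar way to buffer in the previous program. The same applies to the maximum line length.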


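Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.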

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM