Using fgets() and strtok() to read a text file in C
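I am trying to read text from stdin line by line using fgets() and store it in the variable "text". However, when I use strtok() to split the text into words, it only works for a few lines before terminating. What should I change so that it runs through the entire text?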
#define WORD_BUFFER_SIZE 50
#define TEXT_SIZE 200
int main(void) {
    char stopWords[TEXT_SIZE][WORD_BUFFER_SIZE];
    char word[WORD_BUFFER_SIZE];
    int numberOfWords = 0;
    while(scanf("%s", word) == 1){
        if (strcmp(word, "====") == 0){
            break;
        }
        strcpy(stopWords[numberOfWords], word);
        numberOfWords++;
    }
    char *buffer = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
    char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
    while(fgets(buffer, WORD_BUFFER_SIZE*TEXT_SIZE, stdin) != NULL){
        strcat(text, buffer);
    }
    char *k;
    k = strtok(text, " ");
    while (k != NULL) {
        printf("%s\n", k);
        k = strtok(NULL, " ");
    }
}
char *buffer = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
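sizeof(WORD_BUFFER_SIZE) is a constant: it is the size of an int, not 50. You probably meant WORD_BUFFER_SIZE * TEXT_SIZE. Alternatively, you can find the file size and calculate exactly how much memory you need.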
char *text = malloc(...)
strcat(text, buffer);
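text is uninitialized and has no null terminator. strcat needs to find the end of text before it can append to it, so you must set text[0] = '\0' before the first call to strcat (unlike strcpy, which does not read the destination).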
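#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    //seeking only works if stdin is redirected from a regular file
    if (fseek(stdin, 0, SEEK_END) != 0)
    { printf("not using a file!\n"); return 0; }
    long filesize = ftell(stdin);
    rewind(stdin);
    if (filesize <= 0)
    { printf("not using a file!\n"); return 0; }
    char word[1000] = { 0 };
    //while (scanf("%s", word) != 1)
    //    if (strcmp(word, "====") == 0)
    //        break;
    char* text = malloc((size_t)filesize + 1);
    if (!text)
        return 0;
    //start with an empty string so that strcat can find the end
    text[0] = '\0';
    while (fgets(word, sizeof(word), stdin) != NULL)
        strcat(text, word);
    char* k;
    k = strtok(text, " ");
    while (k != NULL)
    {
        printf("%s\n", k);
        k = strtok(NULL, " ");
    }
    free(text);
    return 0;
}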
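According to the information you provided in the comments section, the input text is longer than 800 bytes.
However, in the line
char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
which is equivalent to
char *text = malloc(800);
(assuming that sizeof(int) is 4 on your platform), you allocate only 800 bytes of storage for text. That is not enough space to store the entire input in text. Attempting to store more than 800 bytes causes a buffer overflow, which invokes undefined behavior.
If you want to store the entire input in text, you must make sure that it is large enough.
However, this may not be necessary. Depending on your requirements, it may be sufficient to process one line at a time, like this: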
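//buffer must be an array here, not a pointer returned by malloc,
//so that "sizeof buffer" is its capacity (the size 1024 is an example)
char buffer[1024];
while( fgets( buffer, sizeof buffer, stdin ) != NULL )
{
    char *k = strtok( buffer, " " );
    while ( k != NULL )
    {
        printf( "%s\n", k );
        k = strtok( NULL, " " );
    }
}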
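In that case, you do not need the array text. You only need the array buffer to hold the contents of the current line.
Since you did not provide any sample input, I was unable to test the code above.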
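EDIT: Based on your comments on this answer, your main problem seems to be how to read all input from stdin and store it as a string when you do not know the length of the input in advance.
A common solution is to allocate an initial buffer and to double its size every time it becomes full. You can use the function realloc for this: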
#include <stdio.h>
#include <stdlib.h>
int main( void )
{
    char *buffer;
    size_t buffer_size = 1024;
    size_t input_size = 0;
    //allocate initial buffer
    buffer = malloc( buffer_size );
    if ( buffer == NULL )
    {
        fprintf( stderr, "allocation error!\n" );
        exit( EXIT_FAILURE );
    }
    //continuously fill the buffer with input, and
    //grow buffer as necessary
    for (;;) //infinite loop, equivalent to while(1)
    {
        //we must leave room for the terminating null character
        size_t to_read = buffer_size - input_size - 1;
        size_t ret;
        ret = fread( buffer + input_size, 1, to_read, stdin );
        input_size += ret;
        if ( ret != to_read )
        {
            //we have finished reading from input
            break;
        }
        //buffer was filled entirely (except for the space
        //reserved for the terminating null character), so
        //we must grow the buffer
        {
            void *temp;
            buffer_size *= 2;
            temp = realloc( buffer, buffer_size );
            if ( temp == NULL )
            {
                fprintf( stderr, "allocation error!\n" );
                exit( EXIT_FAILURE );
            }
            buffer = temp;
        }
    }
    //make sure that `fread` did not stop due to an
    //error (it should only stop due to end-of-file)
    if ( ferror(stdin) )
    {
        fprintf( stderr, "input error!\n" );
        exit( EXIT_FAILURE );
    }
    //add terminating null character
    buffer[input_size++] = '\0';
    //shrink buffer to required size
    {
        void *temp;
        temp = realloc( buffer, input_size );
        if ( temp == NULL )
        {
            fprintf( stderr, "allocation error!\n" );
            exit( EXIT_FAILURE );
        }
        buffer = temp;
    }
    //the entire contents is now stored in "buffer" as a
    //string, and can be printed
    printf( "contents of buffer:\n%s\n", buffer );
    free( buffer );
}
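The code above assumes that the input will be terminated by an end-of-file condition, which is probably the case if the input is piped from a file.
On second thought, instead of using one large string for the whole file, as you do in your code, you may want to use an array of char* pointing to individual strings, each of which represents one line. For example, lines[0] would be the string for the first line and lines[1] the string for the second line. That way, you can easily use strstr to find the " ==== " marker on each line and strchr to find the individual words, while still keeping all lines in memory for further processing.
I do not recommend strtok in this case, because that function is destructive: it modifies the string by replacing the delimiters with null characters. If you need the strings for further processing, as you stated in the comments section, that is probably not what you want. That is why I recommend that you use strchr instead.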
If a reasonable maximum number of lines is known at compile time, then the solution is straightforward:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LINE_LENGTH 1024
#define MAX_LINES 1024
int main( void )
{
    char *lines[MAX_LINES];
    int num_lines = 0;
    char buffer[MAX_LINE_LENGTH];
    //read one line per loop iteration
    while ( fgets( buffer, sizeof buffer, stdin ) != NULL )
    {
        size_t line_length = strlen( buffer );
        //make sure that the array "lines" is not already full
        if ( num_lines == MAX_LINES )
        {
            fprintf( stderr, "too many input lines!\n" );
            exit( EXIT_FAILURE );
        }
        //verify that entire line was read in
        if ( line_length == 0 || buffer[line_length-1] != '\n' )
        {
            //treat end-of-file as equivalent to newline character
            if ( !feof( stdin ) )
            {
                fprintf( stderr, "input line exceeds maximum line length!\n" );
                exit( EXIT_FAILURE );
            }
        }
        else
        {
            //remove newline character from string
            buffer[--line_length] = '\0';
        }
        //allocate memory for new string and add to array
        lines[num_lines] = malloc( line_length + 1 );
        //verify that "malloc" succeeded
        if ( lines[num_lines] == NULL )
        {
            fprintf( stderr, "allocation error!\n" );
            exit( EXIT_FAILURE );
        }
        //copy line to newly allocated buffer
        strcpy( lines[num_lines], buffer );
        //increment counter
        num_lines++;
    }
    //All input lines have now been successfully read in, so
    //we can now do something with them.
    //handle one line per loop iteration
    for ( int i = 0; i < num_lines; i++ )
    {
        char *p, *q;
        //attempt to find the " ==== " marker
        p = strstr( lines[i], " ==== " );
        if ( p == NULL )
        {
            printf( "Warning: skipping line because unable to find \" ==== \".\n" );
            continue;
        }
        //skip the " ==== " marker
        p += 6;
        //split tokens on remainder of line using "strchr"
        while ( ( q = strchr( p, ' ') ) != NULL )
        {
            printf( "found token: %.*s\n", (int)(q-p), p );
            p = q + 1;
        }
        //output last token
        printf( "found token: %s\n", p );
    }
    //cleanup allocated memory
    for ( int i = 0; i < num_lines; i++ )
    {
        free( lines[i] );
    }
}
When the program above is run with the following input
first line before deliminator ==== first line after deliminator
second line before deliminator ==== second line after deliminator
it produces the following output:
found token: first
found token: line
found token: after
found token: deliminator
found token: second
found token: line
found token: after
found token: deliminator
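However, if no reasonable maximum number of lines is known at compile time, then the array lines must also be designed to grow, in a similar way to buffer in the previous program. The same applies to the maximum line length.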