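Using fgets() and strtok() to read a text file in C
I am trying to read text from stdin line by line with fgets() and store it in the variable "text". However, when I use strtok() to split the text into words, it only works for a few lines before the program terminates. What should I change so that it processes the entire text?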
#define WORD_BUFFER_SIZE 50
#define TEXT_SIZE 200

int main(void) {
    char stopWords[TEXT_SIZE][WORD_BUFFER_SIZE];
    char word[WORD_BUFFER_SIZE];
    int numberOfWords = 0;

    while(scanf("%s", word) == 1){
        if (strcmp(word, "====") == 0){
            break;
        }
        strcpy(stopWords[numberOfWords], word);
        numberOfWords++;
    }

    char *buffer = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
    char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);

    while(fgets(buffer, WORD_BUFFER_SIZE*TEXT_SIZE, stdin) != NULL){
        strcat(text, buffer);
    }

    char *k;
    k = strtok(text, " ");
    while (k != NULL) {
        printf("%s\n", k);
        k = strtok(NULL, " ");
    }
}
char *buffer = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
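sizeof(WORD_BUFFER_SIZE) is a constant: it is the size of an int, because WORD_BUFFER_SIZE expands to the integer literal 50. You probably meant WORD_BUFFER_SIZE * TEXT_SIZE. Alternatively, you can find the file size and calculate exactly how much memory you need.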
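To make the difference concrete, here is a minimal sketch that computes both sizes. The function names allocated_by_question and probably_intended are hypothetical and exist only for this illustration; the macros are the ones from the question.

```c
#include <stddef.h>

#define WORD_BUFFER_SIZE 50
#define TEXT_SIZE 200

/* Size requested by the question's malloc call: sizeof is applied to the
 * integer literal 50, so this is sizeof(int) * 200. */
size_t allocated_by_question(void)
{
    return sizeof(WORD_BUFFER_SIZE) * TEXT_SIZE;
}

/* Size that was probably intended: 50 * 200 bytes. */
size_t probably_intended(void)
{
    return (size_t)WORD_BUFFER_SIZE * TEXT_SIZE;
}
```

On a platform where int is 4 bytes, the first function returns 800, while the second always returns 10000.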
char *text = malloc(...)
strcat(text, buffer);
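text is uninitialized and has no null terminator. strcat needs to find the end of text, so you must set text[0] = '\0' before the first call to strcat (unlike strcpy, strcat does not simply overwrite the destination from the start).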
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    //seeking only works if stdin is redirected from a regular file
    if (fseek(stdin, 0, SEEK_END) != 0)
    { printf("not using a file!\n"); return 0; }

    //ftell returns a long, and -1 on failure
    long filesize = ftell(stdin);
    rewind(stdin);
    if (filesize <= 0)
    { printf("not using a file!\n"); return 0; }

    char word[1000] = { 0 };

    //while (scanf("%s", word) != 1)
    //    if (strcmp(word, "====") == 0)
    //        break;

    char* text = malloc((size_t)filesize + 1);
    if (!text)
        return 0;

    text[0] = '\0';
    while (fgets(word, sizeof(word), stdin) != NULL)
        strcat(text, word);

    char* k;
    k = strtok(text, " ");
    while (k != NULL)
    {
        printf("%s\n", k);
        k = strtok(NULL, " ");
    }

    free(text);
    return 0;
}
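If the input size cannot be determined up front, the two points above (start with an empty string, and never write past the allocation) can also be enforced with small helpers. The following is only a sketch: new_empty_text and append_checked are hypothetical names, not standard library functions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a buffer of `capacity` bytes that already
 * holds an empty string, so that strcat/strlen can be used on it safely.
 * Returns NULL if capacity is 0 or the allocation fails. */
char *new_empty_text(size_t capacity)
{
    if (capacity == 0)
        return NULL;
    char *text = malloc(capacity);
    if (text != NULL)
        text[0] = '\0';
    return text;
}

/* Hypothetical helper: append `src` to `text` only if the result, including
 * the terminating null character, fits into `capacity` bytes.
 * Returns 1 on success and 0 if there is not enough room. */
int append_checked(char *text, size_t capacity, const char *src)
{
    size_t used = strlen(text);
    size_t extra = strlen(src);

    if (extra >= capacity - used)
        return 0;

    memcpy(text + used, src, extra + 1);
    return 1;
}
```

In the question's read loop, strcat(text, buffer) could then be replaced by a call to append_checked, with the loop stopping (or the buffer being enlarged) when it returns 0.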
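According to the information you provided in the comments section, the input text is longer than 800 bytes.
However, the line
char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
which, on platforms where sizeof(int) is 4, is equivalent to
char *text = malloc(800);
allocates only 800 bytes of storage for text. That is not enough space to hold the entire input. Attempting to store more than 800 bytes overflows the buffer, which invokes undefined behavior.
If you want to store the entire input in text, you must make sure that it is large enough.
However, this may not be necessary. Depending on your requirements, it may be sufficient to process one line at a time, like this: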
while( fgets( buffer, sizeof buffer, stdin ) != NULL )
{
    char *k = strtok( buffer, " " );

    while ( k != NULL )
    {
        printf( "%s\n", k );
        k = strtok( NULL, " " );
    }
}
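In this case, you do not need the array text. You only need the array buffer to hold the contents of the current line. Note that sizeof buffer yields the size of the buffer only if buffer is declared as an array; if it is a pointer returned by malloc, as in your code, pass the allocated size instead.
Since you did not provide any sample input, I was unable to test the loop above.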
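One detail of the line-by-line approach: fgets keeps the newline character, so with " " as the only delimiter the last token of each line still ends in '\n'. A minimal sketch that also treats tabs and line endings as delimiters is shown below; split_line is a hypothetical helper name chosen for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` in place on spaces, tabs and line
 * endings. Stores at most `max_tokens` pointers into `tokens` and returns
 * the number of tokens stored. The pointers refer into `line` itself. */
size_t split_line(char *line, char *tokens[], size_t max_tokens)
{
    static const char delimiters[] = " \t\r\n";
    size_t count = 0;

    char *k = strtok(line, delimiters);
    while (k != NULL && count < max_tokens) {
        tokens[count++] = k;
        k = strtok(NULL, delimiters);
    }
    return count;
}
```

Inside the fgets loop, split_line(buffer, tokens, max_tokens) would be called once per line, and the tokens must be used before the next fgets call overwrites buffer.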
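EDIT: Based on your comments on this answer, your main question seems to be how to read all input from stdin and store it as a string when you do not know the length of the input in advance.
A common solution is to allocate an initial buffer and to double its size every time it becomes full. You can use the function realloc for this: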
#include <stdio.h>
#include <stdlib.h>

int main( void )
{
    char *buffer;
    size_t buffer_size = 1024;
    size_t input_size = 0;

    //allocate initial buffer
    buffer = malloc( buffer_size );
    if ( buffer == NULL )
    {
        fprintf( stderr, "allocation error!\n" );
        exit( EXIT_FAILURE );
    }

    //continuously fill the buffer with input, and
    //grow buffer as necessary
    for (;;) //infinite loop, equivalent to while(1)
    {
        //we must leave room for the terminating null character
        size_t to_read = buffer_size - input_size - 1;
        size_t ret;

        ret = fread( buffer + input_size, 1, to_read, stdin );

        input_size += ret;

        if ( ret != to_read )
        {
            //we have finished reading from input
            break;
        }

        //buffer was filled entirely (except for the space
        //reserved for the terminating null character), so
        //we must grow the buffer
        {
            void *temp;

            buffer_size *= 2;
            temp = realloc( buffer, buffer_size );

            if ( temp == NULL )
            {
                fprintf( stderr, "allocation error!\n" );
                exit( EXIT_FAILURE );
            }

            buffer = temp;
        }
    }

    //make sure that `fread` did not stop due to an
    //error (it should only stop due to end-of-file)
    if ( ferror(stdin) )
    {
        fprintf( stderr, "input error!\n" );
        exit( EXIT_FAILURE );
    }

    //add terminating null character
    buffer[input_size++] = '\0';

    //shrink buffer to required size
    {
        void *temp;

        temp = realloc( buffer, input_size );

        if ( temp == NULL )
        {
            fprintf( stderr, "allocation error!\n" );
            exit( EXIT_FAILURE );
        }

        buffer = temp;
    }

    //the entire contents is now stored in "buffer" as a
    //string, and can be printed
    printf( "contents of buffer:\n%s\n", buffer );

    free( buffer );
}
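The code above assumes that the input is terminated by an end-of-file condition, which is normally the case when the input is redirected or piped from a file.
On second thought, instead of using one large string for the entire file, as you do in your code, you may prefer an array of char* pointers to individual strings, each representing one line, so that for example lines[0] is the string for the first line and lines[1] is the string for the second line. That way, you can easily use strstr to find the " ==== " delimiter on each line and strchr to find the individual words, while still keeping all lines in memory for further processing.
I do not recommend strtok in this case, because that function is destructive: it modifies the string by replacing the delimiters with null characters. If you need to process the strings further, as you stated in the comments section, that is probably not what you want. This is why I suggest using strchr instead.
If a reasonable maximum number of lines is known at compile time, the solution is fairly simple: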
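To illustrate the non-destructive approach on a single string, here is a minimal sketch that counts space-separated tokens with strchr while leaving the string untouched. The function name count_tokens is hypothetical and is not part of the answer's program.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the space-separated tokens in `s` without
 * modifying it. Runs of consecutive spaces do not produce empty tokens. */
size_t count_tokens(const char *s)
{
    size_t count = 0;
    const char *p = s;
    const char *q;

    /* each space found by strchr ends the token that started at p */
    while ((q = strchr(p, ' ')) != NULL) {
        if (q > p)
            count++;
        p = q + 1;
    }

    /* whatever remains after the last space is the final token */
    if (*p != '\0')
        count++;

    return count;
}
```

Because the parameter is a const char *, the same string can be scanned again afterwards, whereas after strtok only the first token would still be visible through the original pointer.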
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE_LENGTH 1024
#define MAX_LINES 1024

int main( void )
{
    char *lines[MAX_LINES];
    int num_lines = 0;

    char buffer[MAX_LINE_LENGTH];

    //read one line per loop iteration
    while ( fgets( buffer, sizeof buffer, stdin ) != NULL )
    {
        size_t line_length = strlen( buffer );

        //verify that the array "lines" still has room
        if ( num_lines == MAX_LINES )
        {
            fprintf( stderr, "input exceeds maximum number of lines!\n" );
            exit( EXIT_FAILURE );
        }

        //verify that entire line was read in
        if ( line_length == 0 || buffer[line_length-1] != '\n' )
        {
            //treat end-of file as equivalent to newline character
            if ( !feof( stdin ) )
            {
                fprintf( stderr, "input line exceeds maximum line length!\n" );
                exit( EXIT_FAILURE );
            }
        }
        else
        {
            //remove newline character from string
            buffer[--line_length] = '\0';
        }

        //allocate memory for new string and add to array
        lines[num_lines] = malloc( line_length + 1 );

        //verify that "malloc" succeeded
        if ( lines[num_lines] == NULL )
        {
            fprintf( stderr, "allocation error!\n" );
            exit( EXIT_FAILURE );
        }

        //copy line to newly allocated buffer
        strcpy( lines[num_lines], buffer );

        //increment counter
        num_lines++;
    }

    //All input lines have now been successfully read in, so
    //we can now do something with them.

    //handle one line per loop iteration
    for ( int i = 0; i < num_lines; i++ )
    {
        char *p, *q;

        //attempt to find the " ==== " marker
        p = strstr( lines[i], " ==== " );
        if ( p == NULL )
        {
            printf( "Warning: skipping line because unable to find \" ==== \".\n" );
            continue;
        }

        //skip the " ==== " marker
        p += 6;

        //split tokens on remainder of line using "strchr"
        while ( ( q = strchr( p, ' ') ) != NULL )
        {
            printf( "found token: %.*s\n", (int)(q-p), p );
            p = q + 1;
        }

        //output last token
        printf( "found token: %s\n", p );
    }

    //cleanup allocated memory
    for ( int i = 0; i < num_lines; i++ )
    {
        free( lines[i] );
    }
}
When the program above is run with the following input
first line before deliminator ==== first line after deliminator
second line before deliminator ==== second line after deliminator
it produces the following output:
found token: first
found token: line
found token: after
found token: deliminator
found token: second
found token: line
found token: after
found token: deliminator
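However, if no reasonable maximum number of lines is known at compile time, the array lines must also be designed to grow, in a similar way to buffer in the previous program. The same applies to the maximum line length.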
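A growable array of lines could look roughly like the following sketch, which applies the same doubling strategy with realloc. The struct line_list type and the functions line_list_append and line_list_free are hypothetical names introduced for this illustration, and the initial capacity of 16 is an arbitrary choice.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for a dynamically growing array of lines. */
struct line_list {
    char **items;     /* array of pointers to the individual lines */
    size_t count;     /* number of lines currently stored */
    size_t capacity;  /* number of pointers that fit into items */
};

/* Append a copy of `text` to the list, doubling the pointer array when it
 * is full. Returns 1 on success and 0 on allocation failure, in which case
 * the list is left unchanged. */
int line_list_append(struct line_list *list, const char *text)
{
    if (list->count == list->capacity) {
        size_t new_capacity = list->capacity ? list->capacity * 2 : 16;
        char **temp = realloc(list->items, new_capacity * sizeof *temp);
        if (temp == NULL)
            return 0;
        list->items = temp;
        list->capacity = new_capacity;
    }

    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL)
        return 0;
    strcpy(copy, text);

    list->items[list->count++] = copy;
    return 1;
}

/* Free all stored lines and the pointer array, leaving an empty list. */
void line_list_free(struct line_list *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

A list starts out as struct line_list list = { NULL, 0, 0 }; and line_list_append(&list, buffer) would take the place of the malloc and strcpy calls in the reading loop. Lines longer than the fixed read buffer would still need separate handling, for example by growing the read buffer in the same way.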