C program to count total words in an input file

The input file contains a completely empty line at line 2 and an unnecessary white space after the final full stop of the text. With this input file I am getting 48 words, while I was supposed to get 46 words.

My input file contains:
"Opening from A Tale of Two Cities by Charles Darwin

It was the best of times, it was the worst of times. It was the age of wisdom, it was the age of foolishness. It was the epoch of belief, it was the epoch of incredulity. "

Here's how I tried:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define max_story_words 1000
#define max_word_length 80

int main (int argc, char **argv)
{


    char story[max_story_words][max_word_length] = {{0}};
    char line[max_story_words] = {0};
    char *p;
    char ch = 0;
    char *punct="\n ,!.:;?-";
    int num_words = 1;
    int i = 0;

    FILE *file_story = fopen ("TwoCitiesStory.txt", "r");
    if (file_story==NULL) {
        printf("Unable to open story file '%s'\n","TwoCitiesStory.txt");
        return (EXIT_FAILURE);
    }

    /* count words */
    while ((ch = fgetc (file_story)) != EOF) {
        if (ch == ' ' || ch == '\n')
            num_words++;
    }

    rewind (file_story);

    i = 0;
    /* read each line in file */
    while (fgets (line, max_word_length, file_story) != NULL)
    {
        /* tokenize line into words removing punctuation chars in punct */
        for (p = strtok (line, punct); p != NULL; p = strtok (NULL, punct))
        {
            /* convert each char in p to lower-case with tolower */
            char *c = p;
            for (; *c; c++)
                *c = tolower (*c);

            /* copy token (word) to story[i] */
            strncpy ((char *)story[i], p, strlen (p));
            i++;
        }
    }

    /* output array */
    for(i = 0; i < num_words; i++)
        printf ("story[%d]: %s\n", i, story[i]);

    printf("\ntotal words: %d\n\n",num_words);

    return (EXIT_SUCCESS);
}

Your num_words loop counts every space and newline, including the two extra whitespace characters (the newline of the empty line and the trailing space), which is why you get 48.

You should simply print i immediately after the fgets/strtok loop, if I'm not mistaken: i is incremented once per token, so it already holds the number of words.
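To illustrate the idea of counting tokens rather than separators, here is a minimal sketch. The helper name count_tokens is hypothetical (it does not appear in the question); it reuses the question's delimiter string and counts what strtok returns, so empty lines and trailing spaces do not add to the total.

```c
#include <string.h>

/* Hypothetical helper (not from the question): counts the tokens that
   strtok finds in a writable buffer, using the same delimiter set as the
   question's punct string. Note that strtok modifies the buffer it scans. */
int count_tokens(char *text)
{
    const char *punct = "\n ,!.:;?-";
    int count = 0;
    char *p;

    /* each non-NULL return value is one word; runs of delimiters are skipped */
    for (p = strtok(text, punct); p != NULL; p = strtok(NULL, punct))
        count++;

    return count;
}
```

Called on a writable copy of the sample text from the question (including the empty line and the trailing space), this should return 46.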

Alternatively, skip runs of consecutive separators in the counting loop, something along these lines:

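/* ch must be declared as int (not char) so that EOF can be detected reliably */
while ((ch = fgetc (file_story)) != EOF) {
    if (ch == ' ' || ch == '\n') {
        num_words++;
        /* skip the rest of this run of spaces and newlines */
        while ((ch = fgetc (file_story)) == ' ' || ch == '\n')
            ;
        if (ch == EOF) {
            num_words--;   /* trailing separators do not start a new word */
            break;
        }
    }
}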

Though I wonder why you are only treating spaces and newline characters as word separators. Two words separated by some other punctuation mark are not accounted for in your code.

My suggestion is to change the word-counting loop as follows:

/* count words */
num_words = 0;
int flag = 0; // set 1 when word starts and 0 when word ends
while ((ch = fgetc (file_story)) != EOF) {
    if ( isalpha(ch) )
    {
        if( 0 == flag )   // if it is a first letter of word ...
        {
            num_words++;  // ... add to word count
            flag = 1;   // and set flag to skip not first letters
        }
        continue;
    }
    if ( isspace(ch) || ispunct(ch) )  // if word separator ...
    {
        flag = 0;                      // ... reset flag
    }
}
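For experimenting without a file, the same flag logic can be applied to an in-memory string. This is only a sketch: count_words is a hypothetical helper name, and the cast to unsigned char is an addition of mine, since the <ctype.h> functions expect a value representable as unsigned char (or EOF).

```c
#include <ctype.h>

/* Hypothetical helper (not from the answer): counts words in a
   NUL-terminated string with the same flag logic as the loop above. */
int count_words(const char *text)
{
    int num_words = 0;
    int in_word = 0;            /* 1 while inside a word, 0 between words */

    for (; *text != '\0'; text++) {
        unsigned char ch = (unsigned char)*text;  /* safe argument for <ctype.h> */

        if (isalpha(ch)) {
            if (!in_word) {     /* first letter of a new word */
                num_words++;
                in_word = 1;
            }
        } else if (isspace(ch) || ispunct(ch)) {
            in_word = 0;        /* a separator ends the current word */
        }
    }
    return num_words;
}
```

Because any punctuation mark ends a word, "times,it" is counted as two words even without a space. The same rule means a contraction such as "don't" is also counted as two words, which may or may not be what you want.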
