
C program to count total words in an input file

The input file contains a completely empty line at line 2 and an unnecessary trailing space after the final full stop of the text. With this input file I get 48 words, while I was supposed to get 46.

My input file contains:
"Opening from A Tale of Two Cities by Charles Darwin

It was the best of times, it was the worst of times. It was the age of wisdom, it was the age of foolishness. It was the epoch of belief, it was the epoch of incredulity. "

Here's what I tried:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define max_story_words 1000
#define max_word_length 80

int main (int argc, char **argv)
{


    char story[max_story_words][max_word_length] = {{0}};
    char line[max_story_words] = {0};
    char *p;
    int ch = 0;   /* int, not char: fgetc returns an int so EOF can be told apart from a valid byte */
    char *punct="\n ,!.:;?-";
    int num_words = 1;
    int i = 0;

    FILE *file_story = fopen ("TwoCitiesStory.txt", "r");
    if (file_story==NULL) {
        printf("Unable to open story file '%s'\n","TwoCitiesStory.txt");
        return (EXIT_FAILURE);
    }

    /* count words */
    while ((ch = fgetc (file_story)) != EOF) {
        if (ch == ' ' || ch == '\n')
            num_words++;
    }

    rewind (file_story);

    i = 0;
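    /* read each line in file; pass the real buffer size so long lines
       are not split into short pieces, which could cut a word in two */
    while (fgets (line, sizeof line, file_story) != NULL)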
    {
        /* tokenize line into words removing punctuation chars in punct */
        for (p = strtok (line, punct); p != NULL; p = strtok (NULL, punct))
        {
            /* convert each char in p to lower-case with tolower */
            char *c = p;
            for (; *c; c++)
                *c = tolower ((unsigned char)*c);

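            /* copy token (word) to story[i], leaving room for the
               terminating NUL and never writing past the array */
            if (i < max_story_words) {
                strncpy (story[i], p, max_word_length - 1);
                i++;
            }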
        }
    }

    /* output array */
    for(i = 0; i < num_words; i++)
        printf ("story[%d]: %s\n", i, story[i]);

    printf("\ntotal words: %d\n\n",num_words);

    fclose (file_story);
    return (EXIT_SUCCESS);
}

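Your num_words counts every space and every newline (on top of its starting value of 1), so the newline that forms the empty second line and the trailing space after the final full stop are each counted as an extra word. That is why you get 46 + 2 = 48.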

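You should simply print i immediately after the fgets/strtok loop: strtok treats a run of delimiters as a single separator, so i ends up holding the number of words actually stored in story.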

Something along these lines:

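/* num_words starts at 1 for the first word, as in the question;
   ch must be an int so the EOF test works */
while ((ch = fgetc (file_story)) != EOF) {
    if (ch == ' ' || ch == '\n') {
        /* skip the rest of this run of spaces and newlines */
        while ((ch = fgetc (file_story)) == ' ' || ch == '\n')
            ;
        if (ch == EOF)      /* trailing whitespace: no word follows */
            break;
        num_words++;        /* a new word starts after the run */
    }
}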

Though I wonder why you only treat spaces and newlines as word separators. Two words separated only by some other punctuation mark (for example "times,it" with no space after the comma) are definitely not accounted for in your counting loop.

My suggestion is to change the word-counting loop as follows:

/* count words */
num_words = 0;
int flag = 0; // set 1 when word starts and 0 when word ends
while ((ch = fgetc (file_story)) != EOF) {
    if ( isalpha(ch) )
    {
        if( 0 == flag )   // if this is the first letter of a word ...
        {
            num_words++;  // ... add it to the word count
            flag = 1;     // and set flag so the remaining letters are skipped
        }
        continue;
    }
    if ( isspace(ch) || ispunct(ch) )  // if word separator ...
    {
        flag = 0;                      // ... reset flag
    }
}
