
Read text file and output number of words, distinct words, and most frequent word used

I have to read from a text file all the words and output the total number of words, number of distinct words, and the most frequently used word. I'm still a beginner so any help is awesome.

When reading the words, hyphens, apostrophes, and other punctuation are omitted, so O'connor counts as the same word as Oconnor. I don't know how to go about doing that, so any help would be great.

This is what I have so far. When I compile it, I get a warning about strcpy that says I'm not using it properly. The total word count is correct, but the program prints 0 for the number of distinct words and nothing for the most frequently used word.

Any help would be awesome, thanks!

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int number=0;
    int i;
    char *temp;
    char *temp2;
    char word[2500][50];
    int wordCount=0;
    int mostFreq=1;
    char mostFreqWord[2500][50];
    int frequentCount=0;
    int distinctCount=0;
    int j;
    char *p;
    FILE *fp;
    //reads file!
    fp= fopen("COEN12_LAB1.txt", "r");
    if(fp == NULL)                  // checks to see if file is empty
    {
            printf("File Missing!\n");
            return 0;
    }
    while(fscanf(fp,"%s", word) == 1)  //scans every word in the text file
            wordCount++;  //counts number of words
    while(fscanf(fp,"%s",word) == 1)
    {
            for(i=0;i<wordCount;i++)
            {
                    temp=word[i];
                    for(j=0;j<wordCount;j++)
                    {
                            temp2 = word[j];
                            if(strcmp(temp,temp2) == 0)  //check to see if word is repeated
                            {
                                    frequentCount++;
                                    if(frequentCount>mostFreq)
                                    {
                                            strcpy(mostFreqWord,word[i]);  //this doesn't work
                                    }
                            }
                            distinctCount++;
                    }
            } 
    }
    printf("Total number of words: %d\n", wordCount);
    printf("Total number of distinct words: %d\n", distinctCount);
    printf("The most frequently appeared word is: %s \n", &mostFreqWord);
    fclose(fp);
}

The problem with strcpy(), as diagnosed by Beginner in their answer, is that mostFreqWord is a 2D array, so you need to subscript it when you copy into it.
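
As a rough sketch of that fix: since only one word has to be remembered, one option (an assumption on my part, not something the question requires) is to declare mostFreqWord as a single 50-character buffer instead of a 2D array; the alternative is to keep the 2D array and copy into mostFreqWord[0]. The helper name remember_word is made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORDS 2500   /* same limits as the arrays in the question */
#define MAX_LEN   50

static char word[MAX_WORDS][MAX_LEN];
static char mostFreqWord[MAX_LEN];   /* assumption: one word, so one row is enough */

/* Copy row i of the word list into the single-word buffer. */
static void remember_word(int i)
{
    assert(i >= 0 && i < MAX_WORDS);
    strcpy(mostFreqWord, word[i]);   /* both sides are char[MAX_LEN], so this fits */
}
```

With a 1D buffer, the final printf should also receive mostFreqWord rather than &mostFreqWord.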

However, you have a more fundamental problem. Your word counting loop reads until EOF, and you don't rewind the file to start over. Further, rereading the file like that is not a particularly good algorithm (and wouldn't work at all if you were reading data piped in from another program).
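
To see the effect, here is a small sketch with a hypothetical helper, count_tokens, that reads to EOF the same way the first loop in the question does. Called twice in a row on the same FILE pointer, the second call finds nothing left to read and returns 0; only a rewind(fp) in between lets the second pass see the data, and even that only works on seekable files, not on pipes.

```c
#include <assert.h>
#include <stdio.h>

/* Count whitespace-separated tokens from the current file position to EOF. */
static int count_tokens(FILE *fp)
{
    char buf[50];
    int n = 0;

    assert(fp != NULL);
    while (fscanf(fp, "%49s", buf) == 1)   /* %49s leaves room for the NUL */
        n++;
    return n;
}
```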

You should combine the two loops. Count the words as they arrive, but also clean up each word (removing non-alphabetic characters; or should that be non-alphanumeric characters, and does the underscore count or not?). Then insert the word into the word list if it does not already appear there, or increase its frequency count if it does.
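
A minimal sketch of such a combined loop follows. The names word_table, clean_word, add_word and read_words are invented for illustration, and several choices are assumptions rather than requirements from the question: only letters are kept (digits and underscores are dropped), comparison is case-sensitive, tokens that contain no letters at all are not counted as words, and the array limits are the ones from the question.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 2500   /* same limits as the arrays in the question */
#define MAX_LEN   50

struct word_table {
    char words[MAX_WORDS][MAX_LEN];
    int  counts[MAX_WORDS];   /* counts[i] is the frequency of words[i] */
    int  distinct;            /* number of distinct words stored */
    int  total;               /* total number of words read */
};

/* Remove every character that is not a letter, in place. */
static void clean_word(char *w)
{
    char *out = w;

    assert(w != NULL);
    for (; *w != '\0'; w++)
        if (isalpha((unsigned char)*w))
            *out++ = *w;
    *out = '\0';
}

/* Linear search: bump the count if the word is known, append it otherwise. */
static void add_word(struct word_table *t, const char *w)
{
    int i;

    assert(strlen(w) < MAX_LEN);
    t->total++;
    for (i = 0; i < t->distinct; i++) {
        if (strcmp(t->words[i], w) == 0) {
            t->counts[i]++;
            return;
        }
    }
    if (t->distinct < MAX_WORDS) {   /* new words are dropped once the table is full */
        strcpy(t->words[t->distinct], w);
        t->counts[t->distinct] = 1;
        t->distinct++;
    }
}

/* Single pass over the stream: clean each token, then record it. */
static void read_words(FILE *fp, struct word_table *t)
{
    char buf[MAX_LEN];

    t->distinct = 0;
    t->total = 0;
    while (fscanf(fp, "%49s", buf) == 1) {
        clean_word(buf);
        if (buf[0] != '\0')          /* skip tokens that were pure punctuation */
            add_word(t, buf);
    }
}
```

The table is about 135 KB, so it is probably safer to declare it static or at file scope than as an ordinary local variable. Because of the %49s limit, a token longer than 49 characters is read as several shorter ones; the linear search is quadratic in the worst case, which should be acceptable for a few thousand words.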

When the input phase is done, you should have a count of the number of distinct words ready, and you'll be able to find the most frequent by scanning the list of frequencies to find the maximum (and the index number where the maximum appeared), and then reporting appropriately.
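
The final scan could look roughly like this, assuming the frequencies are kept in a plain int array that runs parallel to the word list. The function name index_of_max is made up; on a tie it returns the first maximum, which is an arbitrary choice.

```c
#include <assert.h>

/* Return the index of the largest count, or -1 if there are no entries. */
static int index_of_max(const int counts[], int n)
{
    int best = -1;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        if (best < 0 || counts[i] > counts[best])
            best = i;
    return best;
}
```

The word to report is then the entry of the word list at the returned index, and the value of n passed in is the number of distinct words.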

