
Read input text into an array of words and get rid of punctuation marks

I found this large program on the internet. It finds the n most frequent words in a file and prints them out. As written, it reads a given text file, but I want to type the input text myself, so I will probably need to store the words in an array. How do I do that so the program can read text of arbitrary length and still work? Also, if the input text contains punctuation marks, I would have to get rid of them so that the text consists only of the letters from 'a' to 'z'. Do I even need the MAX_CHARS constant then?

#include <stdio.h>
#include <string.h>
#include <ctype.h>

# define MAX_CHARS 26
# define MAX_WORD_SIZE 32000


// A utility function to show results. At any time, the min heap
// contains the n most frequent words seen so far.
void displayMinHeap( MinHeap* minHeap )
{
    int i;

    // print top N word with frequency
    for( i = 0; i < minHeap->count; ++i )
    {
        printf( "%s %d\n", minHeap->array[i].word,
                            minHeap->array[i].frequency );
    }
}

// The main function: takes a file as input, adds its words to the heap
// and Trie, and finally shows the result from the heap
void printKMostFreq( FILE* fp, int n )
{
    // Create a Min Heap of Size n
    MinHeap* minHeap = createMinHeap( n );

    // Create an empty Trie
    TrieNode* root = NULL;

    // A buffer to store one word at a time
    char buffer[MAX_WORD_SIZE];

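    // Read whitespace-delimited words one by one from the file and insert each
    // into the Trie and Min Heap. The field width (MAX_WORD_SIZE - 1) keeps
    // fscanf from overflowing buffer.
    while( fscanf( fp, "%31999s", buffer ) == 1 )
        insertTrieAndHeap(buffer, &root, minHeap);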

    // The Min Heap will have the n most frequent words, so print Min Heap nodes
    displayMinHeap( minHeap );
}

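int main(void)
{
    int n;
    // Reject missing or non-positive counts before building the heap
    if (scanf("%d", &n) != 1 || n <= 0)
    {
        fprintf(stderr, "Expected a positive number of words\n");
        return 1;
    }
    FILE *fp = fopen ("file.txt", "r");
    if (fp == NULL)
    {
        fprintf(stderr, "Cannot open file.txt\n");
        return 1;
    }
    printKMostFreq (fp, n);
    fclose(fp);
    return 0;
}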

It is possible to modify this program to do what you want, but I'm not going to do that for you, at least not without being paid. For a simple solution, generate your text and write it to a text file. You can then pass the contents of that file to the word-counting program. You can also use pipes to do that.
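For the punctuation part of the question, here is a minimal sketch of one way to do it, not a drop-in fix for the program above. The helper name next_word and its parameters are made up for illustration and appear nowhere in the original code. It reads one character at a time, keeps only letters (lowercased), and treats everything else as a word separator, so the input can be any length without storing all the words in an array first.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper (not part of the original program): reads the next
   word from `in` into `buf`, keeping only letters and lowercasing them.
   Any non-letter character (space, digit, punctuation) ends the word.
   Returns 1 if a word was stored, 0 at end of input. */
int next_word(FILE *in, char *buf, size_t cap)
{
    size_t len = 0;
    int c;

    assert(cap > 1);  /* need room for one letter plus the terminator */

    while ((c = fgetc(in)) != EOF) {
        if (isalpha((unsigned char)c)) {
            /* Letters beyond the buffer size are read but dropped */
            if (len < cap - 1)
                buf[len++] = (char)tolower((unsigned char)c);
        } else if (len > 0) {
            break;  /* a non-letter after at least one letter ends the word */
        }
        /* non-letters before the first letter are simply skipped */
    }
    buf[len] = '\0';
    return len > 0;
}
```

Assuming the rest of the program stays as posted, the fscanf loop in printKMostFreq could become while( next_word( fp, buffer, sizeof buffer ) ) insertTrieAndHeap(buffer, &root, minHeap); and main could pass stdin instead of the result of fopen, so the text can be typed in or piped in. Note that an apostrophe also splits words with this approach, so "It's" is counted as "it" and "s". As for MAX_CHARS: the Trie code isn't shown, but a value of 26 is presumably the number of child pointers per Trie node, one for each letter from 'a' to 'z'. If that guess is right, the constant is still needed, and cleaning the input down to lowercase letters is exactly what keeps the Trie indexing in range.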


 