
Word Frequency of a string (i.e. File I/O)?

I wrote a C++ program that reads a text file. However, I also want the program to count the number of times each word appears. For example, the output should look as follows:

Word Frequency Analysis

Word          Frequency
I                1
don't            1
know             1
the              2
key              1
to               3
success          1
but              1
key              1
failure          1
is               1
trying           1
please           1
everybody        1

Notice how each word appears only once. What do I need to do in order to achieve this effect?

Here is the text file (named BillCosby.txt):

I don't know the key to success, but the key to failure is trying to please everybody.

Here is my code so far. I am having an extreme mental block and cannot figure out a way to get the program to count the number of times a word occurs.

#include <iostream>
#include <fstream>
#include <iomanip>

const int BUFFER_LENGTH = 256;
const int NUMBER_OF_STRINGS = 100;

int numberOfElements = 0;
char buffer[NUMBER_OF_STRINGS][BUFFER_LENGTH];
char * words = buffer[0];
int frequency[NUMBER_OF_STRINGS];

int StringLength(char * buffer);
int StringCompare(char * firstString, char * secondString);

int main(){

int isFound = 1;
int count = 1;

std::ifstream input("BillCosby.txt");

if(input.is_open())
{
    //Priming read
    input >> buffer[numberOfElements];
    frequency[numberOfElements] = 1;

while(!input.eof())
    {
    numberOfElements++;
    input >> buffer[numberOfElements];

    for(int i = 0; i < numberOfElements; i++){
        isFound = StringCompare(buffer[numberOfElements], buffer[i]);
            if(isFound == 0)
                ++count;
    }

    frequency[numberOfElements] = count;


    //frequency[numberOfElements] = 1;

    count = 1;
    isFound = 1;
    }
    numberOfElements++;
}
else
    std::cout << "File is not open. " << std::endl;

std::cout << "\n\nWord Frequency Analysis " << std::endl;
std::cout << "\n" << std::endl;

std::cout << "Word " << std::setw(25) << "Frequency\n" << std::endl;

for(int i = 0; i < numberOfElements; i++){
    int length = StringLength(buffer[i]);
    std::cout << buffer[i] << std::setw(25 - length) << frequency[i] << std::endl;
}



return 0;
}

int StringLength(char * buffer){
char * characterPointer = buffer;

while(*characterPointer != '\0'){
    characterPointer++;
}

return characterPointer - buffer;
}

int StringCompare(char * firstString, char * secondString)
   {
    while ((*firstString == *secondString || (*firstString == *secondString - 32) ||
            (*firstString - 32 == *secondString)) && (*firstString != '\0'))
{
    firstString++;
    secondString++;
}

if (*firstString > *secondString)
    return 1;

else if (*firstString < *secondString)
    return -1;

return 0;
}

Your program is quite confusing to read. But this part stuck out to me:

frequency[numberOfElements] = 1;

(in the while loop). You realize that you are always setting the frequency to 1 no matter how many times the word appears, right? Maybe you meant to increment the value rather than set it to 1?

One approach is to tokenize the input (split the lines into words) and then use the C++ std::map container. The map would have the word as its key and the word count as its value.

For each token, add it into the map, and increment the wordcount. A map key is unique, hence you wouldn't have duplicates.

You can use std::stringstream as your tokenizer, and the reference documentation for the std::map container includes examples.
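
The approach described above might look roughly like the following. This is only a minimal sketch: the helper name countWords is made up for illustration and does not appear in the original post, and it assumes words are separated by whitespace only.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): counts how often each
// whitespace-separated token occurs in `text`.
std::map<std::string, int> countWords(const std::string& text)
{
    std::map<std::string, int> counts;
    std::istringstream tokens(text);   // the string stream acts as the tokenizer
    std::string word;

    // operator>> skips whitespace and extracts one token at a time.
    while (tokens >> word) {
        // operator[] inserts the key with a count of 0 the first time it is
        // seen, so one increment covers both new and repeated words.
        ++counts[word];
    }
    return counts;
}
```

Because map keys are unique, iterating over the returned map prints each word once, in sorted order. Punctuation stays attached to the tokens in this sketch, so "success," and "success" would be counted as different words.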

And don't worry, a good programmer deals with mental blocks on a daily basis -- so get used to it :)

The flow of the solution should be something like this:
- initialize storage (you know you have a pretty small file apparently?)
- set the initial count to zero (not one)
- read words into the array. When you get a new word, see if you already have it; if so, add one to the count at that location; if not, add it to the list of words ("hey - a new word!") and set its count to 1
- loop over all words in the file

Be careful with white space - make sure you are matching only non white space characters. Right now you have "key" twice. I suspect that is a mistake?

Good luck.

Here's a code example that I tested with codepad.org:

#include <iostream>
#include <map>
#include <string>
#include <sstream>

using namespace std;

int main()
{
string s = "I don't know the key to success, but the key to failure is trying to please everybody.";
string word;
map<string,int> freq;

for ( std::string::iterator it=s.begin(); it!=s.end(); ++it)
{
    if(*it == ' ')
    {
         if(freq.find(word) == freq.end()) //First time the word is seen
         {
             freq[word] = 1;
         }
         else //The word has been seen before
         {
             freq[word]++;
         }
         word = "";
    }
    else
    {
         word.push_back(*it);
    }
}

for (std::map<string,int>::iterator it=freq.begin(); it!=freq.end(); ++it)
    std::cout << it->first << " => " << it->second << '\n';

}

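It only splits on spaces, so punctuation stays attached to the words and will mess things up. The last word is also never counted, because no space follows it (note that "everybody." is missing from the output). But you get the point.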

Output:

I => 1
but => 1
don't => 1
failure => 1
is => 1
key => 2
know => 1
please => 1
success, => 1    // Note: this isn't perfect because of the comma. A quick change can fix this, though; I'll let you figure that out on your own.
the => 2
to => 3
trying => 1
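
One possible version of that "quick change", offered only as a sketch: treat every character that is neither a letter nor an apostrophe as a separator, and flush the word that is still pending when the string ends. The function name countWordsNoPunct is hypothetical, and keeping apostrophes (so that "don't" stays one word) is an assumption about the desired result.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper (not from the original post).
std::map<std::string, int> countWordsNoPunct(const std::string& s)
{
    std::map<std::string, int> freq;
    std::string word;

    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
        // std::isalpha expects a value representable as unsigned char.
        unsigned char c = static_cast<unsigned char>(*it);
        if (std::isalpha(c) || c == '\'') {
            word.push_back(*it);          // still inside a word
        } else if (!word.empty()) {
            ++freq[word];                 // a separator ends the current word
            word.clear();
        }
    }
    if (!word.empty()) {
        ++freq[word];                     // count the final word as well
    }
    return freq;
}
```

With the sentence from the question, this variant counts "success" and "everybody" without their trailing punctuation.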

I'm a bit hesitant to post a direct answer to something that looks a lot like homework. But I'm pretty sure that if somebody turns this in as homework, any halfway decent teacher or professor is going to demand a pretty serious explanation. So if you do, you'd better study it carefully and be ready for some serious questions about what all the parts are and how they work.

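#include <map>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <string>
#include <fstream>
#include <iomanip>
#include <locale>
#include <vector>
#include <cctype>   // std::isalpha
#include <cstddef>  // std::size_t

// A ctype facet that classifies every non-alphabetic character as whitespace,
// so stream extraction treats punctuation and digits as word separators.
struct alpha_only: std::ctype<char> {
    alpha_only() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        // Start with every character marked as space, then mark letters as alpha.
        static std::vector<std::ctype_base::mask> 
            rc(std::ctype<char>::table_size,std::ctype_base::space);
        for (std::size_t i=0; i<std::ctype<char>::table_size; i++)
            if (std::isalpha(static_cast<int>(i))) rc[i] = std::ctype_base::alpha;
        return &rc[0];
    }
};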

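typedef std::pair<std::string, unsigned> count;

// Note: adding overloads to namespace std is technically undefined behavior.
// It is done here so that std::ostream_iterator can find this operator<<
// through argument-dependent lookup.
namespace std { 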
    std::ostream &operator<<(std::ostream &os, ::count const &c) { 
        return os << std::left << std::setw(25) << c.first 
                  << std::setw(10) << c.second;
    }
}

int main() { 
    std::ifstream input("BillCosby.txt"); // file name as given in the question
    input.imbue(std::locale(std::locale(), new alpha_only()));

    std::map<std::string, unsigned> words;

    std::for_each(std::istream_iterator<std::string>(input),
                    std::istream_iterator<std::string>(),
                    [&words](std::string const &w) { ++words[w]; });
    std::copy(words.begin(), words.end(),
              std::ostream_iterator<count>(std::cout, "\n"));
    return 0;
}
