I'm currently learning C++ (properly) by going through the book Accelerated C++ by Andrew Koenig and Barbara Moo on my own, and doing all the exercises in each chapter.
Exercise 3-3: Write a program to count how many times each distinct word appears in its input. To me this exercise seemed extremely difficult, especially considering that (1) the examples and other exercises in that chapter were relatively simple, and (2) you are only allowed to use vectors, so nothing advanced. (Or maybe I am just misjudging the difficulty.)
I searched the web for hints and saw others having trouble with this exercise, but the solutions people offered seemed unclear to me. Most people suggested using techniques that are introduced later in the book, which kind of defeats the point of the exercise. Finally, I pieced together hints and bits of methods I found on different forums (including here) to come up with my own solution:
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using std::cin;
using std::cout;
using std::string;
using std::endl;
using std::sort;
using std::vector;
int main()
{
    // Ask for string input
    cout << "Please write some text, followed by end-of-file: " << endl;
    vector<string> word_input;
    string word;
    typedef vector<string>::size_type vecsize;
    // read words into the string vector word_input
    while (cin >> word)
    {
        word_input.push_back(word);
    }
    // with no input, word_input.size()-1 below would wrap around, so stop here
    if (word_input.empty())
    {
        cout << "No words were entered." << endl;
        return 1;
    }
    // sort the vector in alphabetical order to be able to separate distinct words
    sort(word_input.begin(), word_input.end());
    // create two vectors: one where each (string) element is a unique word, and one
    // that stores the index at which the last occurrence of each distinct word sits
    vector<string> unique_words;
    vector<vecsize> break_index;
    for (vecsize i = 0; i != word_input.size()-1; ++i)
    {
        if (word_input[i+1] != word_input[i])
        {
            unique_words.push_back(word_input[i]);
            break_index.push_back(i);
        }
    }
    // the last word in the series always ends a run, so record it as well;
    // this also covers input that consists of a single distinct word
    unique_words.push_back(word_input[word_input.size()-1]);
    break_index.push_back(word_input.size()-1);
    // create a vector that counts how many times each unique word occurs; the first
    // word occurs (index of its last occurrence + 1) times
    vector<vecsize> word_count(1, break_index[0]+1);
    // every later word occurs as many times as the distance between its last
    // occurrence and the last occurrence of the previous word
    for (vecsize i = 1; i != break_index.size(); ++i)
    {
        word_count.push_back(break_index[i] - break_index[i-1]);
    }
    // number of (distinct) words and their frequency output
    cout << "The number of words in this text is: " << word_input.size() << endl;
    cout << "Number of distinct words is: " << unique_words.size() << endl;
    // The frequency of each word in the text
    for (vecsize i = 0; i != unique_words.size(); ++i)
        cout << unique_words[i] << " occurs " << word_count[i] << " time(s)" << endl;
    return 0;
}
Is there a better way of doing this using vectors? Can the code be made more efficient by combining any loops?
The solution that worked for me (when I was working through this problem) was to use three vectors: an input_vector, an output_vector, and a count_vector. Read the user input with a while loop using std::cin until end-of-file is reached, calling input_vector.push_back(input_word) to populate input_vector with words. Use std::sort from <algorithm> to sort the vector, then create output_vector (with one value, the first word in input_vector) and count_vector (with one value, 1).
Then, for each element in input_vector (starting from the second, not the first), check whether the current element is the same as the previous element. If it is, add 1 to the last element in count_vector. Otherwise, add the current word in input_vector to output_vector using push_back(), and increase the size of count_vector by one element (whose value is 1).
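The steps described above could be sketched roughly as follows. This is only an illustration: the helper name count_words and its signature are my own assumptions, not something from the book, and the reading loop is left out so the counting step stands on its own.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (name and signature are assumptions for illustration).
// Takes the words read from input and fills output_vector with each distinct
// word and count_vector with the number of times that word occurs.
void count_words(std::vector<std::string> input_vector,
                 std::vector<std::string>& output_vector,
                 std::vector<int>& count_vector)
{
    output_vector.clear();
    count_vector.clear();

    // Nothing to count; this also keeps input_vector[0] below valid.
    if (input_vector.empty())
        return;

    // After sorting, equal words sit next to each other.
    std::sort(input_vector.begin(), input_vector.end());

    // Start with the first word, seen once so far.
    output_vector.push_back(input_vector[0]);
    count_vector.push_back(1);

    // Start from the second element and compare with the previous one.
    for (std::vector<std::string>::size_type i = 1; i != input_vector.size(); ++i)
    {
        if (input_vector[i] == input_vector[i - 1])
        {
            // Same word again: bump the count of the current distinct word.
            ++count_vector.back();
        }
        else
        {
            // New distinct word: record it with an initial count of 1.
            output_vector.push_back(input_vector[i]);
            count_vector.push_back(1);
        }
    }
}
```

To use it, fill a vector<string> with while (std::cin >> input_word) input_vector.push_back(input_word); and pass it in; afterwards output_vector[i] occurs count_vector[i] times. Compared with the code in the question, this needs only one pass after sorting and no index bookkeeping.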
If you imagine that someone is using your code to process the entire works of Shakespeare, you would be wasting a lot of space by storing EVERY word. If you instead hold a structure of "the word" and "count of the word", you would only have to store the word "the" once, even if it occurs 100000 times in the text your program is being fed. That is, if you even need to know that the word has occurred more than once - if all you need is a list of unique words, all you need is to see whether you have already stored the word. [Storing them in sorted order would make it possible to use binary_search to find them, which would help the runtime if you do indeed run the 800K (not unique) words of Shakespeare through your code.]
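A minimal sketch of that idea, under some assumptions of mine: the WordCount struct and the add_word and entry_before_word helpers are hypothetical names, and I use std::lower_bound rather than std::binary_search, because lower_bound also returns the position where a missing word should be inserted to keep the vector sorted.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type: one entry per distinct word.
struct WordCount {
    std::string word;
    int count;
};

// Comparison used by std::lower_bound: is this entry ordered before the word?
bool entry_before_word(const WordCount& entry, const std::string& word)
{
    return entry.word < word;
}

// Hypothetical helper: record one occurrence of word in a table that is kept
// sorted by word, so each distinct word is stored only once.
void add_word(std::vector<WordCount>& table, const std::string& word)
{
    // Binary search for the first entry that is not ordered before word.
    std::vector<WordCount>::iterator pos =
        std::lower_bound(table.begin(), table.end(), word, entry_before_word);

    if (pos != table.end() && pos->word == word)
    {
        // Already stored: only the count changes.
        ++pos->count;
    }
    else
    {
        // First occurrence: insert at pos so the table stays sorted.
        WordCount entry;
        entry.word = word;
        entry.count = 1;
        table.insert(pos, entry);
    }
}
```

Calling add_word(table, word) inside the while (std::cin >> word) loop means the full input never has to be held in memory. Note the trade-off: the lookup is logarithmic, but inserting a new word into the middle of a vector still shifts the elements after it, so this mainly pays off when the number of distinct words is small compared with the total.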