Count number of occurrences of elements with vectors

I'm currently learning C++ (properly) by going through the book Accelerated C++ by Andrew Koenig and Barbara Moo on my own, and doing all the exercises in each chapter.

Exercise 3-3: Write a program to count how many times each distinct word appears in its input. To me this exercise seemed extremely difficult, especially considering that (1) the examples and other exercises in that chapter were relatively simple, and (2) you are only allowed to use vectors, so nothing advanced. (Or maybe I'm just misjudging the difficulty.)

I searched the web for hints and saw others having trouble with this exercise, but the solutions people offered seemed unclear to me. Most people suggested using techniques that are introduced later in the book, which kind of defeats the point of the exercise. Finally, I pieced together hints and bits of methods I found on different forums (including here) to come up with my own solution:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

using std::cin;
using std::cout;
using std::endl;
using std::sort;
using std::string;
using std::vector;

int main()
{
    // ask for string input
    cout << "Please write some text, followed by end-of-file: " << endl;

    vector<string> word_input;
    string word;

    typedef vector<string>::size_type vecsize;

    // read words into the string vector word_input
    while (cin >> word)
    {
        word_input.push_back(word);
    }

    // with no input, word_input.size() - 1 below would wrap around (size_type is unsigned)
    if (word_input.empty())
    {
        cout << "No words were entered." << endl;
        return 1;
    }

    // sort the vector in alphabetical order so that equal words end up next to each other
    sort(word_input.begin(), word_input.end());

    // create two vectors: one where each (string) element is a unique word, and one
    // that stores the index of the last occurrence of each distinct word except the final one
    vector<string> unique_words;
    vector<vecsize> break_index;

    for (vecsize i = 0; i != word_input.size() - 1; ++i)
    {
        if (word_input[i + 1] != word_input[i])
        {
            unique_words.push_back(word_input[i]);
            break_index.push_back(i);
        }
    }

    // add the last word in the series to the unique word string vector
    unique_words.push_back(word_input[word_input.size() - 1]);

    // create a vector that counts how many times each unique word occurs
    vector<vecsize> word_count;

    if (break_index.empty())
    {
        // only one distinct word: every input word is an occurrence of it
        word_count.push_back(word_input.size());
    }
    else
    {
        // the first word occupies indices 0 through break_index[0]
        word_count.push_back(break_index[0] + 1);

        // each later word occupies the indices between two consecutive breaks
        for (vecsize i = 0; i != break_index.size() - 1; ++i)
        {
            word_count.push_back(break_index[i + 1] - break_index[i]);
        }

        // the last word occupies everything after the final break:
        // total size of text - 1 (index starts at 0) - index of the final break
        word_count.push_back(word_input.size() - 1 - break_index[break_index.size() - 1]);
    }

    // number of (distinct) words and their frequency output
    cout << "The number of words in this text is: " << word_input.size() << endl;
    cout << "Number of distinct words is: " << unique_words.size() << endl;

    // the frequency of each word in the text
    for (vecsize i = 0; i != unique_words.size(); ++i)
        cout << unique_words[i] << " occurs " << word_count[i] << " time(s)" << endl;

    return 0;
}

Is there a better way of doing this using vectors? Can the code be made more efficient by combining any loops?

The solution that worked for me (when I was working through this problem) was to use three vectors: an input_vector, an output_vector, and a count_vector. Read the user input in a while loop on std::cin until end-of-file or an escape character is entered, using input_vector.push_back(input_word) to populate input_vector with words. Then use std::sort from <algorithm> to sort the vector, and create output_vector (with one value, the first word in input_vector) and count_vector (with one value, 1).

Then, for each element in input_vector (starting from the second, not the first), check whether the current element is the same as the previous element. If it is, add 1 to the last element in count_vector. Otherwise, add the current word in input_vector to output_vector using push_back(), and grow count_vector by one element (whose value is 1).
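
The steps above could look roughly like the sketch below. The vector names come from the description; the function name count_words and its signature are my own assumptions, and reading from std::cin is left to the caller so that only the sort-and-count pass is shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (name and signature are illustrative, not from the answer).
// Fills output_vector with each distinct word and count_vector with the matching
// number of occurrences. input_vector is taken by value so the caller's copy
// keeps its original order.
void count_words(std::vector<std::string> input_vector,
                 std::vector<std::string>& output_vector,
                 std::vector<std::size_t>& count_vector)
{
    output_vector.clear();
    count_vector.clear();

    // nothing to count; also avoids reading input_vector[0] on an empty vector
    if (input_vector.empty())
        return;

    // after sorting, equal words sit next to each other
    std::sort(input_vector.begin(), input_vector.end());

    // seed both result vectors with the first word and a count of 1
    output_vector.push_back(input_vector[0]);
    count_vector.push_back(1);

    // start from the second element and compare with the previous one
    for (std::vector<std::string>::size_type i = 1; i != input_vector.size(); ++i)
    {
        if (input_vector[i] == input_vector[i - 1])
        {
            // same word as before: bump the count of the most recent word
            ++count_vector.back();
        }
        else
        {
            // new word: record it and start its count at 1
            output_vector.push_back(input_vector[i]);
            count_vector.push_back(1);
        }
    }
}
```

Because both result vectors grow in step, output_vector[i] and count_vector[i] always describe the same word, and a single loop replaces the separate break-index and subtraction passes in the question.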

If you imagine that someone is using your code to process the entire works of Shakespeare, you would be wasting a lot of space by storing EVERY word. If you instead hold a structure of "the word" and "count of the word", you only have to store the word "the" once, even if it occurs 100000 times in the text your program is being fed. That is, if you even need to know that the word has occurred more than once - if all you need is a list of unique words, all you need is to check whether you have already stored the word. (Storing them in sorted order would make it possible to use binary_search to find them, which would help the runtime if you do indeed run the 800K (not unique) words of Shakespeare through your code.)
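
A minimal sketch of that idea follows, with some assumptions of mine: the WordCount struct and the add_word and entry_before_word functions are hypothetical names, and I use std::lower_bound rather than binary_search because it returns the position where the word is (or should be inserted), whereas binary_search only reports whether it is present. Note that this goes beyond what chapter 3 of the book has introduced.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One entry per distinct word (hypothetical type, for illustration only).
struct WordCount {
    std::string word;
    std::size_t count;
};

// Comparison used by std::lower_bound: is the stored entry ordered before the word?
bool entry_before_word(const WordCount& entry, const std::string& word)
{
    return entry.word < word;
}

// Records one occurrence of word in table, which is kept sorted by word.
void add_word(std::vector<WordCount>& table, const std::string& word)
{
    // binary search for the first entry that is not ordered before word
    std::vector<WordCount>::iterator pos =
        std::lower_bound(table.begin(), table.end(), word, entry_before_word);

    if (pos != table.end() && pos->word == word)
    {
        // already stored: only the count changes
        ++pos->count;
    }
    else
    {
        // first occurrence: insert at pos so the table stays sorted
        WordCount entry;
        entry.word = word;
        entry.count = 1;
        table.insert(pos, entry);
    }
}
```

Calling add_word(table, word) inside the while (cin >> word) loop would keep memory proportional to the number of distinct words. The trade-off is that inserting into the middle of a vector shifts the later entries, though that cost is only paid the first time each word is seen.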
