
Reading a text file and returning the word count by line in C++

We are beginning to move from C to C++ in my programming class. Our current lab assignment is to write a program that reads a text file and prints each word in the file along with the line numbers it appears on and the number of times it appears on each of those lines, formatted as Word Line:Count. For example, given the input:

Foo bar bar
Baz
Foo
<EOF>

Which should return:

Foo  1:1 3:1
Bar  1:2
Baz  2:1

The only data structures that we have covered so far are maps, with which we wrote the following program, which outputs the total word count:

#include <iostream>
#include <map>
#include <string>
using namespace std;

int main(int argc, const char*argv[]) {
    map<string, unsigned int> table;  // word -> total number of occurrences
    string word;

    while (cin >> word) {
        ++table[word];
    }

    for (std::map<string, unsigned int>::iterator itr = table.begin();
            itr != table.end(); ++itr) {
        cout << itr->first << "\t" << itr->second << endl;
    }

    return 0;
}

We were told that it would be possible to modify this program only minimally to have it print the line number and the word count. My question is: is there a way to use a map to hold two values for each key? Or is there a better way to implement something like this?

You can have a map store almost anything as a key's value. To count the number of times a word appears while keeping a dynamic list of the line numbers it appears on, you can do the following. This is the simplest, most straightforward solution that came to me; it is not the most efficient.

Use a map with a string key and a vector value, where the vector index is the line number and the value at that index is the word's count on that line.

#include <map>          // std::map
#include <string>       // std::string
#include <vector>       // std::vector

using namespace std;
map<string, vector<int>> words;

As you come across words, look them up in the map and increment the vector element at index line_num to record the number of times the word appears on that line.

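#include <iostream>
#include <sstream>
using namespace std;

string line;
string word;
size_t line_num = 0;  // 0-based index; add 1 when printing
while (getline(cin, line)) {
    istringstream words_iss(line);
    while (words_iss >> word) {  // read from the stream, not the string
        vector<int>& counts = words[word];  // operator[] inserts new words
        if (counts.size() <= line_num) {
            counts.resize(line_num + 1);  // pad earlier lines with zeros
        }
        ++counts[line_num];
    }
    ++line_num;
}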

The inefficiency comes from using the vector index as the line number: a word might not show up until line n, yet storing its count at index n forces the vector to hold ints for indices 0 through n-1 as well. Printing also requires checking every value in the vector to see whether it is nonzero.

You can print by looping through each string in the map, then looping through each key's vector and only printing when the value at the index is not 0.
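The printing loop described above could be sketched roughly as follows. The function name format_word_lines is made up for illustration, and the sketch assumes that index i of each vector holds the count for line i + 1 (that is, the line counter started at 0 while reading).

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds one output line per word, in the form
// "word line:count line:count ...".
// Assumption: index i of the vector holds the count for line i + 1.
std::string format_word_lines(
        const std::map<std::string, std::vector<int>>& words) {
    std::ostringstream out;
    for (const auto& entry : words) {
        out << entry.first;
        for (std::size_t i = 0; i < entry.second.size(); ++i) {
            if (entry.second[i] != 0) {  // skip lines where the word is absent
                out << ' ' << (i + 1) << ':' << entry.second[i];
            }
        }
        out << '\n';
    }
    return out.str();
}
```

Calling cout << format_word_lines(words); after the reading loop would then print the table. Note that std::map compares strings byte by byte, so capitalized words sort before lowercase ones.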

As mentioned in the comments, another solution would be to use a

map<string, map<int, int>> 

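with similar logic, where the inner map goes from line number to count. This would be more efficient in most cases, since only the lines on which a word actually appears are stored.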
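As a rough sketch of the nested-map idea, something like the following might work. The names WordIndex, build_index and format_index are invented for this example, lines are numbered from 1, and words are split on whitespace and compared case-sensitively, which may or may not match what the assignment expects.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// word -> (1-based line number -> count on that line)
using WordIndex = std::map<std::string, std::map<int, int>>;

// Hypothetical helper: reads `in` line by line and counts each word per line.
WordIndex build_index(std::istream& in) {
    WordIndex index;
    std::string line, word;
    int line_num = 1;
    while (std::getline(in, line)) {
        std::istringstream line_stream(line);
        while (line_stream >> word) {
            // operator[] creates a missing word or line entry with count 0
            ++index[word][line_num];
        }
        ++line_num;
    }
    return index;
}

// Hypothetical helper: formats the index as "word line:count line:count ...".
std::string format_index(const WordIndex& index) {
    std::ostringstream out;
    for (const auto& entry : index) {
        out << entry.first;
        for (const auto& line_count : entry.second) {
            out << ' ' << line_count.first << ':' << line_count.second;
        }
        out << '\n';
    }
    return out.str();
}
```

From main, cout << format_index(build_index(cin)); would read standard input and print the table. Because the inner map only contains lines where the word occurs, there is no padding with zeros and no need to skip empty entries when printing.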

