
Counting the number of elements in a vector

I am looking to find the number of words that start with d, D, or any other character within a file. Currently I am having trouble counting each instance of a new word. For example, if there are 5 Davids and 3 Dogs within the file, I would want to count each of them individually.

I would prefer something that would not require massive change. Any help is appreciated.

#include<iostream>
#include<fstream>                               //needed for file opening and closing/manipulation within files
#include<vector>                                //needed for vectors to store the words from the file
#include<algorithm>                             //needed for sort algorithm later

using namespace std;

int main(){
    string inputName, num, words;

    cout<<"Enter a valid filename: ";           //Prompting user for a file name in the directory of this program exe
    cin>>inputName;

    ifstream file(inputName);                   //Creating a ifstream File which will open the file to the program

    vector<string> dWords;                      //Creating 2 vectors, 1 for anything that starts with 'd'/'D' and 2 for anything else
    vector<string> otherWords;

    while(!file.eof()){                         //While loop that runs until the file is eof or end of file.
        getline(file, words);
        while(file>>words){                     //Reading each line and extracting into the words variable
            if(words[0]=='d'||words[0]=='D'){   //if statement that checks if the first letter in each word starts with a 'd' or 'D'
                dWords.push_back(words);        //if true then the word gets added to the vector with the push_back

            }
            else if(words[0]=='"'){             //Checking for a niche case of when a word starts with a "
                if(words[1]=='d'||words[0]=='D'){//If true then the same if statement will happen to check for 'd' or 'D'
                    dWords.push_back(words);
                }
            }
            else{                               //This case is for everything not mentioned already
                otherWords.push_back(words);    //This is added to a different vector than the dWords
            }
        }
    }

    dWords.erase(unique(dWords.begin(), dWords.end()));
    otherWords.erase(unique(otherWords.begin(), otherWords.end()));

    sort(dWords.begin(), dWords.end());         //Using the C++ native sorting method that works with vectors to sort alphabetically
    sort(otherWords.begin(), otherWords.end());

    cout<<"All words starting with D or d in the file: "<<endl;     //printing out the words that start with 'd' or 'D' alphabetically
    for(int a=0; a<=dWords.size(); a++){
        cout<<dWords[a]<<endl;
    }

    cout<<endl;
    cout<<"All words not starting with D or d in the file: "<<endl; //printing out every other word/character left
    for(int b=0; b<=otherWords.size(); b++){
        cout<<otherWords[b]<<endl;
    }

    file.close();           //closing file after everything is done in program
}

Here is a version that illustrates what I mentioned in the main comments. This code doesn't need an extra vector to store the words that start with D.

#include <iostream>
#include <vector>                                
#include <algorithm>                             
#include <iterator>
#include <cctype>
#include <fstream>
#include <string>

int main()
{
    std::string words;
    std::vector<std::string> dWords;  
    std::string inputName; 
    std::cin >> inputName;
    std::ifstream file(inputName);    

    while(file >> words)
    {
        // remove punctuation
        words.erase(std::remove_if(words.begin(), words.end(), [](unsigned char ch) 
                        { return std::ispunct(ch) != 0; }), words.end());
        dWords.push_back(words);      
    }
    
    // partition D from non-D words
    auto iter = std::partition(dWords.begin(), dWords.end(), [](const std::string& s) 
                                         { return toupper(s[0]) == 'D'; });
                                         
    // output results                                         
    std::cout << "The number of words starting with D: " << std::distance(dWords.begin(), iter) << "\n";
    std::cout << "Here are the words:\n";
    std::copy(dWords.begin(), iter, std::ostream_iterator<std::string>(std::cout, " "));
    
    std::cout << "\n\nThe number of words not starting with D: " << std::distance(iter, dWords.end()) << "\n";
    std::cout << "Here are the words:\n";
    std::copy(iter, dWords.end(), std::ostream_iterator<std::string>(std::cout, " "));
}

This is essentially a program that is about 4 lines.

1) A read of the word, 
2) a filtering of the word to remove the punctuation, 
3) partitioning the vector, 
4) getting the count by using the partition.

Here are the changes:

while(file >> words)

The loop to read in each word is simplified. All that was necessary was to use the >> to read each word in a loop.


Remove the punctuation from each word using remove_if with a lambda that calls ispunct. This removes commas, quotes, and other symbols from the word. Once this is done, there is no need for the later special case that checks for a leading double quote (").

words.erase(std::remove_if(words.begin(), words.end(), [](unsigned char ch)
    { return std::ispunct(ch) != 0; }), words.end());

dWords.push_back(words);

We push all the words onto the vector. It doesn't matter if the word starts with D or not. We will take care of that later.


Separate the words that start with D from the words that do not start with D.

This is done by using the std::partition algorithm function. This function places items that match a certain criterion on the left side of the partition, and the items that do not match on the right side of the partition. An iterator is returned, denoting where the partition point is.

In this case, the criterion is "the word starts with D or d": if this is true for a word, it is placed on the left of the partition. Note the use of toupper to test both d and D.

// partition D from non-D words
auto iter = std::partition(dWords.begin(), dWords.end(), [](const std::string& s) 
                                     { return toupper(s[0]) == 'D'; });

Get the count of the number of items on the left and right partition.

Since all the items on the left of the partition start with D, it's just a matter of getting the distance from the beginning of the vector up to the partition point iter to get a count of the items.

Likewise, to get a count of the words not starting with D, we count the items from the partition point iter to the end of the vector.

To get the number of items we can use the std::distance function from <iterator>:

 // output results                                         
    std::cout << "The number of words starting with D: " << std::distance(dWords.begin(), iter) << "\n";
    std::cout << "Here are the words:\n";
    std::copy(dWords.begin(), iter, std::ostream_iterator<std::string>(std::cout, " "));

    std::cout << "\n\nThe number of words not starting with D: " << std::distance(iter, dWords.end()) << "\n";
    std::cout << "Here are the words:\n";
    std::copy(iter, dWords.end(), std::ostream_iterator<std::string>(std::cout, " "));

The std::copy is just a fancy way of outputting the contents of the vector without writing a loop, so don't let that distract you.

Here is a live example. The only difference is that cin is used instead of a file.


If you really wanted to separate the vector into two distinct vectors, one with D words and one without, then it is as simple as creating the vectors from the partitioned vector:

std::vector<std::string> onlyDwords(dWords.begin(), iter);
std::vector<std::string> nonDWords(iter, dWords.end());

Avoiding std::vector altogether and using std::map provides a succinct way to map each word beginning with a given character to the number of times that word occurs in a given block of text.

std::map<std::string, size_t> provides a way to map unique strings to the number of times they occur. The std::string is used as the unique key and the size_t count is used as the value. Since the strings in the map will be unique, you only need to read each word, check if the word begins with the character to find, and then:

    mymap[word]++;

After you are done reading words, mymap will hold the frequency with which each word added to the map occurs. Reading from a file, using the map name wordfreq, you could do:

#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <cctype>
#include <map>

int main (int argc, char **argv) {
    
    /* filename as 1st argument or use "default.txt" by default */
    const char *fname = argc > 1 ? argv[1] : "default.txt";     /* filename */
    const char c2find = argc > 2 ? tolower(*argv[2]) : 'd';     /* 1st char to find */
    std::map<std::string, size_t> wordfreq{};
    
    std::string word;                                   /* string to hold each word */
    std::ifstream f (fname);                            /* open ifstream using fname */
    
    if (!f.is_open()) { /* validate file open for reading */
        std::cerr << "error: file open failed '" << fname << "'.\n"
                  << "usage: " << argv[0] << " [filename (default.txt)]\n";
        return 1;
    }
    
    while (f >> word) {                         /* read each whitespace separate word */
        if (tolower(word[0]) == c2find) {       /* if word begins with char to find */
            wordfreq[word]++;                   /* increment frequency of word in map */
        }
    }
    
    for (const auto& w : wordfreq)
        std::cout << std::left << std::setw(16) << w.first <<
                    std::right << w.second << '\n';
}

Example Input File

$ cat default.txt
All work and no play makes David a dull boy!
All work and no play makes David a dull boy!
All work and no play makes David a dull boy!
All work and no play makes David a dull boy!
All work and no play makes David a dull boy!

Example Use/Output

$ ./bin/map_word_freq
David           5
dull            5

or for 'a':

$ ./bin/map_word_freq default.txt a
All             5
a               5
and             5

(note: if you want to provide a different character, which is the 2nd argument to the program, you have to provide the filename to read before it)

Look things over and let me know if you have further questions.

In your code you use std::unique to reduce adjacent duplicate words to 1 inside your vectors. In the body of the question, you state that you'd prefer to count each and every word, so in my version of the code below, I also left copies of the original vectors, and a count summary at the end.

I have also corrected words[1]=='d'||words[0]=='D' so that both comparisons index words[1], as pointed out in the comments section, and tweaked other aspects of the original code (std::vector::erase needs a second iterator as argument):

#include<iostream>
#include<fstream>                               //needed for file opening and closing/manipulation within files
#include<vector>                                //needed for vectors to store the words from the file
#include<algorithm>                             //needed for sort algorithm later
#include<string>                                //needed for std::string

using namespace std;

int main(){
    string inputName, num, words;

    cout<<"Enter a valid filename: ";           //Prompting user for a file name in the directory of this program exe
    cin>>inputName;

    ifstream file(inputName);                   //Creating a ifstream File which will open the file to the program

    vector<string> dWords;                      //Creating 2 vectors, 1 for anything that starts with 'd'/'D' and 2 for anything else
    vector<string> otherWords;

    while(file>>words){                         //Reads one whitespace-separated word at a time until extraction fails; the former eof()/getline wrapper discarded the first line of the file
        if(words[0]=='d'||words[0]=='D'){       //if statement that checks if the first letter in each word is a 'd' or 'D'
            dWords.push_back(words);            //if true then the word gets added to the vector with the push_back
        }
        else if(words[0]=='"'){                 //Checking for a niche case of when a word starts with a "
            if(words[1]=='d'||words[1]=='D'){   //corrected second condition, from words[0]=='D'
                dWords.push_back(words);
            }
            else{                               //quoted words that do not start with 'd'/'D' were previously dropped from both vectors
                otherWords.push_back(words);
            }
        }
        else{                                   //This case is for everything not mentioned already
            otherWords.push_back(words);        //This is added to a different vector than the dWords
        }
    }

    // I have added 2 copies of the vectors of strings, in case you intend to count each single word, without reducing adjacent duplicates to 1 with std::unique
    vector<string> original_dWords(dWords);
    vector<string> original_otherWords(otherWords);

    dWords.erase(unique(dWords.begin(), dWords.end()), dWords.end());
    otherWords.erase(unique(otherWords.begin(), otherWords.end()), otherWords.end());

    sort(dWords.begin(), dWords.end());         //Using the C++ native sorting method that works with vectors to sort alphabetically
    sort(otherWords.begin(), otherWords.end());

    cout<<"All words starting with D or d in the file: "<<endl;     //printing out the words that start with 'd' or 'D' alphabetically
    for(unsigned a=0; a<dWords.size(); a++){
        cout<<dWords[a]<<endl;
    }

    cout<<endl;
    cout<<"All words not starting with D or d in the file: "<<endl; //printing out every other word/character left
    for(unsigned b=0; b<otherWords.size(); b++){
        cout<<otherWords[b]<<endl;
    }

    // added a words count summary
    cout << "Number of words beginning with d,D is: " << original_dWords.size() << endl;
    cout << "If we leave just one out of consecutive, identical words, that number falls to: " << dWords.size() << endl;
    cout << "Number of words not beginning with d,D is: " << original_otherWords.size() << endl;
    cout << "If we leave just one out of consecutive, identical words, that number falls to: " << otherWords.size() << endl;

    file.close();           //closing file after everything is done in program
}

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
