Counting the number of elements in a vector
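I am trying to count the words in a file that start with d, D, or any other character. At the moment I cannot count the instances of each distinct word. For example, if the file contains 5 Davids and 3 Dogs, I want to count each of them separately.
I would prefer something that does not require a lot of changes. Any help is appreciated.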
#include<iostream>
#include<fstream> //needed for file opening and closing/manipulation within files
#include<vector> //needed for vectors to store the words from the file
#include<algorithm> //needed for sort algorithm later
using namespace std;
int main(){
string inputName, num, words;
cout<<"Enter a valid filename: "; //Prompting user for a file name in the directory of this program exe
cin>>inputName;
ifstream file(inputName); //Creating a ifstream File which will open the file to the program
vector<string> dWords; //Creating 2 vectors, 1 for anything that starts with 'd'/'D' and 2 for anything else
vector<string> otherWords;
while(!file.eof()){ //While loop that runs until the file is eof or end of file.
getline(file, words);
while(file>>words){ //Reading each line and extracting into the words variable
if(words[0]=='d'||words[0]=='D'){ //if statement that checks if the first letter in each word starts with a 'd' or 'D'
dWords.push_back(words); //if true then the word gets added to the vector with the push_back
}
else if(words[0]=='"'){ //Checking for a niche case of when a word starts with a "
if(words[1]=='d'||words[0]=='D'){//If true then the same if statement will happen to check for 'd' or 'D'
dWords.push_back(words);
}
}
else{ //This case is for everything not mentioned already
otherWords.push_back(words); //This is added to a different vector than the dWords
}
}
}
dWords.erase(unique(dWords.begin(), dWords.end()));
otherWords.erase(unique(otherWords.begin(), otherWords.end()));
sort(dWords.begin(), dWords.end()); //Using the C++ native sorting method that works with vectors to sort alphabetically
sort(otherWords.begin(), otherWords.end());
cout<<"All words starting with D or d in the file: "<<endl; //printing out the words that start with 'd' or 'D' alphabetically
for(int a=0; a<=dWords.size(); a++){
cout<<dWords[a]<<endl;
}
cout<<endl;
cout<<"All words not starting with D or d in the file: "<<endl; //printing out every other word/character left
for(int b=0; b<=otherWords.size(); b++){
cout<<otherWords[b]<<endl;
}
file.close(); //closing file after everything is done in program
}
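Here is a version that illustrates what I mentioned in the main comments. This code does not need an additional vector to store the words that start with D.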
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <cctype>
#include <fstream>
#include <string>
int main()
{
std::string words;
std::vector<std::string> dWords;
std::string inputName;
std::cin >> inputName;
std::ifstream file(inputName);
while(file >> words)
{
// remove punctuation
words.erase(std::remove_if(words.begin(), words.end(), [](char ch)
{ return std::ispunct(static_cast<unsigned char>(ch)); }), words.end());
dWords.push_back(words);
}
// partition D from non-D words
auto iter = std::partition(dWords.begin(), dWords.end(), [](const std::string& s)
{ return std::toupper(static_cast<unsigned char>(s[0])) == 'D'; });
// output results
std::cout << "The number of words starting with D: " << std::distance(dWords.begin(), iter) << "\n";
std::cout << "Here are the words:\n";
std::copy(dWords.begin(), iter, std::ostream_iterator<std::string>(std::cout, " "));
std::cout << "\n\nThe number of words not starting with D: " << std::distance(iter, dWords.end()) << "\n";
std::cout << "Here are the words:\n";
std::copy(iter, dWords.end(), std::ostream_iterator<std::string>(std::cout, " "));
}
This is essentially a program of about 4 steps:
1) reading each word,
2) filtering the word to remove punctuation,
3) partitioning the vector,
4) getting the counts from the partition.
Here are the changes:
while(file >> words)
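The loop that reads each word has been simplified. All that is needed is a loop that reads each word with >>.
Punctuation is removed from each word using remove_if with an ispunct lambda. This strips commas, quotes, and other symbols from the word. Once this is done, there is no need to check for the " double quote in the later test.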
words.erase(std::remove_if(words.begin(), words.end(), [](char ch)
{ return std::ispunct(static_cast<unsigned char>(ch)); }), words.end());
dWords.push_back(words);
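The erase/remove_if combination above can be tried on a single word in isolation. The following is a minimal sketch, assuming a hypothetical helper named stripPunctuation that is not part of the original answer:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not in the original answer): returns a copy of
// `word` with every punctuation character removed.
std::string stripPunctuation(std::string word)
{
    // std::remove_if shifts the characters to keep toward the front and
    // returns an iterator to the new logical end; erase() then trims the tail.
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](char ch) {
                                  return std::ispunct(static_cast<unsigned char>(ch)) != 0;
                              }),
               word.end());
    return word;
}
```

With this helper, stripPunctuation("\"David,") yields David, while a token made only of punctuation, such as "--", becomes an empty string, so a caller may want to skip empty results before storing them.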
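We push every word onto the vector. It does not matter whether the word starts with D; we deal with that later.
Separate the words that start with D from the words that do not.
This is done with the std::partition algorithm function. It places the items that satisfy a condition on the left side of the partition and the items that do not on the right side, and returns an iterator that marks the partition point.
In this case, the criterion is "all words that start with D or d": if a word does, it is placed on the left side of the partition. Note the use of toupper so that both d and D are matched.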
// partition D from non-D words
auto iter = std::partition(dWords.begin(), dWords.end(), [](const std::string& s)
{ return std::toupper(static_cast<unsigned char>(s[0])) == 'D'; });
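To see the partition step on its own, here is a small sketch that works on an in-memory vector instead of a file. The helper name partitionDWords and the explicit empty-string check are assumptions added for illustration, not part of the original answer:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: moves every word starting with 'd' or 'D' to the
// front of `words` and returns an iterator to the first word that does not.
std::vector<std::string>::iterator partitionDWords(std::vector<std::string>& words)
{
    auto startsWithD = [](const std::string& s) {
        // The empty check is an added safeguard for words that became
        // empty after punctuation was stripped.
        return !s.empty() &&
               std::toupper(static_cast<unsigned char>(s[0])) == 'D';
    };
    return std::partition(words.begin(), words.end(), startsWithD);
}
```

For a vector holding David, a, dull, boy, Dog, the returned iterator sits three elements from the beginning. std::partition does not promise to keep the words in their original relative order; std::stable_partition does, if that matters.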
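Get the number of items in the left and right partitions.
Since every item on the left side of the partition starts with D, the count is simply the distance from the beginning of the vector to the partition point iter.
Similarly, to count the words that do not start with D, we count the words from the partition point iter to the end of the vector.
To get the number of items, we can use the std::distance algorithm function: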
// output results
std::cout << "The number of words starting with D: " << std::distance(dWords.begin(), iter) << "\n";
std::cout << "Here are the words:\n";
std::copy(dWords.begin(), iter, std::ostream_iterator<std::string>(std::cout, " "));
std::cout << "\n\nThe number of words not starting with D: " << std::distance(iter, dWords.end()) << "\n";
std::cout << "Here are the words:\n";
std::copy(iter, dWords.end(), std::ostream_iterator<std::string>(std::cout, " "));
std::copy is just a fancy way of outputting the contents of the vector without writing a loop, so do not let it distract you.
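If the std::copy call looks unfamiliar, the sketch below shows it next to an explicit loop that produces the same text. Both function names are hypothetical, and a std::ostringstream stands in for std::cout only so that the two results can be compared:

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Writes each word followed by a space using std::copy and an
// ostream_iterator, in the same way as the answer's output code.
std::string printWithCopy(const std::vector<std::string>& words)
{
    std::ostringstream out;
    std::copy(words.begin(), words.end(),
              std::ostream_iterator<std::string>(out, " "));
    return out.str();
}

// The same output written as an explicit range-based for loop.
std::string printWithLoop(const std::vector<std::string>& words)
{
    std::ostringstream out;
    for (const std::string& w : words)
        out << w << ' ';
    return out.str();
}
```

Both functions return the same string for any input, including the trailing space after the last word.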
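Here is a live example. The only difference is that it uses cin instead of a file.
If you really want to split the vector into two separate vectors, one with the D words and one without, it is as simple as constructing the vectors from the partitioned vector: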
std::vector<std::string> onlyDwords(dWords.begin(), iter);
std::vector<std::string> nonDWords(iter, dWords.end());
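Avoiding std::vector entirely and using std::map gives a concise way to map each string that begins with a given character to the number of times it occurs in a block of text.
std::map<std::string, size_t> provides a way to map unique strings to the number of times they occur. The std::string is used as the unique key and the size_t count is used as the value. Since the strings in a map are unique, all you need to do is read each word, check whether it begins with the character to find, and then: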
mymap[word]++;
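After all the words have been read, mymap holds the frequency of each word that was added to the map. Using wordfreq as the map name and reading from a file, you could do the following: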
#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <cctype>
#include <map>
int main (int argc, char **argv) {
/* filename as 1st argument or use "default.txt" by default */
const char *fname = argc > 1 ? argv[1] : "default.txt"; /* filename */
const char c2find = argc > 2 ? tolower(*argv[2]) : 'd'; /* 1st char to find */
std::map<std::string, size_t> wordfreq{};
std::string word; /* string to hold each word */
std::ifstream f (fname); /* open ifstream using fname */
if (!f.is_open()) { /* validate file open for reading */
std::cerr << "error: file open failed '" << fname << "'.\n"
<< "usage: " << argv[0] << " [filename (default.txt)]\n";
return 1;
}
while (f >> word) { /* read each whitespace separate word */
if (tolower(word[0]) == c2find) { /* if word begins with char to find */
wordfreq[word]++; /* increment frequency of word in map */
}
}
for (const auto& w : wordfreq)
std::cout << std::left << std::setw(16) << w.first <<
std::right << w.second << '\n';
}
Example Input File
$ cat default.txt
All work and no play makes David a dull boy!
All work and no play makes David a dull boy!
All work and no play makes David a dull boy!
All work and no play makes David a dull boy!
All work and no play makes David a dull boy!
Example Use/Output
$ ./bin/map_word_freq
David 5
dull 5
Or for 'a':
$ ./bin/map_word_freq default.txt a
All 5
a 5
and 5
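(Note: if you want to supply a different character, which is the second argument to the program, you must supply the filename to read before it.)
Look things over and let me know if you have further questions.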
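In your code, you use std::unique to reduce adjacent duplicate words in the vector to a single one. In the body of the question you say you want to count each word, so in my version of the code below I also keep a copy of the original vectors and print a count summary at the end.
As pointed out in the comments section, I also corrected words[1]=='d'||words[0]=='D' so that both comparisons use index 1, and adjusted other aspects of the original code (std::vector::erase needs the end iterator as a second argument in order to erase the whole range left over by std::unique, and the output loops now use < rather than <= so they do not read past the end of the vector):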
#include<iostream>
#include<fstream> //needed for file opening and closing/manipulation within files
#include<vector> //needed for vectors to store the words from the file
#include<algorithm> //needed for sort algorithm later
#include<string> //needed for std::string
using namespace std;
int main(){
string inputName, num, words;
cout<<"Enter a valid filename: "; //Prompting user for a file name in the directory of this program exe
cin>>inputName;
ifstream file(inputName); //Creating a ifstream File which will open the file to the program
vector<string> dWords; //Creating 2 vectors, 1 for anything that starts with 'd'/'D' and 2 for anything else
vector<string> otherWords;
while(file>>words){ //Reading the file one whitespace-separated word at a time; a getline before this loop would discard the first line
if(words[0]=='d'||words[0]=='D'){ //if statement that checks if the first letter in each word starts with a 'd' or 'D'
dWords.push_back(words); //if true then the word gets added to the vector with the push_back
}
else if(words[0]=='"'){ //Checking for a niche case of when a word starts with a "
if(words[1]=='d'||words[1]=='D'){//If true then the same if statement will happen to check for 'd' or 'D' --- corrected second condition, from words[0]=='D'
dWords.push_back(words);
}
}
else{ //This case is for everything not mentioned already
otherWords.push_back(words); //This is added to a different vector than the dWords
}
}
// I have added 2 copies of the vectors of strings, in case you intend to count each single word, without reducing adjacent duplicates to 1 with std::unique
vector<string> original_dWords(dWords);
vector<string> original_otherWords(otherWords);
dWords.erase(unique(dWords.begin(), dWords.end()), dWords.end());
otherWords.erase(unique(otherWords.begin(), otherWords.end()), otherWords.end());
sort(dWords.begin(), dWords.end()); //Using the C++ native sorting method that works with vectors to sort alphabetically
sort(otherWords.begin(), otherWords.end());
cout<<"All words starting with D or d in the file: "<<endl; //printing out the words that start with 'd' or 'D' alphabetically
for(unsigned a=0; a<dWords.size(); a++){
cout<<dWords[a]<<endl;
}
cout<<endl;
cout<<"All words not starting with D or d in the file: "<<endl; //printing out every other word/character left
for(unsigned b=0; b<otherWords.size(); b++){
cout<<otherWords[b]<<endl;
}
// added a words count summary
cout << "Number of words beginning with d,D is: " << original_dWords.size() << endl;
cout << "If we leave just one out of consecutive, identical words, that number falls to: " << dWords.size() << endl;
cout << "Number of words not beginning with d,D is: " << original_otherWords.size() << endl;
cout << "If we leave just one out of consecutive, identical words, that number falls to: " << otherWords.size() << endl;
file.close(); //closing file after everything is done in program
}
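Because std::unique only collapses runs of adjacent equal elements, its result depends on whether the vector was sorted first. The sketch below contrasts the two orders; both helper names are hypothetical and are not part of the code above:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: sorts the words first, so equal words become
// adjacent and std::unique removes every duplicate.
std::vector<std::string> sortedUnique(std::vector<std::string> words)
{
    std::sort(words.begin(), words.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());
    return words;
}

// For comparison: std::unique on unsorted input only collapses runs of
// consecutive equal words, which is the order of operations used above.
std::vector<std::string> adjacentUnique(std::vector<std::string> words)
{
    words.erase(std::unique(words.begin(), words.end()), words.end());
    return words;
}
```

For the input David, Dog, David, David, Dog, sortedUnique returns David, Dog, whereas adjacentUnique returns David, Dog, David, Dog. Which one is wanted depends on whether the goal is a list of distinct words or only the removal of consecutive repeats.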