C++ removing punctuation from a map
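I wrote a program that reads a text document, puts its words into a vector, and uses that vector to fill a map that tracks each word in the text file and its frequency. All of that works fine, but I need help removing punctuation from the map. Here is what I have now: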
#include <iostream>
#include <string>
#include <map>
#include <algorithm>
#include <vector>
#include <fstream>
using namespace std;

void print_frequency(vector<string>&);

int main()
{
    ifstream infile;
    string word;
    vector<string> words;
    infile.open("words.txt");
    if (infile.fail()) {
        cerr << "Can't open file\n";
        exit(1);
    }
    while (infile >> word) {
        words.push_back(word);
    }
    print_frequency(words);
}

void print_frequency(vector<string>& words)
{
    map<string, int> M;
    for (int i = 0; i < words.size(); i++) {
        if (M.find(words[i]) == M.end())
            M[words[i]] = 1;
        else
            M[words[i]]++;
    }
    sort(words.begin(), words.end());
    for (auto it = M.begin(); it != M.end(); it++) {
        if (ispunct(M[it->first]))
        {
            M[it->first].erase(it--, 1);
            int len = M.size();
        }
        cout << it->first << " " << it->second << endl;
    }
}
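As others have said, the best idea seems to be to remove these characters right after reading each word. For that, I would go for the std::regex library (which needs #include <regex>):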
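    while (infile >> word) {
        std::string new_word = std::regex_replace(word, std::regex(R"([^A-Za-z\d])"), "");
        words.push_back(new_word);
    }

In the raw string literal R"([^A-Za-z\d])", the pattern [^A-Za-z\d] matches any character that is not (^) a letter (A-Za-z) and not a digit (\d). Replacing every match with the empty string leaves only letters and digits. Of course, you can modify the pattern to fit your exact needs, and I encourage you to get familiar with regular expression syntax.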
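To make the regex approach easier to try in isolation, here is a minimal sketch that wraps it in a function. The name strip_non_alnum is hypothetical and not part of the original answer, and the regex is built once rather than on every loop iteration, since constructing a std::regex is comparatively expensive.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original answer): returns a copy of
// `word` with every character that is not an ASCII letter or digit removed.
std::string strip_non_alnum(const std::string& word)
{
    // Built once and reused across calls instead of once per word.
    static const std::regex non_alnum(R"([^A-Za-z\d])");
    return std::regex_replace(word, non_alnum, "");
}
```

Inside the reading loop this could be used as words.push_back(strip_non_alnum(word)). Note that a token made up only of punctuation comes back as an empty string, so you may want to skip empty results before storing them.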
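Yes, the punctuation should probably be removed as early as possible, otherwise things get needlessly complicated later. I was able to get it working: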
#include <cctype>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;

void print_frequency(const vector<string>&);

int main()
{
    ifstream infile;
    string word;
    vector<string> words;
    infile.open("words.txt");
    if (infile.fail()) {
        cerr << "Can't open file\n";
        exit(1);
    }
    while (infile >> word) {
        // Strip punctuation from each word as soon as it is read.
        for (size_t i = 0; i < word.size(); ) {
            // Cast to unsigned char: ispunct is undefined for negative char values.
            if (ispunct(static_cast<unsigned char>(word[i])))
                word.erase(i, 1);   // the next character shifts into position i
            else
                ++i;
        }
        // Skip tokens that consisted only of punctuation, such as "--".
        if (!word.empty())
            words.push_back(word);
    }
    print_frequency(words);
}

void print_frequency(const vector<string>& words)
{
    map<string, int> M;
    for (size_t i = 0; i < words.size(); i++) {
        if (M.find(words[i]) == M.end())
            M[words[i]] = 1;
        else
            M[words[i]]++;
    }
    // std::map iterates in sorted key order, so no separate sort is needed.
    for (auto it = M.begin(); it != M.end(); it++) {
        cout << it->first << " " << it->second << endl;
    }
}