C++ Find Word in String without Regex
I'm trying to find a certain word in a string, but only where that word stands alone. For example, if I had a word bank:
789540132143
93
3
5434
I only want a match to be found for the value 3, as the other values do not match exactly. I used the normal string::find function, but that found matches for all four values in the word bank because they all contain 3.
There is no whitespace surrounding the values, and I am not allowed to use regex. I'm looking for the fastest way to accomplish this task.
If you want to count the words, you should use a string-to-int map. Read a word from your file into a string using >>, then increment the map accordingly:
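// needs <fstream>, <map>, <string> and using namespace std
string word;
map<string, int> count;
ifstream input("file.txt");
// Test the extraction itself, so a failed read at end of file is not counted.
while (input >> word) {
    ++count[word];
}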
Using >> has the benefit that you don't have to worry about whitespace.
It all depends on the definition of a word: is it a string separated from others by whitespace? Or are other word separators (e.g. comma, dot, semicolon, colon, parentheses...) relevant as well?
Here is an acceptable approach using find() and its variant find_first_of():
string myline;                       // line to be parsed
string what = "3";                   // string to be found
string separator = " \t\n,;.:()[]";  // word separators
while (getline(cin, myline)) {
    size_t nxt = 0;
    while ((nxt = myline.find(what, nxt)) != string::npos) {  // search occurrences of what
        if (nxt == 0 || separator.find(myline[nxt-1]) != string::npos) {  // if at the beginning of a word
            size_t nsep = myline.find_first_of(separator, nxt+1);  // check that it runs to the end of the word
            if ((nsep == string::npos && myline.length() - nxt == what.length()) || nsep - nxt == what.length()) {
                cout << "Line: " << myline << endl;  // bingo!
                cout << "from pos " << nxt << " to " << nsep << endl;
            }
        }
        nxt++;  // ready for next occurrence
    }
}
And here is the online demo.
The principle is to check whether each occurrence found corresponds to a whole word, i.e. that it starts at the beginning of the line or of a word (the previous character is a separator) and that it runs until the next separator (or the end of the line).
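The same principle can be packaged as a small reusable function. The following is only a sketch, not the code from the answer: the name find_whole_word and its separators parameter are made up for illustration, and it checks the character right after the match instead of calling find_first_of(), so it assumes the searched word itself contains no separator.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the start index of every occurrence of `word` in `line` that is
// bounded on both sides by a separator or by the start/end of the line.
// `find_whole_word` and `separators` are illustrative names (assumptions).
std::vector<std::size_t> find_whole_word(const std::string& line,
                                         const std::string& word,
                                         const std::string& separators = " \t\n,;.:()[]") {
    std::vector<std::size_t> hits;
    if (word.empty()) return hits;  // nothing sensible to search for

    std::size_t pos = 0;
    while ((pos = line.find(word, pos)) != std::string::npos) {
        const std::size_t end = pos + word.size();
        // Left boundary: start of line, or previous character is a separator.
        const bool left_ok = pos == 0 ||
            separators.find(line[pos - 1]) != std::string::npos;
        // Right boundary: end of line, or next character is a separator.
        const bool right_ok = end == line.size() ||
            separators.find(line[end]) != std::string::npos;
        if (left_ok && right_ok) hits.push_back(pos);
        ++pos;  // move past this candidate and keep searching
    }
    return hits;
}
```

Called on a line such as "789540132143 93 3 5434" with the word "3", only the standalone 3 is reported; the 3s embedded in the longer numbers fail one of the two boundary checks.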
You can have the fastest word search function, but if you use it to solve your problem of counting words, as you explained in your comment, you will waste a lot of effort!
The best way to achieve this would certainly be to use a map<string, int> to store and update a counter for each string encountered in the file.
You then just have to parse each line into words (you could use find_first_of() as suggested above) and use the map:
mymap[word]++;
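As a rough sketch of how the parsing and the map could fit together, the helper below splits one line on the same separator set used earlier and updates the counters. The name count_words and its signature are assumptions made for this example, not part of the answer.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Splits `line` on any character in `separators` and increments the counter
// of each word found. `count_words` is an illustrative name (assumption).
void count_words(const std::string& line,
                 std::map<std::string, int>& counts,
                 const std::string& separators = " \t\n,;.:()[]") {
    // Skip leading separators to find the start of the first word.
    std::size_t start = line.find_first_not_of(separators);
    while (start != std::string::npos) {
        // The word ends at the next separator (or at the end of the line).
        const std::size_t end = line.find_first_of(separators, start);
        // substr clamps the length, so end == npos takes the rest of the line.
        ++counts[line.substr(start, end - start)];
        // Jump to the start of the next word, if there is one.
        start = line.find_first_not_of(separators, end);
    }
}
```

Calling it once per line read with getline() leaves one entry per distinct word in the map, so counting every word takes a single pass over the file instead of one search per word.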