C++: Finding a word in a string without using regex

Question

I'm trying to find a certain word in a string, but only match that word on its own. For example, if I had a word bank:

789540132143
93
3
5434

I only want a match to be found for the value 3, as the other values do not match exactly. I used the normal string::find function, but it found matches for all four values in the word bank because they all contain 3.

There is no whitespace surrounding the values, and I am not allowed to use regex. I'm looking for the fastest way to accomplish this task.

Answer 1

If you want to count the words, you should use a map from string to int. Read each word from your file into a string using >>, then increment the corresponding entry in the map:

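#include <fstream>
#include <map>
#include <string>
using namespace std;

string word;
map<string, int> count;
ifstream input("file.txt");
// Test the extraction itself: checking good() before reading can count the last word twice.
while (input >> word) {
    count[word]++;
}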

Using >> has the benefit that you don't have to worry about whitespace: it skips leading whitespace and stops at the next whitespace character, so each extraction yields one whole word.
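A minimal sketch of this approach applied to the word bank from the question, assuming the values are separated by whitespace (one per line, as shown). It reads from any std::istream, so an std::istringstream can stand in for file.txt; the function name count_words_in is hypothetical. Because >> extracts whole tokens, "93" and "3" end up as separate keys, so looking up "3" only counts the exact value.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: counts every whitespace-delimited token read from `in`.
std::map<std::string, int> count_words_in(std::istream& in) {
    std::map<std::string, int> counts;
    std::string word;
    // operator>> skips leading whitespace and stops at the next whitespace,
    // so each extracted token is a complete word. Looping on the extraction
    // (not on good()) means a failed read at end of input is never counted.
    while (in >> word) {
        ++counts[word];
    }
    return counts;
}
```

For example, with std::istringstream bank("789540132143\n93\n3\n5434\n"), the call count_words_in(bank) yields four distinct keys, and counts.at("3") is 1 because only the exact value "3" matches that key.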

Answer 2

It all depends on the definition of a word: is it a string separated from the others by whitespace? Or are other word separators (e.g. comma, dot, semicolon, colon, parentheses...) relevant as well?

How to parse for words without regex:

Here is an acceptable approach using find() and its variant find_first_of():

#include <iostream>
#include <string>
using namespace std;

int main() {
    string myline;                        // line to be parsed
    string what = "3";                    // word to be found
    string separator = " \t\n,;.:()[]";   // word separators
    while (getline(cin, myline)) {
        size_t nxt = 0;
        while ((nxt = myline.find(what, nxt)) != string::npos) {  // search occurrences of what
            if (nxt == 0 || separator.find(myline[nxt - 1]) != string::npos) {  // if at beginning of a word
                size_t nsep = myline.find_first_of(separator, nxt + 1);  // find the end of the word
                size_t wend = (nsep == string::npos) ? myline.length() : nsep;  // no separator: word ends at end of line
                if (wend - nxt == what.length()) {  // the word is exactly what
                    cout << "Line: " << myline << endl;  // bingo !!
                    cout << "from pos " << nxt << " to " << wend << endl;
                }
            }
            nxt++;  // ready for next occurrence
        }
    }
}

And here is the online demo.

The principle is to check whether each occurrence found corresponds to a word, i.e. that it is at the beginning of the string or at the beginning of a word (the previous character is a separator), and that it extends exactly up to the next separator (or the end of the line).
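The same principle can be packaged as a reusable function. This is a minimal sketch, not the answer's own code: the name find_whole_word and its default separator set are hypothetical, and it checks the single character just after the occurrence instead of calling find_first_of(), which gives the same result as long as `what` is non-empty and contains no separator characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the position of the first occurrence of `what`
// in `text` that forms a whole word, or std::string::npos if there is none.
// A word boundary is the start/end of `text` or any character in `separators`.
// Assumes `what` is non-empty and contains no separator characters.
std::size_t find_whole_word(const std::string& text, const std::string& what,
                            const std::string& separators = " \t\n,;.:()[]") {
    std::size_t pos = text.find(what);
    while (pos != std::string::npos) {
        std::size_t end = pos + what.size();  // one past the occurrence
        bool starts_word = pos == 0 || separators.find(text[pos - 1]) != std::string::npos;
        bool ends_word = end == text.size() || separators.find(text[end]) != std::string::npos;
        if (starts_word && ends_word) {
            return pos;  // whole-word match
        }
        pos = text.find(what, pos + 1);  // try the next occurrence
    }
    return std::string::npos;
}
```

Applied to the word bank, only the line "3" yields a match (at position 0); "789540132143", "93" and "5434" all return std::string::npos because every "3" they contain is adjacent to another digit.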

How to solve your real problem:

You can have the fastest word search function, but if you use it to solve your word-counting problem, as you explained in your comment, you'll waste a lot of effort!

The best way to achieve this would certainly be to use a map<string, int> to store and update a counter for each string encountered in the file.

You then just have to parse each line into words (you could use find_first_of() as suggested above) and update the map:

 mymap[word]++;
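Putting the two steps together, here is a minimal sketch that splits a line at any separator character using find_first_not_of() and find_first_of(), then increments the map for each word. The function name count_words and the default separator set are hypothetical choices for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: splits `line` into words at any character in
// `separators` and increments a counter in `counts` for each word found.
void count_words(const std::string& line, std::map<std::string, int>& counts,
                 const std::string& separators = " \t\n,;.:()[]") {
    std::size_t start = line.find_first_not_of(separators);  // first word character
    while (start != std::string::npos) {
        std::size_t end = line.find_first_of(separators, start);  // end of this word
        if (end == std::string::npos) {
            end = line.size();  // last word runs to the end of the line
        }
        ++counts[line.substr(start, end - start)];  // same as mymap[word]++
        start = line.find_first_not_of(separators, end);  // skip the separators
    }
}
```

Calling count_words on each line read with getline() builds the complete tally; afterwards, the count for "3" counts only standalone occurrences, never the "3" inside "93" or "5434".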

Answer 1: score 1, 2015-11-08 22:30:27

Answer 2: score 0, 2015-11-08 22:16:48