
C++ Find Word in String without Regex

I'm trying to find a certain word in a string, but only when it appears as a whole word. For example, if I had a word bank:

789540132143
93
3
5434

I only want a match to be found for the value 3, as the other values do not match exactly. I used the normal string::find function, but that found matches for all four values in the word bank because they all contain 3.

There is no whitespace surrounding the values, and I am not allowed to use Regex. I'm looking for the fastest way to accomplish this task.
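To make the difference concrete, here is a minimal sketch, assuming each word bank entry is stored as its own string. The helper names exactMatches and substringMatches are hypothetical and only serve the illustration: a substring search with string::find reports every entry that contains the value, while a whole-string comparison with == reports only the exact one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: entries of `bank` that are exactly equal to `target`.
std::vector<std::string> exactMatches(const std::vector<std::string>& bank,
                                      const std::string& target) {
    std::vector<std::string> hits;
    for (const std::string& entry : bank) {
        if (entry == target) {  // whole-string comparison, not a substring search
            hits.push_back(entry);
        }
    }
    return hits;
}

// Hypothetical helper: number of entries that merely contain `target`,
// which is what a plain string::find check reports.
std::size_t substringMatches(const std::vector<std::string>& bank,
                             const std::string& target) {
    std::size_t n = 0;
    for (const std::string& entry : bank) {
        if (entry.find(target) != std::string::npos) {
            ++n;
        }
    }
    return n;
}
```

With the word bank above and the target "3", substringMatches counts all four entries, whereas exactMatches returns only the entry "3".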

If you want to count the words, you should use a string-to-int map. Read a word from your file using >> into a string, then increment the map accordingly:

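#include <fstream>
#include <map>
#include <string>
using namespace std;

int main() {
    string word;
    map<string,int> count;
    ifstream input("file.txt");
    // Loop on the extraction itself; testing input.good() can count the last word twice
    while (input >> word) {
        count[word]++;
    }
}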

Using >> has the benefit that you don't have to worry about whitespace.
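The same idea can be tried without a file. The following is a sketch under the assumption that the input is already in memory as one string; the function name countWords is hypothetical. It shows that >> skips any run of spaces, tabs, or newlines between words.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: counts whitespace-separated words in `text`.
std::map<std::string, int> countWords(const std::string& text) {
    std::map<std::string, int> count;
    std::istringstream input(text);
    std::string word;
    while (input >> word) {  // stops as soon as an extraction fails
        ++count[word];
    }
    return count;
}
```

For instance, countWords("3 93 3\n5434\t3") yields a count of 3 for "3" and 1 each for "93" and "5434". Note that this splits on whitespace only, so "3," and "3" would be counted as different words.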

It all depends on the definition of a word: is it a string separated from others by whitespace? Or are other word separators (e.g. comma, dot, semicolon, colon, parentheses...) relevant as well?

How to parse for words without regex:

Here is an acceptable approach using find() and its variant find_first_of():

#include <iostream>
#include <string>
using namespace std;

int main() {
    string myline;     // line to be parsed
    string what="3";   // string to be found
    string separator=" \t\n,;.:()[]";  // word separators
    while (getline(cin, myline)) {
        size_t nxt=0;
        while ( (nxt=myline.find(what, nxt)) != string::npos) {  // search occurrences of what
            if (nxt==0||separator.find(myline[nxt-1])!=string::npos) { // if at the beginning of a word
                size_t nsep=myline.find_first_of(separator,nxt+1);   // look for the end of the word
                if (nsep==string::npos) nsep=myline.length();  // no separator: the word runs to the end of the line
                if (nsep-nxt==what.length()) {   // the occurrence spans the whole word
                    cout << "Line: "<<myline<<endl;    // bingo !!
                    cout << "from pos "<<nxt<<" to " << nsep << endl;
                }
            }
            nxt++;  // ready for next occurrence
        }
    }
}

And here is the online demo.

The principle is to check whether each occurrence found corresponds to a word, i.e. that it starts at the beginning of the line or of a word (the previous char is a separator) and that it runs until the next separator (or the end of the line).
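The same principle can be packaged as a reusable function. This is only a sketch: the name findWholeWord and the default separator set are assumptions, and unlike the loop above it tests the character right after the occurrence instead of calling find_first_of(), which gives the same result as long as the searched word contains no separator.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: start positions where `what` occurs in `line` as a whole word,
// i.e. bounded on both sides by a separator or by the start/end of the line.
std::vector<std::size_t> findWholeWord(const std::string& line,
                                       const std::string& what,
                                       const std::string& separators = " \t\n,;.:()[]") {
    std::vector<std::size_t> positions;
    if (what.empty()) {
        return positions;  // nothing sensible to search for
    }
    std::size_t pos = 0;
    while ((pos = line.find(what, pos)) != std::string::npos) {
        const std::size_t end = pos + what.length();  // one past the occurrence
        const bool startsWord =
            pos == 0 || separators.find(line[pos - 1]) != std::string::npos;
        const bool endsWord =
            end == line.length() || separators.find(line[end]) != std::string::npos;
        if (startsWord && endsWord) {
            positions.push_back(pos);
        }
        ++pos;  // continue after the current occurrence's first character
    }
    return positions;
}
```

Called on the line "789540132143 93 3 5434" with the word "3", this reports a single position, 16, which is the standalone 3; the digits 3 inside the other values are rejected because a neighbouring character is not a separator.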

How to solve your real problem:

You can have the fastest word search function: if you use it to solve your problem of counting words, as you've explained in your comment, you'll waste a lot of effort!

The best way to achieve this would certainly be to use a map<string, int> to store and update a counter for each string encountered in the file.

You then just have to parse each line into words (you could use find_first_of() as suggested above) and use the map:

 mymap[word]++; 
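Put together, the parsing and counting could look like the sketch below. The function name countLineWords and the default separator set are assumptions made for the illustration; it pairs find_first_of() with find_first_not_of() to skip runs of separators, and updates the map once per word.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: splits `line` on any character in `separators`
// and increments the counter of each word found in `mymap`.
void countLineWords(const std::string& line,
                    std::map<std::string, int>& mymap,
                    const std::string& separators = " \t\n,;.:()[]") {
    std::size_t start = line.find_first_not_of(separators);  // first char of first word
    while (start != std::string::npos) {
        const std::size_t end = line.find_first_of(separators, start);  // separator after the word
        if (end == std::string::npos) {
            mymap[line.substr(start)]++;  // last word runs to the end of the line
            break;
        }
        mymap[line.substr(start, end - start)]++;
        start = line.find_first_not_of(separators, end);  // skip the separators
    }
}
```

Calling it once per line read with getline() fills the map in a single pass over the file, so no separate search per word is needed.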
