
How to search a string for multiple substrings

I need to check a short string for matches against a list of substrings. Currently, I do it as shown below (working code on ideone):

#include <iostream>
#include <string>

bool ContainsMyWords(const std::wstring& input)
{
    if (std::wstring::npos != input.find(L"white"))
        return true;
    if (std::wstring::npos != input.find(L"black"))
        return true;
    if (std::wstring::npos != input.find(L"green"))
        return true;
    // ...
    return false;
}


int main() {
  std::wstring input1 = L"any text goes here";
  std::wstring input2 = L"any text goes here black";

  std::cout << "input1 " << ContainsMyWords(input1) << std::endl;
  std::cout << "input2 " << ContainsMyWords(input2) << std::endl;
  return 0;
}

I have 10-20 substrings that I need to match against an input. My goal is to optimize the code for CPU utilization and to reduce the average-case time complexity. I receive input strings at a rate of 10 Hz, with bursts of up to 10 kHz (which is what I am worried about).

There is the agrep library, with source code written in C; I wonder if there is a standard equivalent in C++. From a quick look, it may be a bit difficult (but doable) to integrate it with what I have.

Is there a better way to match an input string against a set of predefined substrings in C++?

The best approach is a regular expression search, using the following regular expression:

"(white)|(black)|(green)"

That way, with only one pass over the string, you get a match in group 1 if the "white" substring was found (with its start and end positions), in group 2 if the "black" substring was found, and in group 3 if the "green" substring was found. Since group 0 gives you the position of the end of the match, you can begin a new search from there to look for more matches, and everything still happens in one pass over the string.
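As a rough illustration of the group-based approach described above, here is a minimal sketch using `std::wregex` from the standard `<regex>` header. The helper names `KeywordPattern`, `ContainsMyWordsRegex`, and `MatchedGroups` are made up for this example and do not come from the original answer. Whether this is actually faster than repeated `find` calls depends on the standard library's regex implementation, so it is worth measuring before relying on it.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: builds the pattern once and reuses it on every call.
const std::wregex& KeywordPattern()
{
    static const std::wregex pattern(L"(white)|(black)|(green)");
    return pattern;
}

// Returns true if any of the alternatives occurs anywhere in the input.
bool ContainsMyWordsRegex(const std::wstring& input)
{
    return std::regex_search(input, KeywordPattern());
}

// For every match, in order of appearance, records the 1-based number of the
// capture group that matched (1 = white, 2 = black, 3 = green).
std::vector<int> MatchedGroups(const std::wstring& input)
{
    std::vector<int> groups;
    for (std::wsregex_iterator it(input.begin(), input.end(), KeywordPattern()), end;
         it != end; ++it) {
        const std::wsmatch& match = *it;
        for (std::size_t g = 1; g < match.size(); ++g) {
            if (match[g].matched) {
                groups.push_back(static_cast<int>(g));
                break;
            }
        }
    }
    return groups;
}
```

`std::wsregex_iterator` resumes each search where the previous match ended, which corresponds to the "begin a new search from the end of group 0" step.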

You could use one big if instead of several if statements. However, NathanOliver's solution with std::any_of is faster than that, provided the array of substrings is made static (so that it is not recreated on every call), as shown below.

#include <algorithm>
#include <iterator>
#include <string>

bool ContainsMyWordsNathan(const std::wstring& input)
{
    // Do not forget to make the array static, so that it is built only once!
    static const std::wstring keywords[] = {L"white", L"black", L"green", /* ... */};
    return std::any_of(std::begin(keywords), std::end(keywords),
      [&](const std::wstring& str){ return input.find(str) != std::wstring::npos; });
}
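For comparison, the "one big if" variant mentioned above might look like the sketch below. The function name `ContainsMyWordsSingleIf` is hypothetical, and only the three keywords visible in the question are used. The `||` operator short-circuits, so later searches are skipped as soon as one keyword is found.

```cpp
#include <string>

// Hypothetical name; same behavior as the chain of separate if statements.
bool ContainsMyWordsSingleIf(const std::wstring& input)
{
    return input.find(L"white") != std::wstring::npos
        || input.find(L"black") != std::wstring::npos
        || input.find(L"green") != std::wstring::npos;
}
```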

PS: As discussed in "Algorithm to find multiple string matches":

The "grep" family implements multi-string search in a very efficient way. If you can use these tools as external programs, do it.
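If calling an external program is not an option, a multi-pattern algorithm can be implemented in-process. The sketch below uses Aho-Corasick, one classic algorithm for this problem; it is offered as an illustration and is not something the answers above propose. The class name `KeywordMatcher` and its interface are assumptions made for this example. It scans the input once, regardless of how many keywords there are.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): a minimal Aho-Corasick
// automaton that reports whether any keyword occurs in a text.
class KeywordMatcher {
public:
    explicit KeywordMatcher(const std::vector<std::wstring>& keywords)
    {
        nodes_.emplace_back();  // node 0 is the root of the trie
        for (const std::wstring& word : keywords) {
            if (word.empty()) continue;  // empty keywords are ignored
            std::size_t cur = 0;
            for (wchar_t ch : word) {
                auto it = nodes_[cur].next.find(ch);
                if (it != nodes_[cur].next.end()) {
                    cur = it->second;
                } else {
                    const std::size_t created = nodes_.size();
                    nodes_[cur].next.emplace(ch, created);
                    nodes_.emplace_back();
                    cur = created;
                }
            }
            nodes_[cur].terminal = true;
        }
        // Breadth-first pass that computes the failure links.
        std::queue<std::size_t> pending;
        for (const auto& edge : nodes_[0].next) pending.push(edge.second);
        while (!pending.empty()) {
            const std::size_t cur = pending.front();
            pending.pop();
            for (const auto& edge : nodes_[cur].next) {
                const wchar_t ch = edge.first;
                const std::size_t child = edge.second;
                std::size_t f = nodes_[cur].fail;
                while (f != 0 && nodes_[f].next.count(ch) == 0) f = nodes_[f].fail;
                auto it = nodes_[f].next.find(ch);
                nodes_[child].fail = (it != nodes_[f].next.end()) ? it->second : 0;
                // A node also counts as a hit if a keyword ends at its failure target.
                if (nodes_[nodes_[child].fail].terminal) nodes_[child].terminal = true;
                pending.push(child);
            }
        }
    }

    // Single pass over the text; returns true at the first keyword found.
    bool MatchesAny(const std::wstring& text) const
    {
        std::size_t state = 0;
        for (wchar_t ch : text) {
            while (state != 0 && nodes_[state].next.count(ch) == 0) state = nodes_[state].fail;
            auto it = nodes_[state].next.find(ch);
            if (it != nodes_[state].next.end()) state = it->second;
            if (nodes_[state].terminal) return true;
        }
        return false;
    }

private:
    struct Node {
        std::map<wchar_t, std::size_t> next;  // outgoing trie edges
        std::size_t fail = 0;                 // failure link (root by default)
        bool terminal = false;                // true if some keyword ends here
    };
    std::vector<Node> nodes_;
};
```

The matcher would typically be built once (for example as a function-local static) and then reused for every incoming string, so the construction cost is not paid during bursts. With only 10-20 short keywords, the simpler approaches above may well be fast enough, so profiling first is advisable.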
