How to search a string for multiple substrings
I need to check a short string for matches with a list of substrings. Currently, I do this as shown below (working code on ideone):
#include <iostream>
#include <string>

bool ContainsMyWords(const std::wstring& input)
{
    if (std::wstring::npos != input.find(L"white"))
        return true;
    if (std::wstring::npos != input.find(L"black"))
        return true;
    if (std::wstring::npos != input.find(L"green"))
        return true;
    // ...
    return false;
}

int main() {
    std::wstring input1 = L"any text goes here";
    std::wstring input2 = L"any text goes here black";
    std::cout << "input1 " << ContainsMyWords(input1) << std::endl;  // prints "input1 0"
    std::cout << "input2 " << ContainsMyWords(input2) << std::endl;  // prints "input2 1"
    return 0;
}
I have 10-20 substrings that I need to match against an input. My goal is to optimize the code for CPU utilization and to reduce the average-case time complexity. I receive input strings at a rate of 10 Hz, with bursts of up to 10 kHz (which is what I am worried about).
There is the agrep library, with source code written in C; I wonder whether there is a standard equivalent in C++. From a quick look, integrating it with what I have may be a bit difficult (but doable).
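The C++ standard library has no multi-pattern matcher comparable to agrep, but since C++17 `<functional>` provides single-pattern searchers that can be prepared once and reused with `std::search`. The following is a minimal sketch under that assumption (a C++17 compiler; the function name `ContainsMyWordsBMH` is hypothetical). It only swaps the search algorithm inside the per-keyword loop; on inputs as short as the examples in the question, the per-keyword preprocessing may not pay off, so it is worth measuring.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// One precomputed Boyer-Moore-Horspool searcher per keyword (C++17).
using Searcher = std::boyer_moore_horspool_searcher<std::wstring::const_iterator>;

// Hypothetical helper: same yes/no answer as ContainsMyWords.
bool ContainsMyWordsBMH(const std::wstring& input)
{
    // The keyword strings must outlive the searchers, which hold iterators into them.
    static const std::vector<std::wstring> keywords = {L"white", L"black", L"green"};
    // Built once, on the first call, then reused for every input.
    static const std::vector<Searcher> searchers = [] {
        std::vector<Searcher> s;
        for (const auto& k : keywords)
            s.emplace_back(k.cbegin(), k.cend());
        return s;
    }();
    return std::any_of(searchers.cbegin(), searchers.cend(), [&](const Searcher& s) {
        return std::search(input.cbegin(), input.cend(), s) != input.cend();
    });
}
```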
Is there a better way to match an input string against a set of predefined substrings in C++?
The best approach is to use a regular expression search with the following pattern:
"(white)|(black)|(green)"
That way, with only one pass over the string, group 1 will be set if a match was found for the "white" substring (together with its beginning and end points), group 2 if a match was found for the "black" substring (with its beginning and end points), and group 3 if a match was found for the "green" substring. Since group 0 gives you the end position of the whole match, you can start a new search from there to look for more matches, and everything happens in one pass over the string.
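A minimal sketch of both uses of that pattern with `<regex>`: a yes/no check, and a loop that reads which group matched and restarts the search at the end of group 0. The names `KeywordPattern`, `ContainsMyWordsRegex`, `FindAllKeywords` and `KeywordHit` are hypothetical. Because `std::regex` is often reported to be slow in common standard-library implementations, it is worth timing this against the plain `find()` loop at the 10 kHz burst rate.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One alternative per keyword; compiled once because constructing a
// std::wregex is comparatively expensive.
const std::wregex& KeywordPattern()
{
    static const std::wregex pattern(L"(white)|(black)|(green)");
    return pattern;
}

// Yes/no check: does any keyword occur anywhere in the input?
bool ContainsMyWordsRegex(const std::wstring& input)
{
    return std::regex_search(input, KeywordPattern());
}

struct KeywordHit {
    int group;             // 1 = "white", 2 = "black", 3 = "green"
    std::size_t position;  // offset of the match within the input
};

// Every occurrence: after each match, restart the search at the end of
// group 0, so no text before the previous match is searched again.
std::vector<KeywordHit> FindAllKeywords(const std::wstring& input)
{
    std::vector<KeywordHit> hits;
    std::wsmatch m;
    auto begin = input.cbegin();
    while (std::regex_search(begin, input.cend(), m, KeywordPattern())) {
        for (int g = 1; g <= 3; ++g) {
            if (m[g].matched) {  // exactly one alternative participated
                hits.push_back({g, static_cast<std::size_t>(m[0].first - input.cbegin())});
                break;
            }
        }
        begin = m[0].second;  // continue right after this match
    }
    return hits;
}
```

`std::wsregex_iterator` from the same header performs this restart-at-end-of-match loop for you, if you prefer not to write it by hand.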
You could use one big if instead of several if statements. However, NathanOliver's solution with std::any_of is faster than that, provided the array of substrings is made static (so that the strings are not recreated on every call), as shown below.
#include <algorithm>
#include <iterator>
#include <string>

bool ContainsMyWordsNathan(const std::wstring& input)
{
    // do not forget to make the array static, so it is built only once!
    static const std::wstring keywords[] = {L"white", L"black", L"green" /* , ... */};
    return std::any_of(std::begin(keywords), std::end(keywords),
                       [&](const std::wstring& str) { return input.find(str) != std::wstring::npos; });
}
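For comparison, here is a minimal sketch of the "one big if" variant mentioned above; the function name `ContainsMyWordsOneIf` is hypothetical. It performs exactly the same `find()` calls as the chain of if statements in the question, and `||` stops at the first hit, so it is a stylistic change rather than an algorithmic one.

```cpp
#include <string>

// Same checks as the original if chain, joined with ||; evaluation
// stops at the first keyword that is found.
bool ContainsMyWordsOneIf(const std::wstring& input)
{
    const auto npos = std::wstring::npos;
    return input.find(L"white") != npos
        || input.find(L"black") != npos
        || input.find(L"green") != npos;  // further keywords would go here
}
```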
PS: As discussed in "Algorithm to find multiple string matches":
The "grep" family implements multi-string search very efficiently. If you can use those tools as external programs, do so.
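If running grep as an external program is not an option, the same kind of algorithm can run in-process. Below is a minimal sketch of Aho-Corasick, the classic multi-pattern algorithm that the original Unix fgrep was based on; it is not taken from grep's source, and the class name `KeywordMatcher` and its `ContainsAny` method are hypothetical. It assumes C++17 (structured bindings). The automaton is built once, then each input is scanned character by character regardless of the number of keywords.

```cpp
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

class KeywordMatcher {
    struct Node {
        std::unordered_map<wchar_t, int> next;  // trie edges
        int fail = 0;                           // failure link
        bool terminal = false;                  // a keyword ends here (or at a suffix)
    };
    std::vector<Node> nodes_;

public:
    explicit KeywordMatcher(const std::vector<std::wstring>& keywords) {
        nodes_.emplace_back();  // node 0 is the root
        // 1. Insert every keyword into the trie.
        for (const auto& word : keywords) {
            int cur = 0;
            for (wchar_t c : word) {
                auto it = nodes_[cur].next.find(c);
                if (it == nodes_[cur].next.end()) {
                    nodes_.emplace_back();
                    int child = static_cast<int>(nodes_.size()) - 1;
                    nodes_[cur].next[c] = child;
                    cur = child;
                } else {
                    cur = it->second;
                }
            }
            nodes_[cur].terminal = true;
        }
        // 2. Breadth-first pass computing failure links (root's children fail to root).
        std::queue<int> q;
        for (const auto& [c, child] : nodes_[0].next) q.push(child);
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            for (const auto& [c, v] : nodes_[u].next) {
                int f = nodes_[u].fail;
                while (f != 0 && nodes_[f].next.find(c) == nodes_[f].next.end())
                    f = nodes_[f].fail;
                auto it = nodes_[f].next.find(c);
                nodes_[v].fail = (it != nodes_[f].next.end()) ? it->second : 0;
                // A node also matches if a keyword ends at its failure target.
                nodes_[v].terminal = nodes_[v].terminal || nodes_[nodes_[v].fail].terminal;
                q.push(v);
            }
        }
    }

    // True if any keyword occurs in the input (single left-to-right scan).
    bool ContainsAny(const std::wstring& input) const {
        if (nodes_[0].terminal) return true;  // an empty keyword matches everything
        int cur = 0;
        for (wchar_t c : input) {
            while (cur != 0 && nodes_[cur].next.find(c) == nodes_[cur].next.end())
                cur = nodes_[cur].fail;
            auto it = nodes_[cur].next.find(c);
            cur = (it != nodes_[cur].next.end()) ? it->second : 0;
            if (nodes_[cur].terminal) return true;
        }
        return false;
    }
};
```

Construct the matcher once (for example as a static) and call `ContainsAny` per input. With only 10-20 short keywords and short inputs, the hash-map lookup per character may cancel much of the asymptotic gain over the simple `find()` loop, so benchmark before switching.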