
Fast string search?

Original post: 2013-02-05 21:08:38 1 4 c++/ string/ performance/ search/ vector

I have a vector of strings and need to check whether each element of the vector is present in a given list of 5000 words. Besides the mundane approach of two nested loops, is there a faster way to do this in C++?

4 answers

You should put the list of words into a std::set. It is a data structure optimized for searching. Checking whether a given element is in the set takes O(log n) comparisons, which is much faster than iterating over all entries.
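A minimal sketch of the std::set approach, assuming the word list and the strings to check are both held in std::vector<std::string>. The function name found_in_set and the parameter names are made up for illustration and do not come from the question.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns one flag per query, true if the query
// appears in the word list.
std::vector<bool> found_in_set(const std::vector<std::string>& words,
                               const std::vector<std::string>& queries) {
    // Build the set once; this costs O(m log m) for m words.
    const std::set<std::string> dictionary(words.begin(), words.end());

    std::vector<bool> result;
    result.reserve(queries.size());
    for (const std::string& q : queries) {
        // Each lookup walks a balanced tree: O(log m) string comparisons.
        result.push_back(dictionary.count(q) != 0);
    }
    return result;
}
```

With 5000 words, each lookup needs roughly a dozen string comparisons instead of up to 5000 in the nested-loop version.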

If you are already using C++11, you can also use std::unordered_set, which is usually even faster for lookups because it is implemented as a hash table.

If this is for school or university, be prepared to explain how these data structures manage to be faster. When your instructor asks you why you used them, "some guys on the internet told me" is unlikely to earn you a sticker in the class book.

You could put the list of words in a std::unordered_set. Then, for each element in the vector, you just have to test whether it is in the unordered_set, which takes O(1) expected time. For n elements you would have an expected complexity of O(n) (look at the comment to see why it is only expected).
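As a rough sketch of this answer, assuming C++11 and that both the word list and the strings to check are std::vector<std::string>; the name keep_known is hypothetical and only used for illustration.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the queries that occur in the word list,
// in their original order.
std::vector<std::string> keep_known(const std::vector<std::string>& words,
                                    const std::vector<std::string>& queries) {
    // Hash every word once; expected O(m) for m words.
    const std::unordered_set<std::string> dictionary(words.begin(), words.end());

    std::vector<std::string> known;
    for (const std::string& q : queries) {
        // Expected O(1) per lookup; it can degrade if many words share a bucket.
        if (dictionary.find(q) != dictionary.end()) {
            known.push_back(q);
        }
    }
    return known;
}
```

The O(1) figure is an average: it depends on the hash function spreading the words evenly across buckets, which is why the total is only an expected O(n).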

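You could sort the vector, and then you can solve this with a single "loop" (provided your dictionary is sorted too), which means O(n), not counting the cost of sorting.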
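One way to read the single "loop" is a merge-style walk over both sorted sequences. The sketch below assumes the dictionary is already sorted and sorts a copy of the input vector; the name common_sorted is invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the items that also occur in the dictionary.
// Precondition (assumed): sorted_dictionary is sorted in ascending order.
std::vector<std::string> common_sorted(
        std::vector<std::string> items,  // taken by value so it can be sorted
        const std::vector<std::string>& sorted_dictionary) {
    std::sort(items.begin(), items.end());

    std::vector<std::string> hits;
    std::size_t d = 0;  // position in the dictionary; it only moves forward
    for (const std::string& item : items) {
        // Skip dictionary entries that sort before the current item.
        while (d < sorted_dictionary.size() && sorted_dictionary[d] < item) {
            ++d;
        }
        if (d < sorted_dictionary.size() && sorted_dictionary[d] == item) {
            hits.push_back(item);
        }
    }
    return hits;
}
```

Because the dictionary index never moves backwards, the walk after sorting makes at most one pass over each sequence.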

So you have a vector of strings, with each string containing one or more words, and you have a vector that is a dictionary, and you are supposed to determine which words in the vector of strings are also in the dictionary? The vector of strings is an annoyance, since you need to look at each word. I'd start by creating a new vector, splitting each string into words, and pushing each word into the new vector. Then sort the new vector and run it through the std::unique algorithm to eliminate duplicates. Then sort the dictionary. Then run both ranges through std::set_intersection to write the result.
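The steps above can be sketched roughly as follows. Splitting on whitespace via std::istringstream is an assumption, since the answer does not say how words are delimited, and the name words_in_dictionary is made up for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the distinct words from `lines` that also
// appear in `dictionary`, in sorted order.
std::vector<std::string> words_in_dictionary(
        const std::vector<std::string>& lines,
        std::vector<std::string> dictionary) {  // by value so it can be sorted
    // 1. Split every string into words (assumption: whitespace-separated).
    std::vector<std::string> words;
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string word;
        while (in >> word) {
            words.push_back(word);
        }
    }

    // 2. Sort, then drop adjacent duplicates. std::unique only moves the
    //    kept elements to the front, so erase() is needed to shrink the vector.
    std::sort(words.begin(), words.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());

    // 3. Sort the dictionary so both ranges meet set_intersection's precondition.
    std::sort(dictionary.begin(), dictionary.end());

    // 4. Write the words common to both sorted ranges into the result.
    std::vector<std::string> result;
    std::set_intersection(words.begin(), words.end(),
                          dictionary.begin(), dictionary.end(),
                          std::back_inserter(result));
    return result;
}
```

For example, calling it with the lines "the quick fox" and "the lazy dog" and the dictionary {"fox", "dog", "cat"} yields "dog" and "fox".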