
Fast search algorithm with std::vector<std::string>

    // std::vector<const std::string> is not a valid container type, so the element type is plain std::string
    for (std::vector<std::string>::const_iterator it = serverList.begin(); it != serverList.end(); ++it)
    {
        // found a match, store the location
        if (index == *it) // index is a string
        {
            indexResult.push_back(std::distance(serverList.begin(), it)); // std::vector<unsigned int>
        }
    }

I have written the above code to look through a vector of strings and return another vector with the locations of any "hits".

Is there a way to do the same, but faster? (If I have 10,000 items in the container, it will take a while.) Please note that I have to check ALL of the items for matches and store their positions in the container.

Bonus kudos: does anyone know of a way, or links, showing how I can make the search find partial results? (Example: search for "coolro" and store the location of the variable "coolroomhere".)

Use binary_search after sorting the vector:

  1. std::sort( serverList.begin() , serverList.end() )
  2. std::lower_bound( serverList.begin() , serverList.end() , valuetoFind ) to find the first match
  3. Use std::equal_range if you want to find all matching elements

Because lower_bound and equal_range use binary search, they are logarithmic, O(log N), compared to your search, which is O(N).
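One caveat is that sorting reorders the vector, so positions found afterwards refer to the sorted order rather than the original one. The following is a minimal sketch, assuming the original positions are still needed: it sorts a separate vector of indices and runs std::equal_range over that. The names buildSortedIndex, IndexLess and findAll are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Build a permutation of 0..n-1 ordered by the strings the indices refer to.
// serverList itself is not modified, so original positions stay valid.
std::vector<std::size_t> buildSortedIndex(const std::vector<std::string>& serverList)
{
    std::vector<std::size_t> order(serverList.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return serverList[a] < serverList[b]; });
    return order;
}

// Comparator for equal_range: compares an index against a string, in both directions.
struct IndexLess
{
    const std::vector<std::string>& list;
    bool operator()(std::size_t i, const std::string& s) const { return list[i] < s; }
    bool operator()(const std::string& s, std::size_t i) const { return s < list[i]; }
};

// Return the original positions of every element equal to index.
std::vector<std::size_t> findAll(const std::vector<std::string>& serverList,
                                 const std::vector<std::size_t>& order,
                                 const std::string& index)
{
    auto range = std::equal_range(order.begin(), order.end(), index, IndexLess{serverList});
    std::vector<std::size_t> hits(range.first, range.second);
    std::sort(hits.begin(), hits.end()); // std::sort is not stable, so restore ascending order
    return hits;
}
```

The sort costs O(N log N) once, so this only pays off if the same list is searched many times.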

Basically, you're asking if it's possible to check all elements for a match, without checking all elements. If there is some sort of external meta-information (e.g. the data is sorted), it might be possible (e.g. using binary search). Otherwise, by its very nature, to check all elements, you have to check all elements.

If you're going to do many such searches on the list and the list doesn't vary, you might consider calculating a second table with a good hash code of the entries; again depending on the type of data being looked up, it could be more efficient to calculate the hash code of the index and compare hash codes first, only comparing the strings if the hash codes are equal. Whether this is an improvement or not largely depends on the size of the table and the type of data in it. You might also be able to leverage knowledge about the data in the strings; if they are all URLs, for example, mostly starting with "http://www.", then starting the comparison just past that prefix, and only coming back to compare the prefix itself if all of the rest is equal, could end up being a big win.
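The hash-table idea could look roughly like the sketch below. It assumes std::hash<std::string> is a good enough hash for the data and that the hash vector is rebuilt whenever the list changes; buildHashes and findAllByHash are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Compute one hash per entry; done once, while the list does not change.
std::vector<std::size_t> buildHashes(const std::vector<std::string>& serverList)
{
    std::vector<std::size_t> hashes;
    hashes.reserve(serverList.size());
    std::hash<std::string> hasher;
    for (const std::string& s : serverList)
        hashes.push_back(hasher(s));
    return hashes;
}

// Assumes hashes was built from this exact serverList (same size, same order).
std::vector<std::size_t> findAllByHash(const std::vector<std::string>& serverList,
                                       const std::vector<std::size_t>& hashes,
                                       const std::string& index)
{
    std::vector<std::size_t> hits;
    const std::size_t wanted = std::hash<std::string>{}(index);
    for (std::size_t i = 0; i < serverList.size(); ++i)
    {
        // Cheap integer comparison first; full string comparison only on a hash match.
        if (hashes[i] == wanted && serverList[i] == index)
            hits.push_back(i);
    }
    return hits;
}
```

This is still a linear scan; it only replaces most string comparisons with integer comparisons, so it is worth measuring before adopting.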

With regard to finding substrings, you can use std::search on each element:

for ( auto iter = serverList.begin();
        iter != serverList.end();
        ++ iter ) {
    if ( std::search( iter->begin(), iter->end(),
                      index.begin(), index.end() ) != iter->end() ) {
        indexResult.push_back( iter - serverList.begin() );
    }
}

Depending on the number of elements being searched and the lengths of the strings involved, it might be more efficient to use something like Boyer-Moore (BM) search, precompiling the search string into the necessary tables before entering the loop.
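A minimal sketch of that idea, assuming a C++17 compiler: std::boyer_moore_searcher (from <functional>) builds its tables once from the search string, and the same searcher object is then passed to std::search for every element. The function name findSubstring is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Return the positions of all elements that contain index as a substring.
std::vector<std::size_t> findSubstring(const std::vector<std::string>& serverList,
                                       const std::string& index)
{
    // Preprocess the pattern once, before the loop; the tables are reused for every element.
    const std::boyer_moore_searcher<std::string::const_iterator>
        searcher(index.begin(), index.end());

    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < serverList.size(); ++i)
    {
        const std::string& s = serverList[i];
        if (std::search(s.begin(), s.end(), searcher) != s.end())
            hits.push_back(i);
    }
    return hits;
}
```

For short search strings or short elements the setup cost may outweigh the gain, so the plain std::search loop can still be the faster choice.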

If you make the container a std::map instead of a std::vector, the underlying data structure used will be one that is optimized for doing keyword searches like this.

If you instead use a std::multimap, the member function equal_range() will return a pair of iterators covering every match in the map. That sounds to me like what you want.

A smart commenter below points out that if you don't actually store any more information than the name (the search key), then you should probably use a std::multiset instead.
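As a rough sketch of the std::multimap variant: since the question asks for positions, this version assumes the mapped value holds each name's original index in the vector, which is an illustrative choice and not something the answer specifies. buildLookup and findAllInMap are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Map each name to the position it had in the original vector.
std::multimap<std::string, std::size_t> buildLookup(const std::vector<std::string>& serverList)
{
    std::multimap<std::string, std::size_t> lookup;
    for (std::size_t i = 0; i < serverList.size(); ++i)
        lookup.emplace(serverList[i], i);
    return lookup;
}

// Collect the stored positions of every entry whose key equals index.
std::vector<std::size_t> findAllInMap(const std::multimap<std::string, std::size_t>& lookup,
                                      const std::string& index)
{
    std::vector<std::size_t> hits;
    auto range = lookup.equal_range(index); // pair of iterators covering all matches
    for (auto it = range.first; it != range.second; ++it)
        hits.push_back(it->second);
    return hits;
}
```

Each lookup is logarithmic in the number of entries plus the number of matches, at the cost of building the map up front.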
