简体   繁体   中英

Binary search in std::vector

I am trying to look for the position of vector elements into another vector. Here i am interested to use an implementation as fast as binary search . I have different vectors of length 1 million or more, so i am trying to achieve something faster.

Following situations in my case:

1) vector in which i am searching is sorted.

2) The element i am searching for will always be there ie i don't have a case of not found , and i would like to get the index of vector elements in a faster way.

I tried the following code to get the index of vector elements.

#include <iostream>
#include <vector>
#include <algorithm>

template<class Iter, class T>
Iter binary_find(Iter begin, Iter end, T val)
{
    Iter i = std::lower_bound(begin, end, val);
    return i;
}

int main() {
    std::vector<std::string> values = {"AAAAAA","AB", "AD" ,"BCD","CD", "DD" };
    std::vector<std::string> tests = {"AB", "CD","AD", "DD"};
    for(int i=0 ; i < tests.size(); i++) {
        int pos = binary_find(values.begin(), values.end(), tests.at(i))- values.begin();
    std::cout << tests.at(i) << " found at: " << pos <<std::endl;
    }
    return 0;
}  

I would like to know if the code matches with the binary search implementation.??

Is there a faster way to get the index of vector elements?

Any further suggestions to improve this code.

binary_find doesn't return anything despite not declared to return void , so it has undefined behaviour.

\n

After it is fixed, and assuming that you have no specific knowledge about the contents of the vector other than it is sorted, binary search is pretty much optimal.

There are however, other data structures that are faster for predicate based lookup than a vector. If performance is critical, you should take a look at search trees and hash maps. Since your keys are strings, tries and directed acyclic word graph in particular may be efficient. You may want to measure which is best for your use case.

http://www.cpluplus.com says that the behavior of binary_search is equivalent to:

template <class ForwardIterator, class T>
bool binary_search (ForwardIterator first, ForwardIterator last, const T& val) {
    first = std::lower_bound(first, last, val);
    return (first != last && !(val < *first));
}

So yes, lower_bound is your weapon of choice. But when you take the difference you should use distance . Cause, if there is a faster way to acquire the position it will be rolled into that function.

As far as other improvements, I'd suggest using C++14's begin and end and not calling a function which only serves to wrap a lower_bound (and fail to properly return a value.) So the way I'd write this code would look like:

auto pos = distance(lower_bound(begin(values), end(values), tests[i]), begin(values));

Q1: I would like to know if the code matches with the binary search implementation.??

Yes , it ( almost ) is. Check std::lower_bound , which states:

Complexity:

On average, logarithmic in the distance between first and last: Performs approximately log2(N)+1 element comparisons (where N is this distance). On non-random-access iterators, the iterator advances produce themselves an additional linear complexity in N on average.


Q2: Is there a faster way to get the index of vector elements.??

That's a rather broad question.


Q3: Any further suggestions to improve this code.

Hello world, Code Review !


PS - Did you even compile the code? It gives several messages, like:

warning: no return statement in function returning non-void [-Wreturn-type]

Compile with warnings enabled, like this:

g++ -Wall main.cpp

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM