
Why are the arguments of std::binary_search forward iterators?

While perusing http://en.cppreference.com/w/cpp/algorithm/binary_search I noticed that it takes forward iterators as arguments. Now I'm confused, since I thought it would rather require random access iterators, so that the binary search would actually be binary.

To satisfy my curiosity, I wrote a little program:

#include <iostream>
#include <vector>
#include <forward_list>
#include <list>
#include <deque>
#include <algorithm>
#include <chrono>
#include <random>

int main()
{
    std::uniform_int_distribution<int> uintdistr(-4000000, 4000000);
    // seed from system_clock: unlike high_resolution_clock, it is guaranteed to provide to_time_t
    std::mt19937 twister(std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()));
    size_t arr[] = { 200000, 400000, 800000, 1600000, 3200000, 6400000, 12800000 };
    for(auto size : arr)
    {
        std::list<int> my_list;
        for(size_t i = 0; i < size; i++)
            my_list.push_front(uintdistr(twister));
        std::chrono::time_point<std::chrono::high_resolution_clock> start, end;
        my_list.sort(); // binary_search requires a sorted range
        start = std::chrono::high_resolution_clock::now();

        std::binary_search(my_list.begin(), my_list.end(), 1252525);

        end = std::chrono::high_resolution_clock::now();
        long long unsigned elapsed_time = std::chrono::duration_cast<std::chrono::microseconds>(end-start).count();
        std::cout << "Test finished in " << elapsed_time << "\n";
    }
}

Compiling it with gcc 4.7.0 and running

g++ -std=c++11 test.cpp

gives the following results on my machine (elapsed times in microseconds):

Test finished in 0
Test finished in 15625
Test finished in 15625
Test finished in 46875
Test finished in 93750
Test finished in 171875
Test finished in 312500

So it looks like it doesn't actually do a binary search on a list. Now my questions are:

Why such a confusing name?

Why is code like this allowed?

Why does the reference say it's "logarithmic in the distance between first and last"?

What does the standard have to say about it?

EDIT: The code now sorts the list before searching (a stupid mistake on my part). The results are now:

Test finished in 46875
Test finished in 109375
Test finished in 265625
Test finished in 546875
Test finished in 1156250
Test finished in 2625000
Test finished in 6375000

And of course it is still not logarithmic ;)

The docs of the original SGI STL implementation, from which the standard was derived, state that

The number of comparisons is logarithmic: at most log(last - first) + 2. If ForwardIterator is a Random Access Iterator then the number of steps through the range is also logarithmic; otherwise, the number of steps is proportional to last - first.

That is, the number of comparisons is always logarithmic, while the number of iterator advancements, which is affected by the lack of random access, can be linear. In practice, std::advance is probably used, whose complexity is constant if the iterator is random access and linear otherwise.

A binary search with a linear number of iterator advancements but a logarithmic number of comparisons makes sense if a comparison is very expensive. If, for example, you have a sorted linked list of complicated objects that require disk or network access to compare, you're probably much better off with a binary search than with a linear one.

Contrary to what websites may say (e.g. "logarithmic in distance(first, last)"), the standard actually only speaks about the comparisons (e.g. 25.4.3.1, lower_bound):

Complexity: At most log2(last − first) + O(1) comparisons

The incrementing of the iterator is not included in the complexity! Note, though, that the standard library requires all iterator increments to have amortized constant complexity, so incrementing the iterators adds a cost of order O(N) (but presumably with a very small leading factor). In particular (25.4.3):

For non-random access iterators [the algorithms] execute a linear number of steps.

The standard specifies the sorted search algorithms (std::lower_bound(), std::upper_bound(), and std::binary_search()) to take a linear number of steps for forward and bidirectional iterators. For random access iterators the time is logarithmic.

Note, however, that the number of comparisons is restricted to be logarithmic.

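Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM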