Search in sorted array with few comparisons
You are given a std::vector<T> of distinct items, which is already sorted. Type T only supports the less-than operator < for comparisons, and it is a heavy function, so you have to use it as few times as possible.
Is there any better solution than a binary search? If not, is there any better solution than the one below that uses the less-than operator fewer times?
#include <vector>

template<typename T>
int FindKey(const std::vector<T>& list, const T& key)
{
    if( list.empty() )
        return -1;
    int left = 0;
    int right = static_cast<int>(list.size()) - 1;
    int mid;
    while( left < right )
    {
        mid = left + (right - left) / 2;  // same as (left + right) / 2, but cannot overflow
        if( list[mid] < key )
            left = mid + 1;
        else
            right = mid;
    }
    // left is now the first element not less than key (or the last element)
    if( !(key < list[left]) && !(list[left] < key) )
        return left;
    return -1;
}
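One way to see how many times the heavy operator< actually runs is to wrap the element type in a counting type. In this sketch, Heavy and g_comparisons are hypothetical instrumentation (an int wrapped with a counting operator<), worst_case_comparisons is a hypothetical helper, and FindKey is the question's function with an overflow-safe midpoint.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts every call to Heavy::operator< (hypothetical instrumentation).
std::size_t g_comparisons = 0;

// Hypothetical stand-in for the question's T: only operator< exists, and each call is counted.
struct Heavy {
    int value;
    bool operator<(const Heavy& other) const {
        ++g_comparisons;
        return value < other.value;
    }
};

// The question's FindKey, with the midpoint written as left + (right - left) / 2.
template <typename T>
int FindKey(const std::vector<T>& list, const T& key) {
    if (list.empty()) return -1;
    int left = 0;
    int right = static_cast<int>(list.size()) - 1;
    while (left < right) {
        int mid = left + (right - left) / 2;
        if (list[mid] < key)
            left = mid + 1;  // key lies strictly to the right of mid
        else
            right = mid;     // key, if present, is at mid or to its left
    }
    // Equivalence test using only operator<: up to two more calls.
    if (!(key < list[left]) && !(list[left] < key)) return left;
    return -1;
}

// Searches for every value in [-1, 2n] within {0, 2, ..., 2n - 2} and returns
// the largest number of operator< calls made by any single search.
std::size_t worst_case_comparisons(int n) {
    assert(n > 0);  // precondition: non-empty array
    std::vector<Heavy> data;
    for (int i = 0; i < n; ++i) data.push_back(Heavy{2 * i});
    std::size_t worst = 0;
    for (int k = -1; k <= 2 * n; ++k) {
        g_comparisons = 0;
        FindKey(data, Heavy{k});
        if (g_comparisons > worst) worst = g_comparisons;
    }
    return worst;
}
```

For n = 1000 the worst case should be 12 calls: at most ceil(log2 1000) = 10 loop iterations plus the two comparisons of the final equivalence test.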
It's not a real-world situation, just a coding test.
You could trade off O(n) additional preprocessing time for amortized O(1) query time by using a hash table (e.g. an unordered_map) as a lookup table.
Hash tables compute a hash function of each key to pick a bucket instead of ordering keys with <; keys are compared only for equality, and only against other keys in the same bucket.
Two keys can have the same hash, resulting in a collision, which is why not every individual operation is guaranteed to take constant time. Amortized constant time means that if you carry out k operations that take time t in total, then the quotient t/k = O(1) for sufficiently large k.
Example:
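#include <cstddef>
#include <unordered_map>
#include <vector>

// Maps each value to its index; requires std::hash<T> and operator== for T.
template<typename T>
class lookup {
    std::unordered_map<T, int> position;
public:
    explicit lookup(const std::vector<T>& a) {
        position.reserve(a.size());
        for(std::size_t i = 0; i < a.size(); ++i)
            position.emplace(a[i], static_cast<int>(i));
    }
    // Returns the index of key, or -1 if it is not present.
    int operator()(const T& key) const {
        auto pos = position.find(key);
        return pos == position.end() ? -1 : pos->second;
    }
};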
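The lookup class relies on std::unordered_map's defaults, std::hash<T> and operator==, neither of which the question's T is guaranteed to have. As a hedged sketch, this is how a hypothetical user-defined key type (Name, with a hypothetical NameHash functor) could be made usable in a hash-based index; build_index and find_index are illustrative names, not part of the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical key type. std::unordered_map needs a hash and operator==,
// which a T that only offers operator< does not provide.
struct Name {
    std::string text;
    bool operator==(const Name& other) const { return text == other.text; }
};

// Hypothetical hash functor that forwards to std::hash<std::string>.
struct NameHash {
    std::size_t operator()(const Name& n) const { return std::hash<std::string>{}(n.text); }
};

using NameIndex = std::unordered_map<Name, int, NameHash>;

// Builds the value -> index table once: O(n) expected time, O(n) extra memory.
NameIndex build_index(const std::vector<Name>& items) {
    NameIndex index;
    index.reserve(items.size());
    for (std::size_t i = 0; i < items.size(); ++i) {
        auto result = index.emplace(items[i], static_cast<int>(i));
        assert(result.second && "items are expected to be distinct");
        (void)result;  // avoids an unused-variable warning when NDEBUG disables assert
    }
    return index;
}

// Average O(1) lookup with no calls to operator<; returns -1 if key is absent.
int find_index(const NameIndex& index, const Name& key) {
    auto it = index.find(key);
    return it == index.end() ? -1 : it->second;
}
```

Passing the hash as a third template argument avoids specializing std::hash; specializing std::hash<Name> would work equally well.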
This also requires additional memory.
If the values can be mapped to integers within a reasonable range (i.e. max - min = O(n)), you could simply use a vector as the lookup table instead of an unordered_map, with the benefit of guaranteed constant query time.
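A minimal sketch of that vector-based table, assuming the keys are plain ints whose span max - min is O(n); the class name DirectLookup is hypothetical. Each slot stores the index of the value at that offset from the minimum, or -1 when the value is absent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical direct-address table for sorted, distinct int keys whose span
// max - min is O(n): slot (value - min) stores the index of value, or -1.
class DirectLookup {
public:
    explicit DirectLookup(const std::vector<int>& sorted) {
        assert(std::is_sorted(sorted.begin(), sorted.end()));
        if (sorted.empty()) return;
        min_value_ = sorted.front();  // sorted input: front() is the minimum, back() the maximum
        long long span = static_cast<long long>(sorted.back()) - min_value_;
        table_.assign(static_cast<std::size_t>(span) + 1, -1);
        for (std::size_t i = 0; i < sorted.size(); ++i) {
            long long slot = static_cast<long long>(sorted[i]) - min_value_;
            table_[static_cast<std::size_t>(slot)] = static_cast<int>(i);
        }
    }

    // Worst-case O(1): a range check followed by a single array access.
    int operator()(int key) const {
        long long slot = static_cast<long long>(key) - min_value_;
        if (slot < 0 || slot >= static_cast<long long>(table_.size())) return -1;
        return table_[static_cast<std::size_t>(slot)];
    }

private:
    int min_value_ = 0;
    std::vector<int> table_;
};
```

Memory grows with max - min + 1 rather than with n, which is why this only pays off for dense key ranges.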
See also this answer to "C++ get index of element of array by value" for a more detailed discussion, including an empirical comparison of linear, binary and hash index lookup.
If the interface of type T supports no operations other than bool operator<(L, R), then the decision tree model lets you prove a lower bound of Ω(log n) comparisons for any comparison-based search algorithm.
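The decision-tree step can be spelled out as follows (a standard textbook argument, sketched here rather than taken from the answer). Each call to operator< is an internal node with two children, so a tree of height h has at most 2^h leaves, and a correct search must be able to end in at least n + 1 different outcomes (each of the n indices, plus "not found"):

```latex
2^{h} \;\ge\; n + 1
\quad\Longrightarrow\quad
h \;\ge\; \left\lceil \log_2 (n + 1) \right\rceil \;=\; \Omega(\log n)
```

For n = 1000 this means at least 10 comparisons in the worst case, whatever the algorithm.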
You can use std::lower_bound. It needs at most about log(n)+1 comparisons (the standard guarantees log2(n) + O(1)), which is the best possible complexity for your problem.
#include <algorithm>
#include <iterator>
#include <vector>

template<typename T>
int FindKey(const std::vector<T>& list, const T& key)
{
    typename std::vector<T>::const_iterator lb =
        std::lower_bound(list.begin(), list.end(), key);
    // now lb is an iterator to the first element
    // which is greater than or equal to key, or end() if there is none
    // (this also covers an empty list, so *lb is never dereferenced at end())
    if(lb == list.end() || key < *lb)
        return -1;
    return static_cast<int>(std::distance(list.begin(), lb));
}
With the additional check for equality, it takes log(n)+2 comparisons.
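std::lower_bound also accepts a comparator, so the number of calls can be observed without touching T. A minimal sketch, assuming int keys stand in for the heavy type; SearchResult and find_key_counted are hypothetical names, and the counting lambda is instrumentation only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Index found (or -1) and the number of times the comparator was called.
struct SearchResult {
    int index;
    std::size_t comparisons;
};

// Same logic as the lower_bound-based FindKey, with int keys and a counting comparator.
SearchResult find_key_counted(const std::vector<int>& list, int key) {
    assert(std::is_sorted(list.begin(), list.end()));  // uses int's built-in <, not the counter
    std::size_t count = 0;
    auto counting_less = [&count](int a, int b) {
        ++count;
        return a < b;
    };
    auto lb = std::lower_bound(list.begin(), list.end(), key, counting_less);
    // lb == end() means every element is less than key; otherwise one extra
    // call rules out key < *lb, leaving *lb equivalent to key.
    if (lb == list.end() || counting_less(key, *lb))
        return {-1, count};
    return {static_cast<int>(std::distance(list.begin(), lb)), count};
}
```

The exact count depends on the standard library implementation, since the standard only bounds lower_bound at log2(n) + O(1) comparisons.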
You can use interpolation search in expected O(log log n) time if your numbers are uniformly distributed. If they have some other distribution, you can modify the search to take that distribution into account, though I don't know which distributions yield log log time.
https://en.wikipedia.org/wiki/Interpolation_search
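For illustration, a minimal interpolation search over sorted, distinct int keys; the function name interpolation_search is hypothetical. This goes beyond the question's constraints: interpolation needs subtraction and division on the keys, which a T offering only operator< cannot provide, so it applies only when keys can be mapped to numbers. The expected O(log log n) bound holds for roughly uniform keys; on skewed data the search can degrade to O(n).

```cpp
#include <cassert>
#include <vector>

// Interpolation search over sorted int keys. Unlike binary search it uses
// arithmetic on the keys to guess where key should be, not just operator<.
int interpolation_search(const std::vector<int>& a, int key) {
    int lo = 0;
    int hi = static_cast<int>(a.size()) - 1;
    while (lo <= hi && a[lo] <= key && key <= a[hi]) {
        if (a[lo] == a[hi])  // only one distinct value left: avoid dividing by zero
            return a[lo] == key ? lo : -1;
        // Linear interpolation between a[lo] and a[hi]; 64-bit arithmetic
        // keeps the intermediate product from overflowing.
        long long offset = (static_cast<long long>(key) - a[lo]) * (hi - lo) /
                           (static_cast<long long>(a[hi]) - a[lo]);
        int pos = lo + static_cast<int>(offset);
        assert(pos >= lo && pos <= hi);  // the estimate always lies inside [lo, hi]
        if (a[pos] == key) return pos;
        if (a[pos] < key)
            lo = pos + 1;
        else
            hi = pos - 1;
    }
    return -1;
}
```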