
How to design an efficient algorithm for least upper bound search

Originally posted 2009-02-27 14:17:06. Tags: algorithm, search.

Let's say you have a set of numbers with a known lower bound and an unknown upper bound, e.g. 0, 1, 2, 3, ..., 78, where 78 is the unknown. Assume for the moment that there are no gaps between the numbers. There is a time-expensive function test() that tests whether a number is in the set.

What is an efficient way (one requiring few test() calls) to find the highest number in the set?

What if you have the added knowledge that the upper bound is 75 +/- 25?

What if there are random gaps between numbers in the set, e.g. 0, 1, 3, 4, 7, ..., 78?

7 answers

For the "no gaps" case:

I assume that this is a fixed-size number, e.g. a 32-bit int.
We wish to find x such that test(x) == true and test(x+1) == false, right?

You basically do a binary chop between the lowest known "not in set" value (e.g. the biggest 32-bit int) and the highest known "in set" value (starting with the known lower bound). Each time, you test the middle value of the range and adjust the boundaries accordingly. This gives an O(log N) solution in terms of the number of calls to test(), where N is the size of the potential set, not the actual set. It will be slower than just trying 1, 2, 3, ... for small sets, but much faster for large ones.
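Here is a minimal Python sketch of that binary chop. It assumes the set has no gaps, that `lo` is a known member, and that `hi` is a known non-member. The names `highest_in_set` and `expensive_test` are hypothetical. Using `2**32` as the known non-member assumes every member fits in an unsigned 32-bit int.

```python
from typing import Callable


def highest_in_set(test: Callable[[int], bool], lo: int, hi: int) -> int:
    """Largest x with test(x) True, assuming the set has no gaps.

    Invariants: test(lo) is True (a known member, e.g. the known lower bound)
    and test(hi) is False (a known non-member). Each loop iteration costs
    exactly one call to test().
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if test(mid):
            lo = mid  # mid is a member, so the maximum is >= mid
        else:
            hi = mid  # mid is not a member, so the maximum is < mid
    return lo


# Hypothetical stand-in for the slow test(): the set {0, 1, ..., 78}.
calls = 0


def expensive_test(x: int) -> bool:
    global calls
    calls += 1
    return 0 <= x <= 78


# Assume members fit in an unsigned 32-bit int, so 2**32 is never a member.
print(highest_in_set(expensive_test, 0, 2**32), calls)  # 78 32
```

Over a potential range of 2^32 values, the chop always costs 32 calls, even though the actual maximum is only 78. That is the small-set slowdown mentioned above.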

All of this falls down if there can be gaps. At that point, I don't think there's any feasible solution beyond "start with the absolute highest possible number and work down until test(x) == true, at which point that's the highest number". As far as I can see, any other strategy will either fail or be more expensive.
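Below is a sketch of that downward scan for the gaps case. It assumes an absolute ceiling is known. The function name and the example set are hypothetical. The first member found from the top is the maximum, so the cost is (ceiling - maximum + 1) calls to test().

```python
from typing import Callable, Optional


def highest_by_downward_scan(
    test: Callable[[int], bool], ceiling: int, floor: int = 0
) -> Optional[int]:
    """Gaps allowed: probe ceiling, ceiling - 1, ... and stop at the first member.

    `ceiling` must be the absolute highest possible value; the first hit from
    the top is the maximum.
    """
    for x in range(ceiling, floor - 1, -1):
        if test(x):
            return x
    return None  # no member anywhere in [floor, ceiling]


# Hypothetical gappy set and a ceiling of 100 (e.g. from "75 +/- 25").
members = {0, 1, 3, 4, 7, 78}
print(highest_by_downward_scan(lambda x: x in members, 100))  # 78
```

With a ceiling of 100 and a maximum of 78, this takes 23 calls. With a 32-bit ceiling it becomes impractical, which is why the answer calls it the only *feasible* option rather than a good one.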

Your best bet is to simply run through the set in O(n), which is not bad.

Take into consideration that the set is not sorted (it is a set, after all, and that is a given). Each isInSet(n) operation therefore takes O(n) as well. If you choose any algorithm that probes the set at particular places, this brings the entire operation to O(n^2).

A much better solution, if the set is under your control, is to keep track of the set's maximum value and update it on each insertion. Finding the maximum is then O(1) in all cases.
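A minimal sketch of keeping the maximum on insertion follows. It assumes an insert-only set of integers, and the class name `MaxTrackingSet` is hypothetical. If the set also supported removal, deleting the current maximum would need a rescan or a heap. That case is not handled here.

```python
from typing import Optional, Set


class MaxTrackingSet:
    """Insert-only integer set that keeps its maximum current on every add()."""

    def __init__(self) -> None:
        self._items: Set[int] = set()
        self._max: Optional[int] = None

    def add(self, x: int) -> None:
        self._items.add(x)
        if self._max is None or x > self._max:
            self._max = x  # O(1) update per insertion

    def __contains__(self, x: int) -> bool:
        return x in self._items

    def maximum(self) -> Optional[int]:
        return self._max  # O(1), no membership tests needed


s = MaxTrackingSet()
for value in [0, 1, 3, 4, 7, 78, 5]:
    s.add(value)
print(s.maximum(), 78 in s, 79 in s)  # 78 True False
```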

1. Set Step to 1.
2. Set Upper to Lower + Step.
3. If test(Upper) is true, set Lower to Upper, multiply Step by 2 and go to step 2.
4. At this point you know that Lower is in your set while Upper is not. You can now do a binary search between Lower and Upper to find the limit.

This looks like O(log n * O(test)) complexity, i.e. O(log n) calls to test().
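Steps 1 to 4 are what is often called exponential (or galloping) search. Below is a sketch that assumes no gaps, a finite set, and that `lower` is a known member. `highest_by_galloping` and `expensive_test` are hypothetical names, and the set {0, ..., 78} stands in for the real test().

```python
from typing import Callable


def highest_by_galloping(test: Callable[[int], bool], lower: int) -> int:
    """Steps 1-4 above. Assumes no gaps, test(lower) is True and the set is finite."""
    step = 1                          # step 1
    upper = lower + step              # step 2
    while test(upper):                # step 3: Upper is a member...
        lower = upper                 # ...so it becomes the new Lower,
        step *= 2                     # the step doubles,
        upper = lower + step          # and we go back to step 2
    # Step 4: test(lower) is True and test(upper) is False; binary search between them.
    while upper - lower > 1:
        mid = (lower + upper) // 2
        if test(mid):
            lower = mid
        else:
            upper = mid
    return lower


probes = []


def expensive_test(x: int) -> bool:
    probes.append(x)
    return 0 <= x <= 78  # hypothetical stand-in for the slow test()


print(highest_by_galloping(expensive_test, 0), len(probes))  # 78 13
```

Here the doubling phase probes 1, 3, 7, 15, 31, 63 and 127. The binary search between 63 and 127 then needs 6 more calls, for 13 in total. The cost grows with the actual maximum rather than with the 32-bit range.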

If you know that Upper is between 50 and 100, do a binary search between these two values.

If you have random gaps and you know that the upper bound is at most 100, I suspect you cannot do better than starting there and testing every number one by one, downwards, until test() finds a value in your set.

If you have random gaps and you do not know an upper limit, then you can never be sure that you have found the upper bound.

Maybe you should just traverse it? That would be O(n). I think there is no other way to do this.

Do you know the set size beforehand?

Actually, I guess you probably don't; otherwise the first problem would be trivial.

It would help if you had some idea of how big the set was, though.

1. Take a guess at the top value.
2. Test it: if it is in the set, increase the value by some amount.
3. If it is not in the set, decrease the value by some amount.
4. Once you have upper and lower bounds for the largest value, binary search until you find it (to the required precision).
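The steps above leave "some amount" open. The sketch below assumes one particular choice: the move starts at 1 and doubles each time, in either direction. It also assumes no gaps, that `floor` is a known member, and that the guess is at least `floor`. The function name is hypothetical.

```python
from typing import Callable


def highest_from_guess(test: Callable[[int], bool], guess: int, floor: int = 0) -> int:
    """Start at a guessed top value and move up or down with doubling steps
    until the maximum is bracketed, then binary search.

    Assumes no gaps, that `floor` is a known member and that guess >= floor.
    """
    step = 1
    if test(guess):                       # guess is in the set: move up
        lower, upper = guess, guess + step
        while test(upper):
            lower, step = upper, step * 2
            upper = lower + step
    else:                                 # guess is not in the set: move down
        upper = guess
        lower = max(guess - step, floor)
        while lower > floor and not test(lower):
            upper, step = lower, step * 2
            lower = max(lower - step, floor)
    # Now test(lower) is True and test(upper) is False.
    while upper - lower > 1:
        mid = (lower + upper) // 2
        if test(mid):
            lower = mid
        else:
            upper = mid
    return lower


is_member = lambda x: 0 <= x <= 78  # hypothetical stand-in for test()
print(highest_from_guess(is_member, 75), highest_from_guess(is_member, 100))  # 78 78
```

A good guess (75 for a true maximum of 78) brackets the answer within a few calls. A guess that is too high walks down with doubling steps instead.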

With gaps you have no such ability; you can't even tell when you've found the largest element (unless you know the maximum gap size).
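The parenthetical exception can be made concrete. Suppose you know consecutive members differ by at most some `max_gap`. If every number in the window after the last member found fails, then no larger member can exist. The sketch below assumes that knowledge and a known starting member; the function name and example set are hypothetical.

```python
from typing import Callable


def highest_with_max_gap(test: Callable[[int], bool], start: int, max_gap: int) -> int:
    """Gaps allowed, but consecutive members are assumed to differ by at most
    `max_gap`, and `start` is assumed to be a member.

    Scan upward; once every number in (last_hit, last_hit + max_gap] has failed,
    no larger member can exist, so last_hit is the maximum.
    """
    last_hit = start
    x = start + 1
    while x - last_hit <= max_gap:
        if test(x):
            last_hit = x  # a member: restart the gap window from here
        x += 1
    return last_hit


# Hypothetical set whose consecutive members differ by at most 3.
members = {0, 1, 3, 4, 7, 9, 12}
print(highest_with_max_gap(lambda x: x in members, 0, 3))  # 12
```

This is still a linear scan: it costs roughly (maximum + max_gap) calls. What the known gap size buys you is the ability to stop with certainty.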

If there are no gaps, then you are probably best off with a binary search.

If we use the second assumption, that the top is 75 +/- 25, then the low end is 50, the high end is 100, and our first test case is 75. If 75 is present, the low end becomes 75, the high end stays 100, and our next test case is 87. That should yield a result in O(log N) calls to test() (where N would be 50 here).
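Here is a worked count for the 75 +/- 25 case. It assumes, as the question does, that there are no gaps, so 50 is known to be a member and 101 a non-member. Counting both ends, the maximum is one of k candidate values, and each test() call can at best halve the remaining range:

```latex
\[
  k = 100 - 50 + 1 = 51, \qquad 2^{5} = 32 < 51 \le 64 = 2^{6}
  \;\Longrightarrow\; \lceil \log_2 k \rceil = 6 \ \text{calls to } \mathtt{test()}
\]
\[
  \text{versus } \lceil \log_2 2^{32} \rceil = 32 \ \text{calls with no prior bound on a 32-bit range.}
\]
```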

If we can't assume a possible upper range, we just have to make an educated guess at what it might be. If the guessed value is not found, it becomes the high end. If it is found, it becomes the low end, and we double it to get a new candidate for the high end, repeating until a test fails.

If there are gaps, the only way I can see of doing it is a linear search. Even then, you'll need a way of knowing when you have reached the end rather than just a big gap.

If your set happens to be the set of prime numbers, let me know when you find the biggest one. I'm sure we can work something out. ;)

But seriously, I'm guessing you know for a fact that the set does indeed have a largest value, or that you're capping it at a 32-bit integer.

A couple of suggestions:

1) Think of every cheap check you can that would quickly establish test(x) == false. When one of them fires, you can skip the full test and move on to the next candidate. If the time you spend going through all of these early-rejection cases is far less than the time for the full test, you'll come out ahead.
2) Can you gain any information from each test? For example, does test(x) == false imply that test(x+5679) == false as well?
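Both suggestions can be folded into a wrapper around the expensive test. The sketch below makes several assumptions. The class `PrunedTest` and the example reject predicate are hypothetical. The monotone inference is only valid when test(x) == false really does imply test(y) == false for larger y, as it does above the known lower bound when there are no gaps.

```python
from typing import Callable, Iterable, Optional


class PrunedTest:
    """Wraps an expensive test() with both suggestions above.

    cheap_rejects: fast predicates (hypothetical); if any returns True, x is
        treated as a non-member without running the expensive test.
    monotone: assume test(x) == False implies test(y) == False for all
        queried y > x, so larger queries after a miss are answered for free.
    """

    def __init__(
        self,
        expensive: Callable[[int], bool],
        cheap_rejects: Iterable[Callable[[int], bool]] = (),
        monotone: bool = False,
    ) -> None:
        self.expensive = expensive
        self.cheap_rejects = list(cheap_rejects)
        self.monotone = monotone
        self.smallest_miss: Optional[int] = None
        self.expensive_calls = 0

    def __call__(self, x: int) -> bool:
        if self.monotone and self.smallest_miss is not None and x >= self.smallest_miss:
            return False  # inferred from an earlier miss, no call needed
        if any(reject(x) for reject in self.cheap_rejects):
            result = False  # cheap early rejection
        else:
            self.expensive_calls += 1
            result = self.expensive(x)
        if not result and self.monotone:
            # x is below any earlier miss, or we would have returned above.
            self.smallest_miss = x
        return result


# Hypothetical: members are 0..78, and nothing at or above 2**32 can be a member.
test = PrunedTest(lambda x: 0 <= x <= 78, cheap_rejects=[lambda x: x >= 2**32], monotone=True)
print(test(2**40), test(100), test(200), test(50), test.expensive_calls)  # False False False True 2
```

Any of the search strategies above can call such a wrapper in place of test(). Fewer expensive calls then reach the real function.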