
Python: fastest way to check which interval a number is in

Suppose I have split the interval [0, 1] into a series of smaller intervals [0, 0.2), [0.2, 0.4), [0.4, 0.9), [0.9, 1.0]. Now I sample a value r in [0, 1]. What is the fastest way to check which interval it belongs to using Python / NumPy / PyTorch?

The obvious way is this:

import numpy as np

r = np.random.rand()  # uniform sample from [0, 1)
if 0 <= r < 0.2:
    pass # do something
elif 0.2 <= r < 0.4:
    pass # do something else
elif 0.4 <= r < 0.9:
    pass # do yet something else again
elif 0.9 <= r <= 1.0:
    pass # do some other thing


The bisect module contains a function bisect, which uses a bisection algorithm to find where a value would be inserted into a sorted list. The lookup takes roughly O(log n) comparisons.

from bisect import bisect

You can keep the right endpoints of your intervals in a list, along with a list of functions, in the same order, that do the appropriate thing for each interval. For example:

def a():
    print("Do something")


intervals = [0.2, 0.4, 0.9, 1]

stufftodo = [a, a, a, a]

You can, of course, have a different function for each interval. You can then use the index returned by bisect to index into stufftodo, extract the appropriate function, and call it.

import numpy as np

r = np.random.rand()  # sample from [0, 1), so the index below is always 0..3
stufftodo[bisect(intervals, r)]()
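
As a slightly fuller sketch of the dispatch idea, here is a version with a distinct handler per interval. The handler names and the dispatch helper are hypothetical, made up for illustration, and the clamp is an assumption about how you want r == 1.0 treated: bisect returns 4 for it, which would otherwise run past the end of the list.

```python
from bisect import bisect

# Right endpoints of [0, 0.2), [0.2, 0.4), [0.4, 0.9), [0.9, 1.0]
right_edges = [0.2, 0.4, 0.9, 1.0]

# Hypothetical handlers, one per interval, in the same order as right_edges.
def handle_low():
    return "low"

def handle_mid():
    return "mid"

def handle_high():
    return "high"

def handle_top():
    return "top"

handlers = [handle_low, handle_mid, handle_high, handle_top]

def dispatch(r):
    """Call the handler for the interval containing r (assumes 0 <= r <= 1)."""
    # bisect counts the edges that are <= r, which is the interval index,
    # except for r == 1.0, where it returns len(right_edges); clamp that case
    # so the closed right end falls into the last interval.
    index = min(bisect(right_edges, r), len(handlers) - 1)
    return handlers[index]()
```

Calling dispatch(0.2) picks the second handler, because bisect places a value equal to an edge to the right of it, which matches the half-open intervals.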

You'll want to first transform your list of intervals into a list of boundaries, so instead of the intervals [0, 0.2), [0.2, 0.4), [0.4, 0.9), [0.9, 1.0], you just define:

boundaries = [0, 0.2, 0.4, 0.9, 1.0]  # values must be sorted!!

Then you can perform a binary search over the boundaries to see which segment a value belongs to:

import bisect
index = bisect.bisect_right(boundaries, value)

index will be the index of the upper bound, so to get the range, you'd do:

range_low = boundaries[index - 1] if index > 0 else None
range_high = boundaries[index] if index < len(boundaries) else None

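This also handles values that are not in any of the intervals: range_low is None for values below the first boundary, and range_high is None for values at or above the last one. Note that a value exactly equal to 1.0 therefore comes back with range_high set to None, so the closed right end of [0.9, 1.0] needs a special case if it matters to you. The binary search takes O(log N) comparisons, which is the best a comparison-based search can do for arbitrary intervals.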
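
Since the question also mentions NumPy: if you have many samples at once, the same boundary lookup can be vectorized with np.searchsorted. This is only a sketch. The function name interval_indices and the -1 marker for out-of-range values are my own choices, and it assumes the input contains no NaNs.

```python
import numpy as np

boundaries = np.array([0.0, 0.2, 0.4, 0.9, 1.0])  # values must be sorted

def interval_indices(values):
    """For each value, return i such that boundaries[i] <= value < boundaries[i + 1].

    The last interval is treated as closed on the right, and -1 marks
    values that fall outside [boundaries[0], boundaries[-1]].
    """
    values = np.asarray(values, dtype=float)
    # side="right" counts the boundaries that are <= value, so subtracting 1
    # gives the index of the interval's lower bound.
    idx = np.searchsorted(boundaries, values, side="right") - 1
    # Map a value equal to the last boundary into the last interval.
    idx = np.where(values == boundaries[-1], len(boundaries) - 2, idx)
    # Anything below the first or above the last boundary is in no interval.
    outside = (values < boundaries[0]) | (values > boundaries[-1])
    return np.where(outside, -1, idx)
```

For example, interval_indices(np.random.rand(1000)) gives an array of 1000 interval indices in one call, which you can then use for masking or counting instead of looping in Python.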
