Optimization: Return lowest value from array that is greater than (or equal to) `x`
EDIT: My question is different from the suggested duplicate because I already have a method of implementing `lowest`. My question is not how to implement `lowest`, but rather how to optimize `lowest` to run faster.
Assume that I have an array `a`. For example:
import numpy as np
a = np.array([2, 1, 3, 4, 5, 6, 7, 8, 9])
Assume that I have a float `x`. For example:
x = 6.5
I want to return the lowest value in `a` that is also greater than or equal to `x`. So in this case...
print(lowest(a, x))
# 7
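For reference, the required behaviour can be written directly in plain Python without assuming that `a` is sorted. This is a deliberately slow sketch, meant only as a correctness oracle when testing faster versions; the name `lowest_reference` is hypothetical and not part of the original question.

```python
import numpy as np


def lowest_reference(a, x):
    """Smallest element of `a` that is >= x (hypothetical oracle; `a` need not be sorted)."""
    # min() over a generator scans every element once: O(n) per call.
    # It raises ValueError if no element is >= x, which the question's
    # rule a[0] <= x <= a[-1] rules out.
    return min(v for v in a if v >= x)


a = np.array([2, 1, 3, 4, 5, 6, 7, 8, 9])
print(lowest_reference(a, 6.5))  # 7
```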
I have tried a number of functions in place of `lowest`. For example:
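def lowest(a, x):
    """ `a` should be a sorted numpy array"""
    # Boolean mask keeps the elements >= x; the first of them is the smallest
    return a[a >= x][0]

def lowest(a, x):
    """ `a` should be a sorted `list`, not a numpy array"""
    # Builds and sorts a new list on every call, then takes the element after x
    k = sorted(a + [x])
    return k[k.index(x) + 1]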
However, the function `lowest` remains the bottleneck of my code, accounting for ~90% of the runtime.
Is there a faster way to implement the function `lowest`?
Some rules about my code:
- `a` can be assumed to have a length of 10.
- `lowest` is run at least 100k times. This may be a design problem, but first I am interested in whether there is a faster implementation.
- `a` can be preprocessed before being run through these loops.
- `x` will vary, but `a` will not.
- `a[0] <= x <= a[-1]` is always `True`.
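Because `a` is fixed and can be preprocessed, and `a[0] <= x <= a[-1]` guarantees an answer always exists, one option for scalar calls (not taken from the original thread) is to sort `a` once into a plain Python list and use the standard-library `bisect.bisect_left`, which avoids building a temporary NumPy array on every call. This is a sketch under those assumptions; `make_lowest` is a hypothetical helper name, and whether it actually beats the NumPy versions should be measured on the real workload.

```python
import bisect

import numpy as np


def make_lowest(a):
    """Preprocess `a` once and return a scalar `lowest(x)` (hypothetical helper)."""
    # Sort once and convert NumPy scalars to plain Python numbers, so each
    # call compares Python ints/floats instead of NumPy scalars
    sorted_a = sorted(np.asarray(a).tolist())
    bisect_left = bisect.bisect_left  # local alias: skips the `bisect.` attribute lookup per call

    def lowest(x):
        # bisect_left gives the index of the first element >= x; the rule
        # a[0] <= x <= a[-1] guarantees this index is within range
        return sorted_a[bisect_left(sorted_a, x)]

    return lowest


lowest = make_lowest(np.array([2, 1, 3, 4, 5, 6, 7, 8, 9]))
print(lowest(6.5))  # 7
print(lowest(7))    # 7
```

Swapping in `bisect.bisect_right` would return the first element strictly greater than `x`, but then `x == a[-1]` needs an explicit bounds check.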
Here is an O(1) solution using a lookup table, compared with the OP's (first) solution and `numpy.searchsorted`. It's not 100% fair because the OP's solution is not vectorized. Anyway, timings:
True # results equal
True # results equal
0.08163515606429428 # lookup
2.1996873939642683 # OP
0.016975965932942927 # numpy.searchsorted
For this small list size, `searchsorted` wins even though it is O(log n).
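The title asks for values greater than *or equal to* `x`. With `numpy.searchsorted` that distinction is the `side` argument: the default `side='left'` returns the first index with `a[i] >= x`, while `side='right'` returns the first index with `a[i] > x`. A small sketch on the sorted example array, which also shows a whole batch of `x` values being handled in one call rather than in a Python loop:

```python
import numpy as np

a = np.sort(np.array([2, 1, 3, 4, 5, 6, 7, 8, 9]))  # searchsorted requires sorted input

# side='left' (default): first element >= x
print(a[np.searchsorted(a, 7, side='left')])   # 7
# side='right': first element > x
print(a[np.searchsorted(a, 7, side='right')])  # 8

# Vectorized: one call for many x values instead of a Python loop
xs = np.array([1.0, 6.5, 7.0, 8.2])
print(a[np.searchsorted(a, xs)])  # [1 7 7 9]
```

Note that with `side='right'` and `x == a[-1]` the returned index equals `len(a)`, so the strict variant needs a bounds check under the question's rule `a[0] <= x <= a[-1]`.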
Code:
import numpy as np

class find_next:
    def __init__(self, a, max_bins=100000):
        self.a = np.sort(a)
        self.low = self.a[0]
        self.high = self.a[-1]
        self.span = self.a[-1] - self.a[0]
        # Smallest gap between neighbours; bins should be no wider than this
        # so that each bin holds at most one element of a
        self.damin = np.diff(self.a).min()
        if self.span // self.damin > max_bins:
            raise ValueError('a too unevenly spaced for max_bins')
        # lut[j] = index of the first element of a that is >= the j-th bin edge
        self.lut = np.searchsorted(self.a, np.linspace(self.low, self.high,
                                                       max_bins + 1))
        self.no_bins = max_bins

    def f_pp(self, x):
        # O(1) lookup: find the bin containing x, then move to the next
        # bin edge's entry if the element found is still smaller than x
        i = np.array((x-self.low)/self.span * self.no_bins, int)
        return self.a[self.lut[i + (x > self.a[self.lut[i]])]]

    def lowest(self, x):
        # OP's first solution (scalar x only)
        return self.a[self.a >= x][0]

    def f_ss(self, x):
        # Binary search: O(log n) per element of x
        return self.a[self.a.searchsorted(x)]

a = np.array([2, 1, 3, 4, 5, 6, 7, 8, 9])
x = np.random.uniform(1, 9, (10000,))
fn = find_next(a)

sol_pp = fn.f_pp(x)
sol_OP = [fn.lowest(xi) for xi in x]
sol_ss = fn.f_ss(x)
print(np.all(sol_OP == sol_pp))
print(np.all(sol_OP == sol_ss))

from timeit import timeit
kwds = dict(globals=globals(), number=10000)
print(timeit('fn.f_pp(x)', **kwds))
print(timeit('[fn.lowest(xi) for xi in x]', **kwds))
print(timeit('fn.f_ss(x)', **kwds))