Optimization: return the lowest value from an array that is greater than (or equal to) x

Question

EDIT: My question is different from the suggested duplicate because I already have a method of implementing lowest. My question is not how to implement lowest, but rather how to optimize lowest to run faster.

Assume that I have an array a. For example:

import numpy as np
a = np.array([2, 1, 3, 4, 5, 6, 7, 8, 9])

Assume that I have a float x. For example:

x = 6.5

I want to return the lowest value in a that is also greater than or equal to x. So in this case...

print(lowest(a, x))
# 7

I have tried a number of functions in place of lowest. For example:

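def lowest(a, x):
    """ `a` should be a sorted numpy array"""
    # boolean mask keeps every element >= x; the first one is the lowest
    return a[a >= x][0]

def lowest(a, x):
    """ `a` should be a sorted `list`, not a numpy array"""
    # insert x, re-sort, and take the element right after the first x
    k = sorted(a + [x])
    return k[k.index(x) + 1]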

However, the function lowest remains the bottleneck of my code, at ~90% of the run time.

Is there a faster way to implement the function lowest?

Some rules about my code:

a can be assumed to have a length of 10.
The function lowest is run at least 100k times. This may be a design problem, but I am interested first in whether there is a faster implementation.
a can be preprocessed before being run through these loops. x will vary, but a will not.
It can be assumed that a[0] <= x <= a[-1] is always True.

Answer 1

Here is an O(1) solution using a lookup table, compared against OP's (first) solution and numpy.searchsorted. The comparison is not 100% fair because OP's solution is not vectorized. Anyway, timings:

True                  # results equal
True                  # results equal
0.08163515606429428   # lookup
2.1996873939642683    # OP
0.016975965932942927  # numpy.searchsorted

For this small list size, searchsorted wins even though it is O(log n).
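As a minimal sketch of why searchsorted fits the question: on a sorted array, with its default side='left', it returns the first index i such that a[i] >= x, which is the position of the lowest value greater than or equal to x. The values below are the a and x from the question; the name a_sorted is only for illustration.

```python
import numpy as np

# Preprocess once: searchsorted requires a sorted array.
a_sorted = np.sort(np.array([2, 1, 3, 4, 5, 6, 7, 8, 9]))

# side='left' (the default) gives the first index i with a_sorted[i] >= x.
idx = np.searchsorted(a_sorted, 6.5)
result = a_sorted[idx]

# An x that is already in the array maps to itself.
exact = a_sorted[np.searchsorted(a_sorted, 6.0)]

# Many x values can be looked up in a single vectorized call.
batch = a_sorted[np.searchsorted(a_sorted, np.array([1.0, 2.5, 9.0]))]
```

This relies on the question's guarantee that a[0] <= x <= a[-1]; for x above the largest element the returned index would be len(a_sorted) and the indexing would fail.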

Code:

import numpy as np

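class find_next:
    def __init__(self, a, max_bins=100000):
        self.a = np.sort(a)
        self.low = self.a[0]
        self.high = self.a[-1]
        self.span = self.a[-1] - self.a[0]
        # smallest gap between neighbouring elements; the check below rejects
        # inputs whose smallest gap is too small for max_bins bins
        self.damin = np.diff(self.a).min()
        if self.span // self.damin > max_bins:
            raise ValueError('a too unevenly spaced for max_bins')
        # lut[k] = index of the first element of a that is >= the k-th grid point
        self.lut = np.searchsorted(self.a, np.linspace(self.low, self.high,
                                                       max_bins + 1))
        self.no_bins = max_bins
    def f_pp(self, x):
        # lookup table: map x to its bin, then step to the next bin's entry
        # if x lies above the element stored for this bin
        i = np.array((x-self.low)/self.span * self.no_bins, int)
        return self.a[self.lut[i + (x > self.a[self.lut[i]])]]
    def lowest(self, x):
        # OP's first solution (scalar x only)
        return self.a[self.a >= x][0]
    def f_ss(self, x):
        # binary search, vectorized over x
        return self.a[self.a.searchsorted(x)]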

a = np.array([2, 1, 3, 4, 5, 6, 7, 8, 9])

x = np.random.uniform(1, 9, (10000,))

fn = find_next(a)
sol_pp = fn.f_pp(x)
sol_OP = [fn.lowest(xi) for xi in x]
sol_ss = fn.f_ss(x)

print(np.all(sol_OP == sol_pp))
print(np.all(sol_OP == sol_ss))

from timeit import timeit
kwds = dict(globals=globals(), number=10000)

print(timeit('fn.f_pp(x)', **kwds))
print(timeit('[fn.lowest(xi) for xi in x]', **kwds))
print(timeit('fn.f_ss(x)', **kwds))
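If the x values arrive one at a time and cannot be batched into an array, a possible alternative is the standard library's bisect.bisect_left on a presorted list. This is a sketch and not part of the benchmark above; the function name lowest_scalar is hypothetical, and whether it beats searchsorted for single lookups would need to be timed in the actual code.

```python
from bisect import bisect_left

# Preprocess once: bisect requires a sorted sequence.
a_sorted = sorted([2, 1, 3, 4, 5, 6, 7, 8, 9])

def lowest_scalar(a_sorted, x):
    """Return the smallest element of the sorted list `a_sorted` that is >= x.

    Assumes a_sorted[0] <= x <= a_sorted[-1], as stated in the question.
    """
    # bisect_left returns the first index i with a_sorted[i] >= x
    return a_sorted[bisect_left(a_sorted, x)]

print(lowest_scalar(a_sorted, 6.5))
```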

Answer 1: 2 votes, accepted, 2018-03-01 20:02:02
