
Optimization: Return lowest value from array that is greater than (or equal to) `x`

EDIT: My question is different from the suggested duplicate because I already have a way of implementing `lowest`. My question is not how to implement `lowest`, but rather how to optimize `lowest` so that it runs faster.

Assume that I have an array `a`. For example:

import numpy as np
a = np.array([2, 1, 3, 4, 5, 6, 7, 8, 9])

Assume that I have a float `x`. For example:

x = 6.5

I want to return the lowest value in `a` that is also greater than or equal to `x`. So in this case...

print(lowest(a, x))
>>> 7

I have tried a number of functions in place of `lowest`. For example:

def lowest(a, x):
    """`a` should be a sorted numpy array"""
    return a[a >= x][0]

def lowest(a, x):
    """`a` should be a sorted `list`, not a numpy array"""
    k = sorted(a + [x])
    return k[k.index(x) + 1]

However, the function `lowest` remains the bottleneck of my code, accounting for ~90% of the run time.

Is there a faster way to implement the function `lowest`?

Some rules about my code:

  • a can be assumed to have a length of 10
  • the function lowest is run at least 100k times. This may be a design problem, but I am interested if there is a faster implementation of my problem first.
  • a can be preprocessed before being run through these loops. x will vary, but a will not.
  • It can be assumed that a[0] <= x <= a[-1] is always True

Here is an O(1) solution using a lookup table, compared to OP's (first) solution and `numpy.searchsorted`. The comparison is not 100% fair because OP's solution is not vectorized. Anyway, timings:

True                  # results equal
True                  # results equal
0.08163515606429428   # lookup
2.1996873939642683    # OP
0.016975965932942927  # numpy.searchsorted

For this small list size `searchsorted` wins even though it is O(log n).
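
As a minimal sketch of why `searchsorted` answers the question directly: with the default `side='left'`, it returns the first index `i` such that `a[i] >= x`, provided `a` is sorted. The variable names `idx` and `result` below are only illustrative, and the sketch assumes every `x` is at most `a[-1]`, as stated in the question.

```python
import numpy as np

# Sort once up front; the question says `a` can be preprocessed.
a = np.sort(np.array([2, 1, 3, 4, 5, 6, 7, 8, 9]))

x = np.array([6.5, 7.0, 1.0, 9.0])

# side='left' (the default) gives the first index i with a[i] >= x
idx = np.searchsorted(a, x, side='left')
result = a[idx]
print(result)  # [7 7 1 9]
```

Note that an `x` equal to an element of `a` maps to that element itself, which matches the "greater than or equal to" requirement.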

Code:

import numpy as np

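class find_next:
    def __init__(self, a, max_bins=100000):
        self.a = np.sort(a)
        self.low = self.a[0]
        self.high = self.a[-1]
        self.span = self.a[-1] - self.a[0]
        # smallest gap between neighbouring values of the sorted array
        self.damin = np.diff(self.a).min()
        # bins must be no wider than the smallest gap for the lookup to work
        if self.span // self.damin > max_bins:
            raise ValueError('a too unevenly spaced for max_bins')
        # lookup table: for each bin edge, the index of the first a[j] >= edge
        self.lut = np.searchsorted(self.a, np.linspace(self.low, self.high,
                                                       max_bins + 1))
        self.no_bins = max_bins
    def f_pp(self, x):
        # O(1) lookup: map x to its bin, then step one bin up if needed
        i = np.array((x-self.low)/self.span * self.no_bins, int)
        return self.a[self.lut[i + (x > self.a[self.lut[i]])]]
    def lowest(self, x):
        # OP's first solution (scalar x only)
        return self.a[self.a >= x][0]
    def f_ss(self, x):
        # binary search, vectorized over x
        return self.a[self.a.searchsorted(x)]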

a = np.array([2, 1, 3, 4, 5, 6, 7, 8, 9])

x = np.random.uniform(1, 9, (10000,))

fn = find_next(a)
sol_pp = fn.f_pp(x)
sol_OP = [fn.lowest(xi) for xi in x]
sol_ss = fn.f_ss(x)

print(np.all(sol_OP == sol_pp))
print(np.all(sol_OP == sol_ss))

from timeit import timeit
kwds = dict(globals=globals(), number=10000)

print(timeit('fn.f_pp(x)', **kwds))
print(timeit('[fn.lowest(xi) for xi in x]', **kwds))
print(timeit('fn.f_ss(x)', **kwds))
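
If `x` really has to be handled one scalar at a time rather than in a batch, a plain-Python binary search from the standard library's `bisect` module is another option worth timing. This is a sketch only, not part of the benchmark above: `lowest_bisect` and `a_sorted` are hypothetical names, and it assumes `a` has been sorted once beforehand and that `x <= a_sorted[-1]`.

```python
from bisect import bisect_left

# Preprocess once, outside the hot loop: a plain sorted list.
a_sorted = sorted([2, 1, 3, 4, 5, 6, 7, 8, 9])

def lowest_bisect(a_sorted, x):
    """Return the smallest element of `a_sorted` that is >= x.

    Assumes `a_sorted` is sorted ascending and x <= a_sorted[-1].
    """
    # bisect_left returns the first index i with a_sorted[i] >= x
    return a_sorted[bisect_left(a_sorted, x)]

print(lowest_bisect(a_sorted, 6.5))  # 7
```

Whether this beats a scalar call to `searchsorted` depends on the setup, so it should be measured with `timeit` in the same way as the functions above.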
