Optimization: Return lowest value from array that is greater than (or equal to) `x`

EDIT: My question is different from the suggested duplicate because I already have a method of implementing lowest. My question is not how to implement lowest, but rather how to optimize lowest to run faster.

Assume that I have an array a. For example:

import numpy as np
a = np.array([2, 1, 3, 4, 5, 6, 7, 8, 9])

Assume that I have a float x. For example:

x = 6.5

I want to return the lowest value in a that is also greater than or equal to x. So in this case...

print(lowest(a, x))
# 7

I have tried a number of functions in place of lowest. For example:

def lowest(a, x):
    """ `a` should be a sorted numpy array"""
    # boolean mask keeps the values >= x; the first of them is the smallest
    return a[a >= x][0]

def lowest(a, x):
    """ `a` should be a sorted `list`, not a numpy array"""
    # insert x, re-sort, and take the element right after the first x
    k = sorted(a + [x])
    return k[k.index(x) + 1]

However, the function lowest remains the bottleneck of my code, at about 90% of the run time.

Is there a faster way to implement the function lowest?

Some rules about my code:

  • a can be assumed to have a length of 10
  • the function lowest is run at least 100k times. This may be a design problem, but I am interested in whether there is a faster implementation of my problem first.
  • a can be preprocessed before being run through these loops. x will vary, but a will not.
  • It can be assumed that a[0] <= x <= a[-1] is always True

Here is an O(1) solution using a lookup table, compared to OP's (first) solution and numpy.searchsorted. The comparison is not 100% fair because OP's solution is not vectorized. Anyway, timings:

True                  # results equal
True                  # results equal
0.08163515606429428   # lookup
2.1996873939642683    # OP
0.016975965932942927  # numpy.searchsorted

For this small list size, searchsorted wins even though it is O(log n).
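As a minimal sketch of the searchsorted approach on its own (the helper name `lowest_vectorized` is hypothetical, not from the original post): sort a once, then let numpy.searchsorted with side='left' find, for every x at once, the first index whose value is greater than or equal to x. It relies on the question's assumption that x <= a[-1]; a larger x would index past the end of the array.

```python
import numpy as np

# Preprocess once: the question allows a to be sorted ahead of time.
a = np.sort(np.array([2, 1, 3, 4, 5, 6, 7, 8, 9]))

def lowest_vectorized(a_sorted, x):
    """Hypothetical helper: smallest value in a_sorted that is >= x.

    Assumes a_sorted is sorted ascending and every x <= a_sorted[-1].
    """
    # side='left' returns the first index i with a_sorted[i] >= x
    idx = np.searchsorted(a_sorted, x, side='left')
    return a_sorted[idx]

xs = np.array([6.5, 7.0, 1.0, 9.0])
result = lowest_vectorized(a, xs)
print(result)  # [7 7 1 9]
```

Passing all x values in one array, rather than looping in Python, is what makes the vectorized timing above so much lower than the per-element call.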

Code:

import numpy as np

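class find_next:
    def __init__(self, a, max_bins=100000):
        self.a = np.sort(a)
        self.low = self.a[0]
        self.high = self.a[-1]
        self.span = self.a[-1] - self.a[0]
        # smallest gap between neighbouring values of a
        self.damin = np.diff(self.a).min()
        # reject a if its values are too close together for the bin width
        if self.span // self.damin > max_bins:
            raise ValueError('a too unevenly spaced for max_bins')
        # lut[k]: index of the first element of a >= the k-th grid point
        self.lut = np.searchsorted(self.a, np.linspace(self.low, self.high,
                                                       max_bins + 1))
        self.no_bins = max_bins
    def f_pp(self, x):
        # lookup table: bin index of x, truncated toward zero
        i = np.array((x-self.low)/self.span * self.no_bins, int)
        # move to the next grid point if the candidate is still below x
        return self.a[self.lut[i + (x > self.a[self.lut[i]])]]
    def lowest(self, x):
        # OP's first solution: boolean mask, one scalar x per call
        return self.a[self.a >= x][0]
    def f_ss(self, x):
        # binary search, vectorized over x
        return self.a[self.a.searchsorted(x)]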

a = np.array([2, 1, 3, 4, 5, 6, 7, 8, 9])

x = np.random.uniform(1, 9, (10000,))

fn = find_next(a)
sol_pp = fn.f_pp(x)
sol_OP = [fn.lowest(xi) for xi in x]
sol_ss = fn.f_ss(x)

print(np.all(sol_OP == sol_pp))
print(np.all(sol_OP == sol_ss))

from timeit import timeit
kwds = dict(globals=globals(), number=10000)

print(timeit('fn.f_pp(x)', **kwds))
print(timeit('[fn.lowest(xi) for xi in x]', **kwds))
print(timeit('fn.f_ss(x)', **kwds))
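If x really has to be handled one scalar at a time, as in the question's loop, one option worth measuring is the standard library's bisect module on a pre-sorted plain list. This is only a sketch under that assumption and was not part of the timings above; the name `lowest_scalar` is hypothetical.

```python
from bisect import bisect_left

# Preprocess once into a sorted plain Python list.
a_sorted = sorted([2, 1, 3, 4, 5, 6, 7, 8, 9])

def lowest_scalar(a_sorted, x):
    """Hypothetical helper: smallest value in a_sorted that is >= x.

    Assumes a_sorted is sorted ascending and x <= a_sorted[-1].
    """
    # bisect_left returns the first index i with a_sorted[i] >= x
    return a_sorted[bisect_left(a_sorted, x)]

print(lowest_scalar(a_sorted, 6.5))  # 7
```

Whether this beats the numpy variants for a length-10 array depends on the call pattern, so it should be timed with the same timeit setup before being relied on.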
