Fast way to find the maximum of all elements before i in a list in Python

I have an array, x, which I want to make monotonic. Specifically, I want to do:

y = x.copy()    
for i in range(1, len(x)):
    y[i] = np.max(x[:i])

This is extremely slow for large arrays, and it feels like there should be a more efficient way of doing it. How can this operation be sped up?

The OP implementation is very inefficient because it does not reuse the information acquired in the previous iteration, resulting in O(n²) complexity.

def max_acc_OP(arr):
    result = np.empty_like(arr)
    for i in range(len(arr)):
        result[i] = np.max(arr[:i + 1])
    return result

Note that I fixed the OP code (which was otherwise throwing a ValueError: zero-size array to reduction operation maximum which has no identity) by taking the largest value among those up to and including position i.

It is easy to adapt this so that the value at position i is excluded, but that leaves the first value of the result undefined, and the last value of the input is never used. The first value of the result can be taken to be equal to the first value of the input, e.g.:

def max_acc2_OP(arr):
    result = np.empty_like(arr)
    result[0] = arr[0]  # uses first value of input
    # stop at len(arr): result has no index len(arr)
    for i in range(1, len(arr)):
        result[i] = np.max(arr[:i])
    return result

It is equally easy to adapt the code below in the same way, and I do not think it is particularly relevant to cover both the "included" and "excluded" cases for the value at position i. Henceforth, only the "included" case is covered.

Back to the efficiency of the solution: if you keep track of the current maximum and use that to fill your output array, instead of re-computing the maximum of all values up to i at each iteration, you can easily get to O(n) complexity:

def max_acc(arr):
    result = np.empty_like(arr)
    curr_max = arr[0]
    for i, x in enumerate(arr):
        if x > curr_max:
            curr_max = x
        result[i] = curr_max
    return result

However, this is still relatively slow because of the explicit looping. Luckily, one can either rewrite this in vectorized form by combining np.fmax() (or np.maximum(), depending on how you need NaNs to be handled) with np.ufunc.accumulate():

np.fmax.accumulate(arr)

# or

np.maximum.accumulate(arr)
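
The choice between the two ufuncs only matters when the input contains NaNs. A minimal sketch, using a made-up sample array (not from the original post), of how each one treats them:

```python
import numpy as np

# Hypothetical sample data with a NaN in the middle (assumption for illustration).
arr = np.array([1.0, 3.0, np.nan, 2.0, 5.0])

# np.maximum propagates NaN: once a NaN is seen, every later entry is NaN.
propagated = np.maximum.accumulate(arr)

# np.fmax ignores a NaN whenever the other operand is a number.
ignored = np.fmax.accumulate(arr)

print(propagated)
print(ignored)
```

For integer inputs, such as the one used in the timings, both calls give the same result.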

or accelerate the solution above with Numba:

import numba as nb

max_acc_nb = nb.njit(max_acc)

Some timings on relatively large inputs are provided below:

n = 10000
arr = np.random.randint(0, n, n)
%timeit -n 4 -r 4 max_acc_OP(arr)
# 97.5 ms ± 14.2 ms per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 np.fmax.accumulate(arr)
# 112 µs ± 134 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 np.maximum.accumulate(arr)
# 88.4 µs ± 107 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 max_acc(arr)
# 2.32 ms ± 146 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)
%timeit -n 4 -r 4 max_acc_nb(arr)
# 9.11 µs ± 3.01 µs per loop (mean ± std. dev. of 4 runs, 4 loops each)

These timings indicate that max_acc() is already much faster than max_acc_OP(), but np.maximum.accumulate() / np.fmax.accumulate() is even faster, and max_acc_nb() comes out as the fastest. As always, it is important to take these kinds of numbers with a grain of salt.
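
Since the numbers depend on the machine and on the input, it may be worth re-measuring locally. The following is only a sketch of how that could be done outside IPython with the standard timeit module; the seed, array size and repeat counts are arbitrary assumptions, and max_acc() is copied from above so that the script is self-contained:

```python
import timeit

import numpy as np


def max_acc(arr):
    # O(n) running maximum with an explicit Python loop (same as above).
    result = np.empty_like(arr)
    curr_max = arr[0]
    for i, x in enumerate(arr):
        if x > curr_max:
            curr_max = x
        result[i] = curr_max
    return result


# Arbitrary test input (assumption): 10000 random integers, fixed seed.
rng = np.random.default_rng(0)
arr = rng.integers(0, 10_000, 10_000)

# Check that both approaches agree before comparing their speed.
assert np.array_equal(max_acc(arr), np.maximum.accumulate(arr))

for name, func in [("max_acc", max_acc),
                   ("np.maximum.accumulate", np.maximum.accumulate)]:
    # Best of 4 repeats, 4 calls each, reported per call.
    best = min(timeit.repeat(lambda: func(arr), number=4, repeat=4)) / 4
    print(f"{name}: {best * 1e6:.1f} µs per call")
```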

I think it will be faster to just keep track of the maximum rather than calculating it from scratch for each sub-array:

y = x.copy()    
_max = y[0]
for i in range(1, len(x)):
    y[i] = _max
    _max = max(x[i], _max)
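
The same "maximum of everything before i" result can probably also be obtained without a Python loop, by shifting a cumulative maximum one position to the right. A minimal sketch, assuming x is a non-empty 1-D array and that the first entry should simply keep x[0] as in the loop above (max_before is a hypothetical helper name):

```python
import numpy as np


def max_before(x):
    """Running maximum of the elements strictly before each position.

    The first entry has no predecessors, so it is set to x[0]
    (an assumption that matches the loop above).
    """
    x = np.asarray(x)
    y = np.empty_like(x)
    y[0] = x[0]
    # Cumulative maximum up to and including i - 1, shifted right by one.
    y[1:] = np.maximum.accumulate(x)[:-1]
    return y


print(max_before([3, 1, 4, 1, 5, 9, 2, 6]))
# [3 3 3 4 4 5 9 9]
```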

You can use a list comprehension for it, but you need to start the loop from 1 rather than 0. If you want to loop from 0, you can write it like this:

y=[np.max(x[:i+1]) for i in range(len(x))]

or like this:

y=[np.max(x[:i]) for i in range(1,len(x)+1)]
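
Both comprehensions recompute the maximum of a growing slice, so they remain quadratic in the length of x. For a plain Python list, one possible linear-time alternative is itertools.accumulate from the standard library with max as the combining function. A small sketch (running_max is a hypothetical helper name, and the sample list is made up):

```python
from itertools import accumulate


def running_max(values):
    """Maximum of all elements up to and including each position."""
    # accumulate applies max to the running result and the next element.
    return list(accumulate(values, max))


print(running_max([3, 1, 4, 1, 5, 9, 2, 6]))
# [3, 3, 4, 4, 5, 9, 9, 9]
```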
