Cumulative argmax of a NumPy array

Consider the array a:

import numpy as np
import pandas as pd

np.random.seed([3,1415])
a = np.random.randint(0, 10, (10, 2))
a

array([[0, 2],
       [7, 3],
       [8, 7],
       [0, 6],
       [8, 6],
       [0, 2],
       [0, 4],
       [9, 7],
       [3, 2],
       [4, 3]])

What is a vectorized way to get the cumulative argmax?

array([[0, 0],  <-- both start off as max position
       [1, 1],  <-- 7 > 0 so 1st col = 1, 3 > 2 2nd col = 1
       [2, 2],  <-- 8 > 7 1st col = 2, 7 > 3 2nd col = 2
       [2, 2],  <-- 0 < 8 1st col stays the same, 6 < 7 2nd col stays the same
       [2, 2],  
       [2, 2],
       [2, 2],
       [7, 2],  <-- 9 is new max of 1st col, argmax is now 7
       [7, 2],
       [7, 2]])

Here is a non-vectorized way to do it.

Notice that as the window expands, argmax applies to the growing window.

pd.DataFrame(a).expanding().apply(np.argmax).astype(int).values

array([[0, 0],
       [1, 1],
       [2, 2],
       [2, 2],
       [2, 2],
       [2, 2],
       [2, 2],
       [7, 2],
       [7, 2],
       [7, 2]])
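
The expanding-window idea can also be spelled out as an explicit loop, which may be useful as a reference when checking faster versions. This is only a sketch and not part of the original question; the name cumargmax_loop is made up here, and the array is the one printed above, typed in by hand.

```python
import numpy as np

def cumargmax_loop(a):
    """Reference version: argmax of a[:i + 1] for every row i, per column."""
    a = np.asarray(a)
    out = np.empty(a.shape, dtype=np.intp)
    for i in range(a.shape[0]):
        # argmax returns the first position of the maximum,
        # so ties keep the earlier index.
        out[i] = a[: i + 1].argmax(axis=0)
    return out

# The array from the question, entered by hand.
a = np.array([[0, 2], [7, 3], [8, 7], [0, 6], [8, 6],
              [0, 2], [0, 4], [9, 7], [3, 2], [4, 3]])
result = cumargmax_loop(a)
print(result)
```

Each step rescans the whole window, so the cost grows quadratically with the number of rows; it is meant for checking results, not for speed.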

Here's a vectorized pure NumPy solution that performs pretty snappily:

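def cumargmax(a):
    # Running maximum down each column.
    m = np.maximum.accumulate(a)
    # Row index of every element, one copy per column.
    x = np.repeat(np.arange(a.shape[0])[:, None], a.shape[1], axis=1)
    # Zero the index wherever the running maximum did not strictly increase.
    x[1:] *= m[:-1] < m[1:]
    # Carry the last surviving index forward.
    np.maximum.accumulate(x, axis=0, out=x)
    return x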

Then we have:

>>> cumargmax(a)
array([[0, 0],
       [1, 1],
       [2, 2],
       [2, 2],
       [2, 2],
       [2, 2],
       [2, 2],
       [7, 2],
       [7, 2],
       [7, 2]])
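
To see why this works, it may help to trace the intermediate arrays on a single column. The sketch below repeats the same steps on the first column of a, typed in by hand; the variable names is_new_max and steps are illustrative and do not appear in the function above.

```python
import numpy as np

col = np.array([0, 7, 8, 0, 8, 0, 0, 9, 3, 4])  # first column of a

m = np.maximum.accumulate(col)    # running maximum: 0 7 8 8 8 8 8 9 9 9
is_new_max = m[:-1] < m[1:]       # True where the running maximum strictly increases
x = np.arange(col.size)
x[1:] *= is_new_max               # keep an index only where a new maximum appears
steps = x.copy()                  # 0 1 2 0 0 0 0 7 0 0
np.maximum.accumulate(x, out=x)   # carry the latest kept index forward
print(m, steps, x, sep="\n")
```

Because the comparison is strict, a repeated maximum (the second 8 at position 4) does not move the index, which matches np.argmax returning the first occurrence.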

Some quick testing on arrays with thousands to millions of values suggests that this is anywhere from 10 to 50 times faster than looping at the Python level (either implicitly or explicitly).
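
A rough way to reproduce that kind of comparison is sketched below. The array size, the seed, the repeat count, and the cumargmax_loop helper are all assumptions chosen for illustration, not the setup used for the figures quoted above, and the ratio will vary with machine and array shape.

```python
import timeit
import numpy as np

def cumargmax(a):
    # Vectorized version, same steps as the function shown earlier.
    m = np.maximum.accumulate(a)
    x = np.repeat(np.arange(a.shape[0])[:, None], a.shape[1], axis=1)
    x[1:] *= m[:-1] < m[1:]
    np.maximum.accumulate(x, axis=0, out=x)
    return x

def cumargmax_loop(a):
    # Hypothetical Python-level baseline: one pass per column, tracking the best index.
    out = np.empty(a.shape, dtype=np.intp)
    for j in range(a.shape[1]):
        col = a[:, j].tolist()
        best = 0
        for i, v in enumerate(col):
            if v > col[best]:
                best = i
            out[i, j] = best
    return out

rng = np.random.default_rng(0)
big = rng.integers(0, 1000, size=(20_000, 2))

# Both versions should agree before their timings are compared.
same = np.array_equal(cumargmax(big), cumargmax_loop(big))

t_vec = timeit.timeit(lambda: cumargmax(big), number=5)
t_loop = timeit.timeit(lambda: cumargmax_loop(big), number=5)
print(f"agree: {same}, vectorized: {t_vec:.4f}s, loop: {t_loop:.4f}s")
```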

I can't think of a way to vectorize this over both columns easily, but if the number of columns is small relative to the number of rows, that shouldn't be an issue and a for loop should suffice for that axis:

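import numpy as np
import numpy_indexed as npi

a = np.random.randint(0, 10, 10)
# Running maximum; named cummax to avoid shadowing the builtin max.
cummax = np.maximum.accumulate(a)
# Index in a of each running-maximum value.
idx = npi.indices(a, cummax)
print(idx)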
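
The for loop over columns mentioned above could look roughly like the following. To keep the sketch free of extra dependencies, the lookup is done with np.unique and np.searchsorted as a stand-in for npi.indices; that substitution, and the names cumargmax_1d and cumargmax_by_column, are assumptions made here rather than part of the answer.

```python
import numpy as np

def cumargmax_1d(v):
    """Cumulative argmax of a 1-D array via a first-occurrence lookup."""
    cummax = np.maximum.accumulate(v)
    # Sorted unique values and the index where each one first appears in v.
    values, first_index = np.unique(v, return_index=True)
    # Every running maximum is a value of v, so searchsorted lands on an exact match.
    return first_index[np.searchsorted(values, cummax)]

def cumargmax_by_column(a):
    # Plain Python loop over the (assumed small) column axis.
    return np.stack([cumargmax_1d(a[:, j]) for j in range(a.shape[1])], axis=1)

# The array from the question, entered by hand.
a = np.array([[0, 2], [7, 3], [8, 7], [0, 6], [8, 6],
              [0, 2], [0, 4], [9, 7], [3, 2], [4, 3]])
result = cumargmax_by_column(a)
print(result)
```

The lookup is valid because the first place a running maximum appears in the whole column is also the first place it appears in the window seen so far.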

I would like to make a function that computes the cumulative argmax for a 1-D array and then apply it to all columns. This is the code:

import numpy as np

np.random.seed([3,1415])
a = np.random.randint(0, 10, (10, 2))

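def cumargmax(v):
    # Binary ufunc on indices: keep i unless v[j] is strictly larger.
    uargmax = np.frompyfunc(lambda i, j: j if v[j] > v[i] else i, 2, 1)
    # np.object was removed in NumPy 1.24; the builtin object is equivalent.
    return uargmax.accumulate(np.arange(0, len(v)), 0, dtype=object).astype(v.dtype)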

np.apply_along_axis(cumargmax, 0, a)

The reason for converting to an object dtype and then converting back is a workaround for NumPy 1.9, as mentioned in "Generalized cumulative functions in NumPy/SciPy?".
