Python: Finding the largest array index along a specific dimension which is greater than a threshold

Question

Let's say I have a 4-D numpy array (e.g. np.random.rand(x, y, z, t)) of data with dimensions corresponding to X, Y, Z, and time.

For each X and Y point, and at each time step, I want to find the largest index in Z for which the data is larger than some threshold n.

So my end result should be an X-by-Y-by-t array. Instances where there are no values in the Z-column greater than the threshold should be represented by a 0.

I can loop through element-by-element and construct a new array as I go, but I am operating on a very large array and it takes too long.
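For reference, the element-by-element approach described above might look roughly like the sketch below. The function name `last_index_above_loop` is hypothetical and does not come from the original post; it is only meant to pin down the expected output (an X-by-Y-by-t integer array, with 0 where nothing in the Z-column exceeds the threshold).

```python
import numpy as np

def last_index_above_loop(data, threshold):
    """Loop-based reference: largest Z index where data > threshold, else 0."""
    nx, ny, nz, nt = data.shape
    result = np.zeros((nx, ny, nt), dtype=np.intp)
    for i in range(nx):
        for j in range(ny):
            for k in range(nt):
                # Indices along Z where this column exceeds the threshold
                hits = np.nonzero(data[i, j, :, k] > threshold)[0]
                if hits.size:
                    result[i, j, k] = hits[-1]
    return result
```

The triple Python loop visits every (X, Y, t) combination individually, which is why it becomes slow on large arrays.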

Answer 1

Unfortunately, following the example of Python builtins, numpy doesn't make it easy to get the last index, although the first is trivial. Still, something like

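import numpy as np

def slow(arr, axis, threshold):
    # The running count of above-threshold entries first reaches its maximum
    # at the last True position, so argmax returns that index (0 if none match).
    return (arr > threshold).cumsum(axis=axis).argmax(axis=axis)

def fast(arr, axis, threshold):
    compare = (arr > threshold)
    # Move the target axis to the end. Unlike swapaxes, moveaxis keeps the
    # remaining axes in their original order, so this works for any axis.
    reordered = np.moveaxis(compare, axis, -1)
    flipped = reordered[..., ::-1]
    # argmax on booleans gives the first True, i.e. the last True before flipping
    first_above = flipped.argmax(axis=-1)
    last_above = flipped.shape[-1] - first_above - 1
    are_any_above = compare.any(axis=axis)
    # patch the no-matching-element found values
    patched = np.where(are_any_above, last_above, 0)
    return patched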

gives me

In [14]: arr = np.random.random((100,100,30,50))

In [15]: %timeit a = slow(arr, axis=2, threshold=0.75)
1 loop, best of 3: 248 ms per loop

In [16]: %timeit b = fast(arr, axis=2, threshold=0.75)
10 loops, best of 3: 50.9 ms per loop

In [17]: (slow(arr, axis=2, threshold=0.75) == fast(arr, axis=2, threshold=0.75)).all()
Out[17]: True

(There's probably a slicker way to do the flipping, but it's the end of the day here and my brain is shutting down. :-)
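One possible way to do the flipping more directly is to reverse the mask along the target axis with `np.flip` and take `argmax` along that same axis, so no axis reordering is needed. This is only a sketch of that idea, not the answerer's code; the function name `last_index_above` and the `fill` parameter are hypothetical.

```python
import numpy as np

def last_index_above(arr, axis, threshold, fill=0):
    """Largest index along `axis` where arr > threshold, or `fill` if none."""
    mask = arr > threshold
    # Reverse along the target axis; argmax then finds the first True,
    # which corresponds to the last True in the original order.
    first_from_end = np.flip(mask, axis=axis).argmax(axis=axis)
    last = mask.shape[axis] - 1 - first_from_end
    # Columns with no True would otherwise report the last index
    return np.where(mask.any(axis=axis), last, fill)
```

Because both the flip and the reduction act on the same axis, the output axes keep their original order for any choice of `axis`.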

Answer 2

Here's a faster approach -

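def faster(a,n,invalid_specifier):
    mask = a>n
    # Reverse along Z (axis 2); argmax finds the first True, i.e. the last True originally
    idx = a.shape[2] - (mask[:,:,::-1]).argmax(2) - 1
    # With no match, argmax returns 0, so idx is the last index while mask is False there
    idx[~mask[:,:,-1] & (idx == a.shape[2]-1)] = invalid_specifier
    return idx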

Runtime test -

# Using @DSM's benchmarking setup

In [553]: a = np.random.random((100,100,30,50))
     ...: n = 0.75
     ...: 

In [554]: out1 = faster(a,n,invalid_specifier=0)
     ...: out2 = fast(a, axis=2, threshold=n) # @DSM's soln
     ...: 

In [555]: np.allclose(out1,out2)
Out[555]: True

In [556]: %timeit fast(a, axis=2, threshold=n)  # @DSM's soln
10 loops, best of 3: 64.6 ms per loop

In [557]: %timeit faster(a,n,invalid_specifier=0)
10 loops, best of 3: 43.7 ms per loop
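A note on the `invalid_specifier` argument: with a fill value of 0, a column whose only match is at index 0 and a column with no match at all produce the same output. The 1-D sketch below illustrates this; `last_above_1d` and the sample arrays are hypothetical and exist only for illustration.

```python
import numpy as np

col_match_at_zero = np.array([0.9, 0.1, 0.2])  # only index 0 exceeds 0.75
col_no_match = np.array([0.1, 0.2, 0.3])       # nothing exceeds 0.75

def last_above_1d(col, threshold, invalid):
    mask = col > threshold
    # First True in the reversed mask is the last True in the original
    idx = mask.size - 1 - mask[::-1].argmax()
    return int(idx) if mask.any() else invalid

print(last_above_1d(col_match_at_zero, 0.75, invalid=0))  # 0
print(last_above_1d(col_no_match, 0.75, invalid=0))       # 0, same as above
print(last_above_1d(col_no_match, 0.75, invalid=-1))      # -1
```

If the two cases need to be told apart, a value that cannot be a valid index, such as -1, may be a safer choice than 0.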

Answer 1
3 (accepted) 2017-01-06 23:02:58

Answer 2
2 2017-01-07 03:26:32
