Python: Find largest array index along a specific dimension which is greater than a threshold
Let's say I have a 4-D numpy array (e.g. np.random.random((x, y, z, t))) of data with dimensions corresponding to X, Y, Z, and time.
For each X and Y point, and at each time step, I want to find the largest index in Z for which the data is larger than some threshold n.
So my end result should be an X-by-Y-by-t array. Instances where there are no values in the Z-column greater than the threshold should be represented by a 0.
I can loop through element by element and construct a new array as I go; however, I am operating on a very large array and it takes too long.
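For reference, the element-by-element approach described above might look like the following sketch. The function name last_above_loop and the tiny data array are made up for illustration; they are not from the original post. It is only meant as a slow but obviously correct baseline to check vectorized versions against.

```python
import numpy as np

def last_above_loop(arr, threshold):
    """Hypothetical baseline: explicit loops over X, Y and time.

    For each (x, y, t), returns the largest Z index whose value exceeds
    `threshold`, or 0 when no value in that Z-column does.
    """
    nx, ny, nz, nt = arr.shape
    out = np.zeros((nx, ny, nt), dtype=int)
    for i in range(nx):
        for j in range(ny):
            for k in range(nt):
                # Indices along Z where the data is above the threshold
                hits = np.nonzero(arr[i, j, :, k] > threshold)[0]
                if hits.size:
                    out[i, j, k] = hits[-1]
    return out

# Tiny example: one (x, y) point, 4 Z levels, 2 time steps -> shape (1, 1, 4, 2)
data = np.array([[[[0.9, 0.1],
                   [0.2, 0.3],
                   [0.8, 0.4],
                   [0.1, 0.2]]]])
result = last_above_loop(data, 0.75)
print(result)  # [[[2 0]]]: last hit at Z=2 for t=0, no hit for t=1
```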
Unfortunately, following the example of the Python builtins, numpy doesn't make it easy to get the last index, although getting the first is trivial. Still, something like
import numpy as np

def slow(arr, axis, threshold):
    # The running count of hits first reaches its maximum at the last True
    return (arr > threshold).cumsum(axis=axis).argmax(axis=axis)

def fast(arr, axis, threshold):
    compare = (arr > threshold)
    # move the axis of interest to the end, then reverse it
    reordered = compare.swapaxes(axis, -1)
    flipped = reordered[..., ::-1]
    # first True in the reversed array is the last True in the original
    first_above = flipped.argmax(axis=-1)
    last_above = flipped.shape[-1] - first_above - 1
    are_any_above = compare.any(axis=axis)
    # patch the no-matching-element found values
    patched = np.where(are_any_above, last_above, 0)
    return patched
gives me
In [14]: arr = np.random.random((100,100,30,50))
In [15]: %timeit a = slow(arr, axis=2, threshold=0.75)
1 loop, best of 3: 248 ms per loop
In [16]: %timeit b = fast(arr, axis=2, threshold=0.75)
10 loops, best of 3: 50.9 ms per loop
In [17]: (slow(arr, axis=2, threshold=0.75) == fast(arr, axis=2, threshold=0.75)).all()
Out[17]: True
(There's probably a slicker way to do the flipping but it's the end of day here and my brain is shutting down. :-)
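One possible way to tidy up the flipping, offered as a sketch rather than as the answerer's method, is to reverse the axis in place with np.flip instead of swapping it to the end first. The name fast_flip is hypothetical, and no timing claim is made for it; the check at the end only confirms that it agrees with the cumsum-based version on random data.

```python
import numpy as np

def fast_flip(arr, axis, threshold):
    compare = arr > threshold
    # Reverse only the axis of interest; np.flip returns a view, not a copy
    flipped = np.flip(compare, axis=axis)
    # argmax on booleans gives the first True, i.e. the last True of the original
    first_above = flipped.argmax(axis=axis)
    last_above = compare.shape[axis] - first_above - 1
    # Columns with no True would otherwise report the last index; force them to 0
    return np.where(compare.any(axis=axis), last_above, 0)

rng = np.random.default_rng(0)
arr = rng.random((4, 5, 6, 3))
reference = (arr > 0.75).cumsum(axis=2).argmax(axis=2)
same = bool((fast_flip(arr, axis=2, threshold=0.75) == reference).all())
```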
Here's a faster approach -
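def faster(a, n, invalid_specifier):
    mask = a > n
    # flip along Z (axis 2) so argmax finds the last element above n
    idx = a.shape[2] - (mask[:, :, ::-1]).argmax(2) - 1
    # argmax returns 0 when nothing matches, which maps to the last index;
    # mark those cases, identified by a False at the last Z position
    idx[~mask[:, :, -1] & (idx == a.shape[2] - 1)] = invalid_specifier
    return idx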
Runtime test -
# Using @DSM's benchmarking setup
In [553]: a = np.random.random((100,100,30,50))
...: n = 0.75
...:
In [554]: out1 = faster(a,n,invalid_specifier=0)
...: out2 = fast(a, axis=2, threshold=n) # @DSM's soln
...:
In [555]: np.allclose(out1,out2)
Out[555]: True
In [556]: %timeit fast(a, axis=2, threshold=n) # @DSM's soln
10 loops, best of 3: 64.6 ms per loop
In [557]: %timeit faster(a,n,invalid_specifier=0)
10 loops, best of 3: 43.7 ms per loop
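One thing worth noting about the invalid_specifier parameter: with a value of 0, as the question requests, a column with no match cannot be told apart from a column whose only match is at Z index 0. The sketch below uses a small made-up array to show the difference when -1 is passed instead; faster is repeated here only so the block runs on its own.

```python
import numpy as np

def faster(a, n, invalid_specifier):
    # Same logic as the answer above, repeated so this block is self-contained
    mask = a > n
    idx = a.shape[2] - (mask[:, :, ::-1]).argmax(2) - 1
    idx[~mask[:, :, -1] & (idx == a.shape[2] - 1)] = invalid_specifier
    return idx

# Shape (1, 1, 3, 3): one (x, y) point, 3 Z levels, 3 time steps.
# t=0: only Z=0 is above 0.75; t=1: nothing is; t=2: only Z=2 is.
a = np.array([[[[0.9, 0.1, 0.1],
                [0.1, 0.1, 0.1],
                [0.1, 0.1, 0.9]]]])

with_zero = faster(a, 0.75, invalid_specifier=0)
with_minus_one = faster(a, 0.75, invalid_specifier=-1)
print(with_zero)       # [[[0 0 2]]]  -> t=0 and t=1 look identical
print(with_minus_one)  # [[[ 0 -1  2]]] -> the empty column is now distinguishable
```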