Getting the index of the last negative value in a 2D array per column

Question

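I'm trying to get the index of the last negative value of an array per column (in order to slice it afterwards). A simple working example on a 1D vector is: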

import numpy as np

A = np.arange(10) - 5
A[2] = 2
print(A)  # [-5 -4  2 -2 -1  0  1  2  3  4]

idx = np.max(np.where(A <= 0)[0])
print(idx)  # 5

A[:idx] = 0
print(A)  # [0 0 0 0 0 0 1 2 3 4]

Now I want to do the same thing on each column of a 2D array:

A = np.arange(10) - 5
A[2] = 2
A2 = np.tile(A, 3).reshape((3, 10)) - np.array([0, 2, -1]).reshape((3, 1))
print(A2)
# [[-5 -4  2 -2 -1  0  1  2  3  4]
#  [-7 -6  0 -4 -3 -2 -1  0  1  2]
#  [-4 -3  3 -1  0  1  2  3  4  5]]

I would like to obtain:

print(A2)
# [[0 0 0 0 0 0 1 2 3 4]
#  [0 0 0 0 0 0 0 0 1 2]
#  [0 0 0 0 0 1 2 3 4 5]]

But I can't figure out how to translate the max/where statement to this 2D array...

Answer 1

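You already have good answers, but I wanted to propose a potentially quicker variation using the function np.maximum.accumulate. Since your method for a 1D array uses max/where, you may also find this approach quite intuitive. (Edit: a quicker Cython implementation is added below.)

The overall approach is very similar to the others; the mask is created with: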

np.maximum.accumulate((A2 < 0)[:, ::-1], axis=1)[:, ::-1]

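This line of code does the following:

(A2 < 0) creates a boolean array indicating whether each value is negative. The index [:, ::-1] flips it left to right.
np.maximum.accumulate is used to return the cumulative maximum along each row (i.e. axis=1). For example, [False, True, False] would become [False, True, True].
The final indexing operation [:, ::-1] flips this new boolean array left to right.

Then all that's left to do is to use the boolean array as a mask to set the True positions to zero.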
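
The steps above can be traced one at a time on the sample array from the question. This is only an illustrative sketch; the intermediate names (neg, flipped, seen, mask, result) are made up here and are not part of the original one-liner.

```python
import numpy as np

# The 2D sample array from the question.
A = np.arange(10) - 5
A[2] = 2
A2 = np.tile(A, 3).reshape((3, 10)) - np.array([0, 2, -1]).reshape((3, 1))

neg = A2 < 0                                   # True where a value is negative
flipped = neg[:, ::-1]                         # read each row right to left
seen = np.maximum.accumulate(flipped, axis=1)  # stays True once a negative was seen
mask = seen[:, ::-1]                           # flip back to the original order

result = A2.copy()   # work on a copy so A2 itself is left untouched
result[mask] = 0
print(result)
# [[0 0 0 0 0 0 1 2 3 4]
#  [0 0 0 0 0 0 0 0 1 2]
#  [0 0 0 0 0 1 2 3 4 5]]
```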

Borrowing the timing methodology and two functions from @Divakar's answer, here are the benchmarks for my proposed method:

# method using np.maximum.accumulate
def accumulate_based(A2):
    A2[np.maximum.accumulate((A2 < 0)[:, ::-1], axis=1)[:, ::-1]] = 0
    return A2

# large sample array
A2 = np.random.randint(-4, 10, size=(100000, 100))
A2c = A2.copy()
A2c2 = A2.copy()

The timings are:

In [47]: %timeit broadcasting_based(A2)
10 loops, best of 3: 61.7 ms per loop

In [48]: %timeit cumsum_based(A2c)
10 loops, best of 3: 127 ms per loop

In [49]: %timeit accumulate_based(A2c2) # quickest
10 loops, best of 3: 43.2 ms per loop

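So for arrays of this size and shape, using np.maximum.accumulate can be around 30% faster than the next quickest solution.

As @tom10 pointed out, each NumPy operation processes arrays in full, which can be inefficient when several operations are needed to get a result. An iterative approach that works through the array just once may fare better.

Below is a naive function written in Cython which can be more than twice as fast as the pure NumPy approach.

This function may be sped up further using memory views.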

cimport cython
import numpy as np
cimport numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def cython_based(np.ndarray[long, ndim=2, mode="c"] array):
    cdef int rows, cols, i, j, seen_neg
    rows = array.shape[0]
    cols = array.shape[1]
    for i in range(rows):
        seen_neg = 0
        for j in range(cols-1, -1, -1):
            if seen_neg or array[i, j] < 0:
                seen_neg = 1
                array[i, j] = 0
    return array

This function works backwards through each row and starts setting values to zero once it has seen a negative value.
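
For readers without a Cython toolchain, the same backward scan can be written in plain Python to check the logic. This is a sketch only: loop_based is a hypothetical name, and the function will be far slower than either the Cython or the NumPy version.

```python
import numpy as np

def loop_based(array):
    """Plain-Python reference for the backward scan (hypothetical helper, slow)."""
    rows, cols = array.shape
    for i in range(rows):
        seen_neg = False
        # Walk each row from the last column to the first.
        for j in range(cols - 1, -1, -1):
            if seen_neg or array[i, j] < 0:
                seen_neg = True
                array[i, j] = 0
    return array

# Small random input; the seed is arbitrary.
rng = np.random.RandomState(0)
small = rng.randint(-4, 10, size=(50, 20))

# Reference result from the np.maximum.accumulate mask.
expected = small.copy()
expected[np.maximum.accumulate((expected < 0)[:, ::-1], axis=1)[:, ::-1]] = 0

print(np.array_equal(loop_based(small.copy()), expected))  # True
```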

Testing that it works:

A2 = np.random.randint(-4, 10, size=(100000, 100))
A2c = A2.copy()

np.array_equal(accumulate_based(A2), cython_based(A2c))
# True

Comparing the performance of the functions:

In [52]: %timeit accumulate_based(A2)
10 loops, best of 3: 49.8 ms per loop

In [53]: %timeit cython_based(A2c)
100 loops, best of 3: 18.6 ms per loop

Answer 2

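Assuming that you are looking to set all elements of each row up to and including the last negative element to zero (as per the expected output listed in the question for the sample case), two approaches could be suggested here.

Approach #1

This one is based on np.cumsum to generate a mask of the elements to be set to zero, as shown below -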

# Get boolean mask with TRUEs for each row starting at the first element and 
# ending at the last negative element
mask = (np.cumsum(A2[:,::-1]<0,1)>0)[:,::-1]

# Use mask to set all such TRUEs to zeros as per the expected output in OP
A2[mask] = 0

Sample run -

In [280]: A2 = np.random.randint(-4,10,(6,7)) # Random input 2D array

In [281]: A2
Out[281]: 
array([[-2,  9,  8, -3,  2,  0,  5],
       [-1,  9,  5,  1, -3, -3, -2],
       [ 3, -3,  3,  5,  5,  2,  9],
       [ 4,  6, -1,  6,  1,  2,  2],
       [ 4,  4,  6, -3,  7, -3, -3],
       [ 0,  2, -2, -3,  9,  4,  3]])

In [282]: A2[(np.cumsum(A2[:,::-1]<0,1)>0)[:,::-1]] = 0 # Use mask to set zeros

In [283]: A2
Out[283]: 
array([[0, 0, 0, 0, 2, 0, 5],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 3, 5, 5, 2, 9],
       [0, 0, 0, 6, 1, 2, 2],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 9, 4, 3]])

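Approach #2

This one starts off with the idea of finding the last negative element indices from @tom10's answer and develops it into a mask-finding method using broadcasting to get the desired output, similar to approach #1.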

# Find last negative index for each row
last_idx = A2.shape[1] - 1 - np.argmax(A2[:,::-1]<0, axis=1)

# Find the invalid indices (rows with no negative indices)
invalid_idx = A2[np.arange(A2.shape[0]),last_idx]>=0

# Set the indices for invalid ones to "-1"
last_idx[invalid_idx] = -1

# Boolean mask with each row starting with TRUE as the first element 
# and ending at the last negative element
mask = np.arange(A2.shape[1]) < (last_idx[:,None] + 1)

# Set masked elements to zeros, for the desired output
A2[mask] = 0
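
The reason for the "invalid" step may be easier to see on a tiny input. In the sketch below, B is a made-up array whose second row has no negative value: np.argmax returns 0 for an all-False row, so without the correction that row would be reported as having its last negative in the final column.

```python
import numpy as np

# Hypothetical sample: the second row contains no negative value.
B = np.array([[ 1, -2,  3,  4],
              [ 5,  6,  7,  8],
              [-1,  2, -3,  4]])

# argmax on the reversed boolean rows; it yields 0 when a row has no True at all.
last_idx = B.shape[1] - 1 - np.argmax(B[:, ::-1] < 0, axis=1)
print(last_idx)  # [1 3 2]  (the 3 is a false hit for the middle row)

# The value stored at the reported position separates real hits from the fallback.
invalid = B[np.arange(B.shape[0]), last_idx] >= 0
last_idx[invalid] = -1
print(last_idx)  # [ 1 -1  2]

# Broadcasting a column of cut-offs against the column indices builds the mask.
mask = np.arange(B.shape[1]) < (last_idx[:, None] + 1)
out = B.copy()
out[mask] = 0
print(out)
# [[0 0 3 4]
#  [5 6 7 8]
#  [0 0 0 4]]
```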

Runtime tests -

Function definitions:

def broadcasting_based(A2):
    last_idx = A2.shape[1] - 1 - np.argmax(A2[:,::-1]<0, axis=1)
    last_idx[A2[np.arange(A2.shape[0]),last_idx]>=0] = -1
    A2[np.arange(A2.shape[1]) < (last_idx[:,None] + 1)] = 0
    return A2

def cumsum_based(A2):    
    A2[(np.cumsum(A2[:,::-1]<0,1)>0)[:,::-1]] = 0    
    return A2

Runtimes:

In [379]: A2 = np.random.randint(-4,10,(100000,100))
     ...: A2c = A2.copy()
     ...: 

In [380]: %timeit broadcasting_based(A2)
10 loops, best of 3: 106 ms per loop

In [381]: %timeit cumsum_based(A2c)
1 loops, best of 3: 167 ms per loop

Verify results -

In [384]: A2 = np.random.randint(-4,10,(100000,100))
     ...: A2c = A2.copy()
     ...: 

In [385]: np.array_equal(broadcasting_based(A2),cumsum_based(A2c))
Out[385]: True

Answer 3

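Finding the first is usually easier and faster than finding the last, so here I reverse the array and then find the first negative (using the OP's version of A2):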

im = A2.shape[1] - 1 - np.argmax(A2[:,::-1]<0, axis=1)

# [4 6 3]      # which are the indices of the last negative in A2

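Note, however, that if you have large arrays with many negative numbers, it might actually be faster to use a non-numpy approach so that you can short-circuit the search. That is, numpy will do the calculation on the entire array, so if you have 10000 elements in a row but will typically hit a negative number within the first 10 elements (of the reverse search), a pure Python approach might end up being faster.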
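
A minimal sketch of such a short-circuiting search is shown here. The helper name last_negative_index and the choice of -1 as the "no negative found" value are assumptions made for illustration; whether this actually beats the vectorized version depends on how early the scan can stop.

```python
import numpy as np

def last_negative_index(row):
    """Scan from the right and stop at the first negative; -1 if there is none."""
    for j in range(len(row) - 1, -1, -1):
        if row[j] < 0:
            return j   # short-circuit: the rest of the row is never inspected
    return -1

# The 2D sample array from the question.
A = np.arange(10) - 5
A[2] = 2
A2 = np.tile(A, 3).reshape((3, 10)) - np.array([0, 2, -1]).reshape((3, 1))

print([last_negative_index(row) for row in A2])  # [4, 6, 3]
```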

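Overall, iterating over the rows might be faster for subsequent operations as well. For example, if your next step is multiplication, it could be faster to multiply only the non-zero slices at the ends, or maybe to find the longest non-zero section and deal only with the truncated array.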
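
As a rough illustration of working only on the tails, the sketch below doubles just the part of each row that lies after the last negative. The factor 2 and the name out are arbitrary, and the code assumes every row contains at least one negative (true for the question's A2); rows without one would need the separate handling discussed in the other answers.

```python
import numpy as np

# The 2D sample array from the question.
A = np.arange(10) - 5
A[2] = 2
A2 = np.tile(A, 3).reshape((3, 10)) - np.array([0, 2, -1]).reshape((3, 1))

# Index of the last negative in each row (assumes each row has one).
im = A2.shape[1] - 1 - np.argmax(A2[:, ::-1] < 0, axis=1)

out = np.zeros_like(A2)
for i, idx in enumerate(im):
    # Everything up to idx would be zeroed anyway, so only the tail is multiplied.
    out[i, idx + 1:] = A2[i, idx + 1:] * 2
print(out)
# [[ 0  0  0  0  0  0  2  4  6  8]
#  [ 0  0  0  0  0  0  0  0  2  4]
#  [ 0  0  0  0  0  2  4  6  8 10]]
```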

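This basically comes down to the number of negatives per row. If you have 1000 negatives per row, you'll on average have non-zero segments that are 1/1000th of your full row length, so you could get a 1000x speed-up by just looking at the ends. The short example given in the question is great for understanding and answering the basic question, but I wouldn't take timing tests too seriously when your end application is a very different use case, especially since the fractional time savings from iterating grow in proportion to the array size (assuming a constant ratio and a random distribution of negatives).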

Answer 4

You can access the individual rows:

A2[0] == array([-5, -4,  2, -2, -1,  0,  1,  2,  3,  4])

Answer 1
12 (accepted) 2015-06-27 12:27:29

Answer 2
8 2015-06-27 06:35:40

Answer 3
6 2015-06-27 03:49:23

Answer 4
0 2015-06-24 16:17:07
