numpy: efficiently adding rows of a matrix

Question

I have a matrix:

import numpy as np

mat = np.array([
   [ 0,  1,  2,  3],
   [ 4,  5,  6,  7],
   [ 8,  9, 10, 11]
   ])

I'd like to get the sum of the rows at certain indices, e.g.:

ixs = np.array([0,2,0,0,0,1,1])

I know I can compute the answer as:

mat[ixs].sum(axis=0)
> array([16, 23, 30, 37])

The problem is that ixs may be very long, and I don't want to use all that memory to create the intermediate product mat[ixs], only to reduce it again with the sum.

I also know I could simply count up the indices and use multiplication instead:

np.bincount(ixs, minlength=mat.shape[0]).dot(mat)
> array([16, 23, 30, 37])
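
To make the comparison concrete, here is a small self-contained check (a sketch, not part of the original post; the variable names gathered, counts, by_indexing and by_bincount are introduced only for illustration). It shows that both expressions agree, and that the indexing version first builds a temporary with one row per entry of ixs:

```python
import numpy as np

mat = np.arange(12).reshape(3, 4)
ixs = np.array([0, 2, 0, 0, 0, 1, 1])

# Approach 1: fancy indexing builds a (len(ixs), n_cols) temporary first.
gathered = mat[ixs]
by_indexing = gathered.sum(axis=0)

# Approach 2: count how often each row occurs, then take a weighted sum.
counts = np.bincount(ixs, minlength=mat.shape[0])
by_bincount = counts.dot(mat)

print(gathered.shape)   # (7, 4)
print(counts)           # [4 2 1]
print(by_bincount)      # [16 23 30 37]
```

The temporary grows with len(ixs), while the count vector only grows with the number of rows in mat.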

But that will be expensive if my ixs are sparse.

I know about scipy's sparse matrices, and I suppose I could use them, but I'd prefer a pure numpy solution, as sparse matrices are limited in various ways (such as only being 2-d).

So, is there a pure numpy way to merge the indexing and sum-reduction in this case?

Conclusions:

Thank you, Divakar and hpaulj, for your very thorough responses. By "sparse" I meant that most of the values in range(mat.shape[0]) are not in ixs. Using that new definition (and with a more realistic data size), I re-ran Divakar's tests, with some new functions added:

rng = np.random.RandomState(1234)
mat = rng.randn(1000, 500)
ixs = rng.choice(rng.randint(mat.shape[0], size=mat.shape[0] // 10), size=1000)
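
As a quick sanity check of what "sparse" means under this definition, the sketch below measures the fraction of rows of mat that ixs actually references. It is not part of the original post; candidates, used_rows and fraction_used are names introduced here for illustration, and integer division is assumed for the size argument.

```python
import numpy as np

rng = np.random.RandomState(1234)
mat = rng.randn(1000, 500)

# Draw at most 100 candidate rows, then sample 1000 indices from them.
candidates = rng.randint(mat.shape[0], size=mat.shape[0] // 10)
ixs = rng.choice(candidates, size=1000)

# Fraction of the rows of mat that appear in ixs at least once.
used_rows = np.unique(ixs)
fraction_used = used_rows.size / mat.shape[0]
print(fraction_used)  # at most 0.1 by construction
```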

# Divakar's solutions
In[42]: %timeit org_indexing_app(mat, ixs)
1000 loops, best of 3: 1.82 ms per loop
In[43]: %timeit org_bincount_app(mat, ixs)
The slowest run took 4.07 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 177 µs per loop
In[44]: %timeit indexing_modified_app(mat, ixs)
1000 loops, best of 3: 1.81 ms per loop
In[45]: %timeit bincount_modified_app(mat, ixs)
1000 loops, best of 3: 258 µs per loop
In[46]: %timeit simply_indexing_app(mat, ixs)
1000 loops, best of 3: 1.86 ms per loop
In[47]: %timeit take_app(mat, ixs)
1000 loops, best of 3: 1.82 ms per loop
In[48]: %timeit unq_mask_einsum_app(mat, ixs)
10 loops, best of 3: 58.2 ms per loop 
# hpaulj's solutions
In[53]: %timeit hpauljs_sparse_solution(mat, ixs)
The slowest run took 9.34 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 524 µs per loop
%timeit hpauljs_second_sparse_solution(mat, ixs)
100 loops, best of 3: 9.91 ms per loop
# Sparse version of original bincount solution (see below):
In[60]: %timeit sparse_bincount(mat, ixs)
10000 loops, best of 3: 71.7 µs per loop

The winner in this case is the sparse version of the bincount solution:

def sparse_bincount(mat, ixs):
    x = np.bincount(ixs)
    nonzeros, = np.nonzero(x)
    return x[nonzeros].dot(mat[nonzeros])
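
The following sketch checks the function against the plain indexing approach on a small random input. The test data (50 rows, four distinct indices) is an assumption chosen only for illustration, and the function is written with an explicit return so that it yields the result.

```python
import numpy as np

def sparse_bincount(mat, ixs):
    counts = np.bincount(ixs)          # length max(ixs) + 1
    nonzeros, = np.nonzero(counts)     # rows that occur at least once
    # Weighted sum over only the rows that are actually referenced.
    return counts[nonzeros].dot(mat[nonzeros])

rng = np.random.RandomState(0)
mat = rng.randn(50, 8)
ixs = rng.choice(np.array([3, 7, 11, 40]), size=200)

expected = mat[ixs].sum(axis=0)        # reference result via indexing
result = sparse_bincount(mat, ixs)
print(np.allclose(result, expected))
```

Only the rows with a non-zero count take part in the dot product, which is where the saving comes from when few distinct rows are referenced.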

Answer 1

Since we are assuming that ixs could be sparse, i.e. mostly zeros, we could modify the strategy to sum row 0 separately from the rest of the rows, based on the given row indices. So we could use the bincount method to sum the rows with non-zero indices and add to that (row 0 × the number of zeros in ixs).

Thus, the second approach could be modified like so:

nzmask = ixs!=0
nzsum = np.bincount(ixs[nzmask]-1, minlength=mat.shape[0]-1).dot(mat[1:])
row0_sum = mat[0]*(len(ixs) - np.count_nonzero(nzmask))
out = nzsum + row0_sum
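
Worked through on the small matrix from the question, the split looks like this (a sketch for illustration; nz_counts and n_zeros are names added here):

```python
import numpy as np

mat = np.arange(12).reshape(3, 4)
ixs = np.array([0, 2, 0, 0, 0, 1, 1])

nzmask = ixs != 0                                  # entries that are not row 0
# Shift by one so that row 1 lands in bin 0, row 2 in bin 1, and so on.
nz_counts = np.bincount(ixs[nzmask] - 1, minlength=mat.shape[0] - 1)
nzsum = nz_counts.dot(mat[1:])                     # contribution of rows 1..n-1
n_zeros = len(ixs) - np.count_nonzero(nzmask)      # how often row 0 occurs
out = nzsum + mat[0] * n_zeros

print(nz_counts)  # [2 1]
print(nzsum)      # [16 19 22 25]
print(out)        # [16 23 30 37]
```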

We could extend this strategy to the first approach as well, like so:

nzmask = ixs!=0
out = mat[0]*(len(ixs) - np.count_nonzero(nzmask)) + mat[ixs[nzmask]].sum(axis=0)

If we are working with lots of repeated non-zero indices, we could alternatively use np.take with a focus on performance. Thus, mat[ixs[nzmask]] could be replaced by np.take(mat,ixs[nzmask],axis=0), and similarly mat[ixs] by np.take(mat,ixs,axis=0). With such repeated indices, np.take brings a noticeable speedup compared to plain indexing.
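
The two spellings select the same rows, which the sketch below confirms on random data (the data shapes are an assumption for illustration). It makes no claim about speed, since that depends on the NumPy version and the input sizes.

```python
import numpy as np

rng = np.random.RandomState(0)
mat = rng.rand(20, 4)
ixs = rng.randint(0, 10, 1000)

by_index = mat[ixs]                   # fancy indexing along the first axis
by_take = np.take(mat, ixs, axis=0)   # same selection via np.take
same = np.array_equal(by_index, by_take)
print(same)
```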

Finally, we could use np.einsum to perform this row-ID-based selection and summing, like so:

nzmask = ixs!=0
unq,tags = np.unique(ixs[nzmask],return_inverse=1)
nzsum = np.einsum('ji,jk->k',np.arange(len(unq))[:,None] == tags,mat[unq])
out = mat[0]*(len(ixs) - np.count_nonzero(nzmask)) + nzsum
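
To see what the einsum call computes, it helps to look at the boolean mask it consumes: row j of the mask marks the positions where the j-th unique index occurs, so summing over both j and i amounts to weighting each unique row by its count. A sketch on the small example (mask, per_row_counts and nzsum_counts are names introduced here):

```python
import numpy as np

mat = np.arange(12).reshape(3, 4)
ixs = np.array([0, 2, 0, 0, 0, 1, 1])

nzmask = ixs != 0
unq, tags = np.unique(ixs[nzmask], return_inverse=True)

# mask[j, i] is True when the i-th non-zero entry equals unq[j].
mask = np.arange(len(unq))[:, None] == tags
per_row_counts = mask.sum(axis=1)

nzsum_einsum = np.einsum('ji,jk->k', mask, mat[unq])
nzsum_counts = per_row_counts.dot(mat[unq])   # same result via explicit counts

print(unq)             # [1 2]
print(per_row_counts)  # [2 1]
print(nzsum_einsum)    # [16 19 22 25]
```

The mask has len(unq) × len(tags) entries, so this variant trades memory for expressiveness.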

Benchmarking

Let's list all five approaches posted so far in this answer, together with the two approaches from the question, as functions for runtime testing:

def org_indexing_app(mat,ixs):
    return mat[ixs].sum(axis=0)

def org_bincount_app(mat,ixs):
    return np.bincount(ixs, minlength=mat.shape[0]).dot(mat)

def indexing_modified_app(mat,ixs):
    return np.take(mat,ixs,axis=0).sum(axis=0)

def bincount_modified_app(mat,ixs):
    nzmask = ixs!=0
    nzsum = np.bincount(ixs[nzmask]-1, minlength=mat.shape[0]-1).dot(mat[1:])
    row0_sum = mat[0]*(len(ixs) - np.count_nonzero(nzmask))
    return nzsum + row0_sum

def simply_indexing_app(mat,ixs):
    nzmask = ixs!=0
    nzsum = mat[ixs[nzmask]].sum(axis=0)
    return mat[0]*(len(ixs) - np.count_nonzero(nzmask)) + nzsum

def take_app(mat,ixs):
    nzmask = ixs!=0
    nzsum = np.take(mat,ixs[nzmask],axis=0).sum(axis=0)
    return mat[0]*(len(ixs) - np.count_nonzero(nzmask)) + nzsum

def unq_mask_einsum_app(mat,ixs):
    nzmask = ixs!=0
    unq,tags = np.unique(ixs[nzmask],return_inverse=1)
    nzsum = np.einsum('ji,jk->k',np.arange(len(unq))[:,None] == tags,mat[unq])
    return mat[0]*(len(ixs) - np.count_nonzero(nzmask)) + nzsum

Timings

Case #1 (ixs is approx. 95% sparse):

In [301]: # Setup input
     ...: mat = np.random.rand(20,4)
     ...: ixs = np.random.randint(0,10,(100000))
     ...: ixs[np.random.rand(ixs.size)<0.95] = 0 # Make it approx 95% sparsey
     ...: 

In [302]: # Timings
     ...: %timeit org_indexing_app(mat,ixs)
     ...: %timeit org_bincount_app(mat,ixs)
     ...: %timeit indexing_modified_app(mat,ixs)
     ...: %timeit bincount_modified_app(mat,ixs)
     ...: %timeit simply_indexing_app(mat,ixs)
     ...: %timeit take_app(mat,ixs)
     ...: %timeit unq_mask_einsum_app(mat,ixs)
     ...: 
100 loops, best of 3: 4.89 ms per loop
1000 loops, best of 3: 428 µs per loop
100 loops, best of 3: 3.29 ms per loop
1000 loops, best of 3: 329 µs per loop
1000 loops, best of 3: 537 µs per loop
1000 loops, best of 3: 462 µs per loop
1000 loops, best of 3: 1.07 ms per loop

Case #2 (ixs is approx. 98% sparse):

In [303]: # Setup input
     ...: mat = np.random.rand(20,4)
     ...: ixs = np.random.randint(0,10,(100000))
     ...: ixs[np.random.rand(ixs.size)<0.98] = 0 # Make it approx 98% sparsey
     ...: 

In [304]: # Timings
     ...: %timeit org_indexing_app(mat,ixs)
     ...: %timeit org_bincount_app(mat,ixs)
     ...: %timeit indexing_modified_app(mat,ixs)
     ...: %timeit bincount_modified_app(mat,ixs)
     ...: %timeit simply_indexing_app(mat,ixs)
     ...: %timeit take_app(mat,ixs)
     ...: %timeit unq_mask_einsum_app(mat,ixs)
     ...: 
100 loops, best of 3: 4.86 ms per loop
1000 loops, best of 3: 438 µs per loop
100 loops, best of 3: 3.5 ms per loop
1000 loops, best of 3: 260 µs per loop
1000 loops, best of 3: 318 µs per loop
1000 loops, best of 3: 288 µs per loop
1000 loops, best of 3: 694 µs per loop
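
The %timeit magic is only available inside IPython. Outside it, a comparable measurement can be sketched with the standard timeit module, as below. Only two of the functions are repeated to keep the example short; the repeat counts and the timings dictionary are choices made for this sketch, and the numbers it prints will differ from the ones above.

```python
import timeit
import numpy as np

def org_indexing_app(mat, ixs):
    return mat[ixs].sum(axis=0)

def org_bincount_app(mat, ixs):
    return np.bincount(ixs, minlength=mat.shape[0]).dot(mat)

rng = np.random.RandomState(0)
mat = rng.rand(20, 4)
ixs = rng.randint(0, 10, 100000)
ixs[rng.rand(ixs.size) < 0.98] = 0   # roughly 98% of entries point at row 0

timings = {}
for func in (org_indexing_app, org_bincount_app):
    # Best of 3 repeats, 10 calls each, reported per call in seconds.
    best = min(timeit.repeat(lambda: func(mat, ixs), number=10, repeat=3)) / 10
    timings[func.__name__] = best
    print(f"{func.__name__}: {best * 1e6:.0f} µs per call")
```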

Answer 2

An alternative to bincount is np.add.at:

In [193]: mat
Out[193]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [194]: ixs
Out[194]: array([0, 2, 0, 0, 0, 1, 1])

In [195]: J = np.zeros(mat.shape[0],int)
In [196]: np.add.at(J, ixs, 1)
In [197]: J
Out[197]: array([4, 2, 1])

In [198]: np.dot(J, mat)
Out[198]: array([16, 23, 30, 37])

By sparsity you mean, I assume, that ixs might not include all the rows; for example, ixs without the 0s:

In [199]: ixs = np.array([2,1,1])
In [200]: J=np.zeros(mat.shape[0],int)
In [201]: np.add.at(J, ixs, 1)
In [202]: J
Out[202]: array([0, 2, 1])
In [203]: np.dot(J, mat)
Out[203]: array([16, 19, 22, 25])

J still has shape (mat.shape[0],), but the add.at step should scale with the length of ixs.
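
The reason for reaching for np.add.at rather than plain in-place addition is that the latter does not accumulate repeated indices. A short sketch of the difference (K is a name introduced here for the contrasting case):

```python
import numpy as np

mat = np.arange(12).reshape(3, 4)
ixs = np.array([0, 2, 0, 0, 0, 1, 1])

J = np.zeros(mat.shape[0], dtype=int)
np.add.at(J, ixs, 1)     # unbuffered: every occurrence of an index is counted

K = np.zeros(mat.shape[0], dtype=int)
K[ixs] += 1              # buffered: a repeated index is only incremented once

print(J)                                         # [4 2 1]
print(K)                                         # [1 1 1]
print(np.bincount(ixs, minlength=mat.shape[0]))  # [4 2 1]
```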

A sparse solution would look something like this.

Make a sparse matrix from ixs that looks like:

In [204]: I
Out[204]: 
array([[1, 0, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 1, 1],
       [0, 1, 0, 0, 0, 0, 0]])
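
The post does not show how I was constructed. One dense way to build the same indicator matrix, offered here only as an assumption about what was intended, is a broadcasted comparison:

```python
import numpy as np

mat = np.arange(12).reshape(3, 4)
ixs = np.array([0, 2, 0, 0, 0, 1, 1])

# Row r has a 1 in column c exactly when ixs[c] == r.
I = (np.arange(mat.shape[0])[:, None] == ixs).astype(int)
print(I)
# [[1 0 1 1 1 0 0]
#  [0 0 0 0 0 1 1]
#  [0 1 0 0 0 0 0]]
```

This dense form needs mat.shape[0] × len(ixs) entries, which is exactly the cost a sparse representation avoids.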

Sum the rows; sparse does this with matrix multiplication, like:

In [205]: np.dot(I, np.ones((7,),int))
Out[205]: array([4, 2, 1])

Then do our dot:

In [206]: np.dot(np.dot(I, np.ones((7,),int)), mat)
Out[206]: array([16, 23, 30, 37])

Or in sparse code:

In [225]: J = sparse.coo_matrix((np.ones_like(ixs,int),(np.arange(ixs.shape[0]), ixs)))
In [226]: J.A
Out[226]: 
array([[1, 0, 0],
       [0, 0, 1],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [0, 1, 0],
       [0, 1, 0]])
In [227]: J.sum(axis=0)*mat
Out[227]: matrix([[16, 23, 30, 37]])

sparse sums duplicates when converting from coo to csr. I can take advantage of that with:

In [229]: J = sparse.coo_matrix((np.ones_like(ixs,int), (np.zeros_like(ixs,int), ixs)))
In [230]: J
Out[230]: 
<1x3 sparse matrix of type '<class 'numpy.int32'>'
    with 7 stored elements in COOrdinate format>
In [231]: J.A
Out[231]: array([[4, 2, 1]])
In [232]: J*mat
Out[232]: array([[16, 23, 30, 37]], dtype=int32)
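
The same idea as a standalone script, for readers who want to run it outside the session above. This is a sketch that assumes SciPy is installed; it passes shape explicitly and uses toarray() and the @ operator rather than the .A attribute and * used in the transcript, and the names data, rows, J_csr and counts are introduced here.

```python
import numpy as np
from scipy import sparse

mat = np.arange(12).reshape(3, 4)
ixs = np.array([0, 2, 0, 0, 0, 1, 1])

data = np.ones_like(ixs)     # one entry per index in ixs
rows = np.zeros_like(ixs)    # everything goes into the single row 0
J = sparse.coo_matrix((data, (rows, ixs)), shape=(1, mat.shape[0]))

J_csr = J.tocsr()            # duplicates at the same (row, col) are summed here
counts = J_csr.toarray()
out = J_csr @ mat

print(J.nnz, J_csr.nnz)      # 7 3
print(counts)                # [[4 2 1]]
print(out)                   # [[16 23 30 37]]
```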

Answer 3

After much number crunching (see the Conclusions in the question), the best-performing answer, when the inputs are defined as follows:

rng = np.random.RandomState(1234)
mat = rng.randn(1000, 500)
ixs = rng.choice(rng.randint(mat.shape[0], size=mat.shape[0] // 10), size=1000)

seems to be:

def sparse_bincount(mat, ixs):
    x = np.bincount(ixs)
    nonzeros, = np.nonzero(x)
    return x[nonzeros].dot(mat[nonzeros])
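
A closely related variant, which is not from the original answers and was not part of the timings above, gets the referenced rows and their counts in one call with np.unique(..., return_counts=True). The helper name unique_counts_sum is hypothetical. It sorts ixs instead of allocating one bin per possible index, so whether it is faster will depend on the data.

```python
import numpy as np

def unique_counts_sum(mat, ixs):
    # Hypothetical helper: distinct row ids and how often each occurs.
    rows, counts = np.unique(ixs, return_counts=True)
    # Weighted sum over only the rows that actually appear in ixs.
    return counts.dot(mat[rows])

mat = np.arange(12).reshape(3, 4)
ixs = np.array([0, 2, 0, 0, 0, 1, 1])
result = unique_counts_sum(mat, ixs)
print(result)  # [16 23 30 37]
```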

3 answers

Answer 1
2 2016-10-10 18:46:59

Answer 2
2 2016-10-10 20:05:47

Answer 3
0 2016-10-11 10:48:04