Multiplying column elements of sparse Matrix

I have a sparse csc matrix with many zero elements, and for each row I would like to compute the product of its nonzero elements across all columns.

i.e.:

 A = [[1,2,0,0],
      [2,0,3,0]]

should be converted to:

V = [[2,
      6]]

With a dense numpy matrix this can be accomplished by replacing all zeros with ones and calling A.prod(1). That is not an option here, however, since the dense matrix would be too large.
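For reference, the dense approach described above can be sketched as follows. This is only an illustration on the small example matrix; the name dense_prod is introduced here and is not part of the question.

```python
import numpy as np

A = np.array([[1, 2, 0, 0],
              [2, 0, 3, 0]])

# Replace zeros with ones so they do not affect the product,
# then multiply along each row (axis=1).
dense_prod = np.where(A == 0, 1, A).prod(axis=1)
print(dense_prod)  # row products 2 and 6
```

A result like this is handy as a ground truth when checking a sparse implementation on small inputs.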

Is there any way to accomplish this without converting the sparse matrix into a dense one?

Approach #1: We can use the row indices of the nonzero elements as IDs and multiply the values that share an ID with np.multiply.reduceat to get the desired output.
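To see what np.unique and np.multiply.reduceat do together, here is a toy sketch with made-up IDs and values (ids and vals are illustrative names, not taken from the matrix). It assumes the IDs are already grouped, i.e. sorted, since reduceat multiplies contiguous slices between consecutive start indices.

```python
import numpy as np

ids = np.array([0, 0, 2, 2, 2, 4])     # sorted group IDs (e.g. row indices)
vals = np.array([-1, 2, 2, 3, 5, 7])   # values belonging to those IDs

# return_index gives the position where each distinct ID first occurs.
uniq, starts = np.unique(ids, return_index=True)

# Multiply vals[starts[k]:starts[k+1]] for each k (last slice runs to the end).
prods = np.multiply.reduceat(vals, starts)

print(uniq)   # distinct IDs: 0, 2, 4
print(prods)  # products: -2, 30, 7
```

IDs that never occur (1 and 3 here) get no entry in prods, which is why the result has to be scattered into a preallocated output array.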

Thus, an implementation would be -

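import numpy as np
from scipy import sparse
from scipy.sparse import csc_matrix

r,c,v = sparse.find(a) # a is input sparse matrix; find() skips zeros
# reduceat needs the values grouped by row; the order returned by find()
# is not relied on here, so sort by row index explicitly.
order = np.argsort(r, kind='stable')
r, v = r[order], v[order]
out = np.zeros(a.shape[0],dtype=a.dtype) # rows without nonzeros stay 0
unqr, shift_idx = np.unique(r,return_index=True)
out[unqr] = np.multiply.reduceat(v, shift_idx)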

Sample run -

In [89]: # Let's create a sample csc_matrix
    ...: A = np.array([[-1,2,0,0],[0,0,0,0],[2,0,3,0],[4,5,6,0],[1,9,0,2]])
    ...: a = csc_matrix(A)
    ...: 

In [90]: a
Out[90]: 
<5x4 sparse matrix of type '<type 'numpy.int64'>'
    with 10 stored elements in Compressed Sparse Column format>

In [91]: a.toarray()
Out[91]: 
array([[-1,  2,  0,  0],
       [ 0,  0,  0,  0],
       [ 2,  0,  3,  0],
       [ 4,  5,  6,  0],
       [ 1,  9,  0,  2]])

In [92]: out
Out[92]: array([ -2,   0,   6, 120,  18])

Approach #2: We are performing bin-based multiplication, and we already have a bin-based summing solution in np.bincount. So a trick that could be used here is to convert the numbers to their logarithms, perform bin-based summing, and then convert back with the exponential (the inverse of log). For negative numbers we might need to add a step or more, but let's see what the implementation looks like for non-negative numbers -

r,c,v = sparse.find(a)
out = np.exp(np.bincount(r,np.log(v),minlength = a.shape[0]))
out[np.setdiff1d(np.arange(a.shape[0]),r)] = 0

A sample run with non-negative numbers -

In [118]: a.toarray()
Out[118]: 
array([[1, 2, 0, 0],
       [0, 0, 0, 0],
       [2, 0, 3, 0],
       [4, 5, 6, 0],
       [1, 9, 0, 2]])

In [120]: out  # Using listed code
Out[120]: array([   2.,    0.,    6.,  120.,   18.])
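The extra step for negative numbers mentioned in Approach #2 could look roughly like this: take logs of the absolute values, and restore the sign from the parity of the number of negative entries in each row. This is a sketch under that assumption, not part of the original answer; row_prod_log is a hypothetical helper name.

```python
import numpy as np
from scipy import sparse

def row_prod_log(a):
    """Hypothetical helper: per-row product of nonzero entries via logs."""
    n = a.shape[0]
    r, c, v = sparse.find(a)  # row indices, column indices, nonzero values

    # Sum log|v| per row, then exponentiate to get the magnitude.
    mag = np.exp(np.bincount(r, np.log(np.abs(v)), minlength=n))

    # An odd number of negative entries in a row makes the product negative.
    neg_count = np.bincount(r[v < 0], minlength=n)
    sign = np.where(neg_count % 2 == 1, -1.0, 1.0)

    out = sign * mag
    # Rows without any nonzero entry are set to 0, as in the code above.
    out[np.bincount(r, minlength=n) == 0] = 0
    return out

A = np.array([[-1, 2, 0, 0],
              [0, 0, 0, 0],
              [2, 0, 3, 0],
              [4, 5, 6, 0],
              [1, 9, 0, 2]])
out = row_prod_log(sparse.csc_matrix(A))
print(out)  # approximately -2, 0, 6, 120, 18 (floating point)
```

Because the result goes through log and exp, it is floating point and only approximately equal to the exact integer products.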

Make a sample:

In [51]: A=np.array([[1,2,0,0],[0,0,0,0],[2,0,3,0]])
In [52]: M=sparse.csr_matrix(A)

In lil format, values for each row are stored in a list.

In [56]: Ml=M.tolil()
In [57]: Ml.data
Out[57]: array([[1, 2], [], [2, 3]], dtype=object)

Take the product of each of those:

In [58]: np.array([np.prod(i) for i in Ml.data])
Out[58]: array([ 2.,  1.,  6.])

In csr format, values are stored as:

In [53]: M.data
Out[53]: array([1, 2, 2, 3], dtype=int32)
In [54]: M.indices
Out[54]: array([0, 1, 0, 2], dtype=int32)
In [55]: M.indptr
Out[55]: array([0, 2, 2, 4], dtype=int32)

indptr gives the start of each row's values in data. Calculation code on csr (and csc) matrices routinely performs calculations like this (but compiled):

In [94]: lst=[]; i=M.indptr[0]
In [95]: for j in M.indptr[1:]:
    ...:     lst.append(np.prod(M.data[i:j]))
    ...:     i = j    
In [96]: lst
Out[96]: [2, 1, 6]

With Divakar's test matrix:

In [137]: M.toarray()
Out[137]: 
array([[-1,  2,  0,  0],
       [ 0,  0,  0,  0],
       [ 2,  0,  3,  0],
       [ 4,  5,  6,  0],
       [ 1,  9,  0,  2]], dtype=int32)

the above loop (wrapped in a function foo) produces:

In [138]: foo(M)
Out[138]: [-2, 1, 6, 120, 18]

Divakar's code with unique and reduceat (wrapped in a function divk) produces:

In [139]: divk(M)
Out[139]: array([ -2,   0,   6, 120,  18], dtype=int32)

(The two differ only in the value returned for the empty row.)

Reduceat with indptr is simply:

In [140]: np.multiply.reduceat(M.data,M.indptr[:-1])
Out[140]: array([ -2,   2,   6, 120,  18], dtype=int32)

The value for the empty 2nd row needs to be fixed (with indptr values of [2,2,...], the start and end index are equal, so reduceat just returns M.data[2]).

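def wptr(M, empty_val=1):
    # Product of the stored values in each row of a csr matrix M.
    # M.indptr[:-1] holds the start offset of every row in M.data.
    # Caveat: reduceat raises IndexError if trailing rows are empty,
    # because their start offset equals len(M.data).
    res = np.multiply.reduceat(M.data, M.indptr[:-1])
    mask = np.diff(M.indptr)==0  # rows with no stored values
    res[mask] = empty_val
    return res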
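Since the question starts from a csc matrix, and reduceat rejects a start index equal to len(M.data) (which happens when the last rows are empty), a slightly more defensive variant might look like the sketch below. row_products is a hypothetical name, and the sketch assumes the matrix holds no explicitly stored zeros.

```python
import numpy as np
from scipy import sparse

def row_products(a, empty_val=1):
    """Hypothetical helper: product of the stored values in each row."""
    M = a.tocsr()  # csc input is converted; csr input is used as-is
    res = np.full(M.shape[0], empty_val, dtype=M.dtype)

    # Only reduce at the start offsets of rows that actually hold values.
    # Empty rows contribute no data, so consecutive non-empty starts
    # still delimit exactly one row each.
    nonempty = np.diff(M.indptr) > 0
    if nonempty.any():
        starts = M.indptr[:-1][nonempty]
        res[nonempty] = np.multiply.reduceat(M.data, starts)
    return res

A = np.array([[-1, 2, 0, 0],
              [0, 0, 0, 0],
              [2, 0, 3, 0],
              [4, 5, 6, 0],
              [0, 0, 0, 0]])   # note the empty last row
a = sparse.csc_matrix(A)
result = row_products(a, empty_val=0)
print(result)  # products -2, 0, 6, 120, 0
```

The conversion to csr costs one pass over the stored values, which is usually small compared with densifying the matrix.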

With a larger matrix

Mb=sparse.random(1000,1000,.1,format='csr')

this wptr is about 30x faster than Divakar's version.

More discussion on calculating values across rows of a sparse matrix: Scipy.sparse.csr_matrix: How to get top ten values and indices?

You can use the prod() function from the numpy module to calculate the product of all elements in each sublist of A while excluding elements with value 0. Note that this iterates over A as a dense list of rows.

import numpy as np
# A is a dense list of rows; zeros are skipped, and an all-zero row yields 1.0
print([[np.prod([x for x in row if x != 0]) for row in A]])
