Multiplying column elements of a sparse matrix

Question

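I have a sparse csc matrix with many zero elements, and for each row I would like to compute the product of all its column elements, ignoring the zeros.

That is: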

 A = [[1,2,0,0],
      [2,0,3,0]]

should be converted to:

V = [[2,
      6]]

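With a dense numpy matrix this can be done by replacing all zero values with ones and calling A.prod(1). That is not an option here, however, because the dense matrix would be too large.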
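For reference, the dense approach just described might look like the following minimal sketch. It uses the small example from above and is only practical when the matrix fits in memory as a dense array.

```python
import numpy as np

# Dense example from the question
A = np.array([[1, 2, 0, 0],
              [2, 0, 3, 0]])

# Replace every zero with one so it does not affect the product,
# then multiply along each row (axis 1)
V = np.where(A == 0, 1, A).prod(axis=1)
print(V)  # [2 6]
```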

Is there any way to do this without converting the sparse matrix into a dense one?

Answer 1

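Approach #1: We can use the row indices of the stored (non-zero) elements as IDs and multiply the corresponding values group by group with np.multiply.reduceat to get the desired output.

Thus, an implementation would be -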

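import numpy as np
from scipy import sparse
from scipy.sparse import csc_matrix

r,c,v = sparse.find(a) # a is input sparse matrix
# reduceat needs the values grouped by row; sort in case find() is not row-ordered
order = np.argsort(r, kind='stable')
r, v = r[order], v[order]
out = np.zeros(a.shape[0],dtype=a.dtype) # rows with no stored values stay 0
unqr, shift_idx = np.unique(r,return_index=True) # start offset of each non-empty row
out[unqr] = np.multiply.reduceat(v, shift_idx)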

Sample run -

In [89]: # Let's create a sample csc_matrix
    ...: A = np.array([[-1,2,0,0],[0,0,0,0],[2,0,3,0],[4,5,6,0],[1,9,0,2]])
    ...: a = csc_matrix(A)
    ...: 

In [90]: a
Out[90]: 
<5x4 sparse matrix of type '<type 'numpy.int64'>'
    with 10 stored elements in Compressed Sparse Column format>

In [91]: a.toarray()
Out[91]: 
array([[-1,  2,  0,  0],
       [ 0,  0,  0,  0],
       [ 2,  0,  3,  0],
       [ 4,  5,  6,  0],
       [ 1,  9,  0,  2]])

In [92]: out
Out[92]: array([ -2,   0,   6, 120,  18])
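To see why this works, it may help to look at np.multiply.reduceat in isolation. The sketch below uses hand-written arrays (the names rows and starts are illustrative, not from the answer) that hold the non-zero values of the sample matrix in row order and the offset at which each non-empty row begins.

```python
import numpy as np

# Non-zero values of the sample matrix, grouped by row:
# row 0: -1, 2 | row 2: 2, 3 | row 3: 4, 5, 6 | row 4: 1, 9, 2
v = np.array([-1, 2, 2, 3, 4, 5, 6, 1, 9, 2])
rows = np.array([0, 2, 3, 4])      # rows that have stored values
starts = np.array([0, 2, 4, 7])    # offset of each row's first value in v

# reduceat multiplies v[starts[i]:starts[i+1]]; the last segment runs to the end
segment_products = np.multiply.reduceat(v, starts)

out = np.zeros(5, dtype=v.dtype)   # row 1 is empty and keeps the value 0
out[rows] = segment_products
print(out)  # [ -2   0   6 120  18]
```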

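Approach #2: We are performing bin-based multiplication, and np.bincount already gives us bin-based summation. So the trick here is to convert the numbers to logarithms, perform the bin-based summation, and then convert back with the exponential (the inverse of the logarithm), and that's it! Negative numbers would need an extra step or two, but let's see what the implementation looks like for non-negative numbers -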

r,c,v = sparse.find(a)
out = np.exp(np.bincount(r,np.log(v),minlength = a.shape[0]))
out[np.setdiff1d(np.arange(a.shape[0]),r)] = 0

Sample run for non-negative numbers -

In [118]: a.toarray()
Out[118]: 
array([[1, 2, 0, 0],
       [0, 0, 0, 0],
       [2, 0, 3, 0],
       [4, 5, 6, 0],
       [1, 9, 0, 2]])

In [120]: out  # Using listed code
Out[120]: array([   2.,    0.,    6.,  120.,   18.])
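The answer leaves the handling of negative numbers open. One possible extension, sketched here as an assumption and not as the author's method, takes the logarithm of the absolute values and restores the sign from the parity of the number of negative entries in each row.

```python
import numpy as np
from scipy import sparse
from scipy.sparse import csc_matrix

A = np.array([[-1, 2, 0, 0], [0, 0, 0, 0], [2, 0, 3, 0], [4, 5, 6, 0], [1, 9, 0, 2]])
a = csc_matrix(A)
n = a.shape[0]

r, c, v = sparse.find(a)  # row indices, column indices, non-zero values

# Sum of log|v| per row gives the log of the product's magnitude
log_mag = np.bincount(r, weights=np.log(np.abs(v)), minlength=n)
# Number of negative entries per row; an odd count makes the product negative
neg_count = np.bincount(r, weights=(v < 0).astype(float), minlength=n)
sign = np.where(neg_count % 2 == 0, 1.0, -1.0)

out = sign * np.exp(log_mag)
# Rows without any stored value get 0, as in the code above
out[np.bincount(r, minlength=n) == 0] = 0
print(np.round(out))
```

Because the result passes through log and exp, it is floating point and only approximately equal to the exact integer product.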

Answer 2

Make a sample:

In [51]: A=np.array([[1,2,0,0],[0,0,0,0],[2,0,3,0]])
In [52]: M=sparse.csr_matrix(A)

In lil format, the values of each row are stored in a list.

In [56]: Ml=M.tolil()
In [57]: Ml.data
Out[57]: array([[1, 2], [], [2, 3]], dtype=object)

Take the product of each of these:

In [58]: np.array([np.prod(i) for i in Ml.data])
Out[58]: array([ 2.,  1.,  6.])

In csr format the values are stored as:

In [53]: M.data
Out[53]: array([1, 2, 2, 3], dtype=int32)
In [54]: M.indices
Out[54]: array([0, 1, 0, 2], dtype=int32)
In [55]: M.indptr
Out[55]: array([0, 2, 2, 4], dtype=int32)

indptr gives the start of each row's values. Calculation code on csr (and csc) matrices routinely performs calculations like this (but compiled):

In [94]: lst=[]; i=M.indptr[0]
In [95]: for j in M.indptr[1:]:
    ...:     lst.append(np.prod(M.data[i:j]))
    ...:     i = j
In [96]: lst
Out[96]: [2, 1, 6]

With Divakar's test matrix:

In [137]: M.A
Out[137]: 
array([[-1,  2,  0,  0],
       [ 0,  0,  0,  0],
       [ 2,  0,  3,  0],
       [ 4,  5,  6,  0],
       [ 1,  9,  0,  2]], dtype=int32)

The above loop (wrapped in a function foo) produces:

In [138]: foo(M)
Out[138]: [-2, 1, 6, 120, 18]
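The function foo is not shown in the answer; presumably it wraps the indptr loop from above. A minimal sketch under that assumption:

```python
import numpy as np
from scipy import sparse

def foo(M):
    """Row-wise product of stored values via the CSR index pointer (assumed definition)."""
    M = sparse.csr_matrix(M)  # make sure data/indptr are laid out by row
    result = []
    # indptr[k]:indptr[k+1] is the slice of M.data that belongs to row k
    for start, stop in zip(M.indptr[:-1], M.indptr[1:]):
        # np.prod of an empty slice is 1, so empty rows yield 1
        result.append(int(np.prod(M.data[start:stop])))
    return result

A = np.array([[-1, 2, 0, 0], [0, 0, 0, 0], [2, 0, 3, 0], [4, 5, 6, 0], [1, 9, 0, 2]])
print(foo(A))  # [-2, 1, 6, 120, 18]
```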

Divakar's code, with unique and reduceat:

In [139]: divk(M)
Out[139]: array([ -2,   0,   6, 120,  18], dtype=int32)

(Note the different value for the empty row.)

reduceat with indptr is simply:

In [140]: np.multiply.reduceat(M.data,M.indptr[:-1])
Out[140]: array([ -2,   2,   6, 120,  18], dtype=int32)

The value for the empty second row needs to be fixed (with indptr values of [2,2,...], reduceat uses M.data[2]).

def wptr(M, empty_val=1):
    res = np.multiply.reduceat(M.data, M.indptr[:-1])
    mask = np.diff(M.indptr)==0
    res[mask] = empty_val
    return res
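One caveat worth checking: if the last row of M is empty, M.indptr[:-1] contains an offset equal to len(M.data), which np.multiply.reduceat rejects as out of bounds. A possible variant, offered as a sketch and not part of the original answer (the name wptr_safe is hypothetical), reduces only over the non-empty rows. It assumes M.data holds no explicitly stored zeros.

```python
import numpy as np
from scipy import sparse

def wptr_safe(M, empty_val=1):
    """Row products of a sparse matrix; empty rows receive empty_val."""
    M = sparse.csr_matrix(M)
    nonempty = np.diff(M.indptr) > 0            # rows with at least one stored value
    res = np.full(M.shape[0], empty_val, dtype=M.dtype)
    if M.nnz:
        # Offsets of non-empty rows are strictly increasing and < len(M.data)
        res[nonempty] = np.multiply.reduceat(M.data, M.indptr[:-1][nonempty])
    return res

# The last row is empty here, which is the case the guard is meant for
M = sparse.csr_matrix(np.array([[1, 2, 0], [0, 0, 0], [2, 0, 3], [0, 0, 0]]))
print(wptr_safe(M))  # [2 1 6 1]
```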

With a larger matrix

Mb=sparse.random(1000,1000,.1,format='csr')

this wptr is about 30x faster than Divakar's version.
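The timing commands are not included in the answer. A comparison along these lines could be reproduced with timeit, as in the sketch below. Here divk is assumed to wrap the unique/reduceat code from Answer 1, and the measured ratio will depend on the machine and library versions.

```python
import timeit
import numpy as np
from scipy import sparse

def divk(a):
    # Answer 1's approach: group values by row index, then reduce each group
    r, c, v = sparse.find(a)
    order = np.argsort(r, kind='stable')   # make sure values are grouped by row
    r, v = r[order], v[order]
    out = np.zeros(a.shape[0], dtype=a.dtype)
    unqr, shift_idx = np.unique(r, return_index=True)
    out[unqr] = np.multiply.reduceat(v, shift_idx)
    return out

def wptr(M, empty_val=1):
    # Answer 2's approach: reduce directly over the CSR row offsets
    res = np.multiply.reduceat(M.data, M.indptr[:-1])
    res[np.diff(M.indptr) == 0] = empty_val
    return res

Mb = sparse.random(1000, 1000, .1, format='csr')

t_divk = timeit.timeit(lambda: divk(Mb), number=20)
t_wptr = timeit.timeit(lambda: wptr(Mb), number=20)
print(f"divk: {t_divk:.4f}s  wptr: {t_wptr:.4f}s  ratio: {t_divk / t_wptr:.1f}x")
```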

For more discussion of computing values across the rows of a sparse matrix, see: Scipy.sparse.csr_matrix: How to get top ten values and indices?

Answer 3

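You can use the prod() function from the numpy module to compute the product of all elements in each sublist of A, leaving out the elements whose value is 0.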

import numpy as np
# A is assumed to be a dense nested list or 2-D array, as in the question
print([[np.prod([x for x in A[i] if x != 0]) for i in range(len(A))]])
