Python: Cumulative insertion of values into a sparse matrix (lil_matrix) with repeated indices

Question

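My situation is as follows:

I have an array of results, say S = np.array([2,3,10,-1,12,1,2,4,4]), that I want to insert into the last row of a scipy.sparse.lil_matrix M, according to an array of column indices with possibly repeated elements (in no particular pattern), e.g. j = np.array([3,4,5,14,15,16,3,4,5]).

When column indices are repeated, the sum of their corresponding values in S should be inserted into the matrix M. So, in the example above, the results [4,7,14] should be placed in columns [3,4,5] of the last row of M. In other words, I want to achieve the following:

M[-1,j] = np.array([2+2,3+4,10+4,-1,12,1])

Computation speed is very important for my program, so I should avoid loops. Looking forward to your clever solutions! Thanks!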

Answer 1

You can use a defaultdict to map the column indices of M to their values, and use the map function to update this defaultdict, as follows:

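from collections import defaultdict

d = defaultdict(int)  # column index -> accumulated value; use your array's dtype here
def f(col, s):
    d[col] += s
list(map(f, j, S))  # map is lazy in Python 3, so consume it to actually run f
M[-1, list(d.keys())] = list(d.values())  # keys and values are always in the same order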

If you don't want to needlessly build a list of None values, you can use filter instead of map:

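d = defaultdict(int)  # use your array's dtype here
def g(e):
    d[e[1]] += S[e[0]]  # e is an (index, column) pair; g implicitly returns None
list(filter(g, enumerate(j)))  # g always returns None, so the consumed result is an empty list
M[-1, list(d.keys())] = list(d.values())  # keys and values are always in the same order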
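
Since the question asks to avoid Python-level loops, the same accumulation can also be sketched with NumPy alone. This is a swapped-in alternative rather than the approach of this answer: it assumes a 3x17 target matrix (a hypothetical shape chosen to fit the example) and uses np.unique together with np.bincount to sum the values that share a column index.

```python
import numpy as np
from scipy import sparse

S = np.array([2, 3, 10, -1, 12, 1, 2, 4, 4])
j = np.array([3, 4, 5, 14, 15, 16, 3, 4, 5])

# Unique column indices, plus the position in `cols` of every entry of j.
cols, inverse = np.unique(j, return_inverse=True)

# Sum the values of S that map to the same unique column.
# bincount returns floats when weights are given, so cast back to S's dtype.
sums = np.bincount(inverse, weights=S).astype(S.dtype)

# Hypothetical target matrix; the 3x17 shape is an assumption for this example.
M = sparse.lil_matrix((3, 17), dtype=S.dtype)
M[-1, cols] = sums
```

The cast back to an integer dtype is only safe here because the inputs are integers; with floating-point data the astype call can be dropped.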

Answer 2

This kind of summation is normal behavior for sparse matrices, especially in the csr format.

Define the 3 input arrays:

In [408]: S = np.array([2,3,10,-1,12,1,2,4,4])
In [409]: j=np.array([3,4,5,14,15,16,3,4,5])
In [410]: i=np.ones(S.shape,int)

The coo format takes those 3 arrays as they are:

In [411]: c0=sparse.coo_matrix((S,(i,j)))
In [412]: c0.data
Out[412]: array([ 2,  3, 10, -1, 12,  1,  2,  4,  4])

But when it is converted to the csr format, the repeated indices are summed:

In [413]: c1=c0.tocsr()
In [414]: c1.data
Out[414]: array([ 4,  7, 14, -1, 12,  1], dtype=int32)
In [415]: c1.A
Out[415]: 
array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  4,  7, 14,  0,  0,  0,  0,  0,  0,  0,  0, -1, 12,  1]], dtype=int32)

That summation is also done when the coo matrix is converted to dense or to an array, c0.A.

And when converting to lil:
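
As a self-contained check of the two conversions described above, the following sketch rebuilds c0 and compares the dense result with the csr result. It spells out toarray() in place of the .A shorthand, which is a choice made here and not part of the original session.

```python
import numpy as np
from scipy import sparse

S = np.array([2, 3, 10, -1, 12, 1, 2, 4, 4])
j = np.array([3, 4, 5, 14, 15, 16, 3, 4, 5])
i = np.ones(S.shape, dtype=int)  # put every value in row 1, as in the session above

c0 = sparse.coo_matrix((S, (i, j)))  # coo keeps all 9 entries, duplicates included
dense = c0.toarray()                 # duplicates are summed while filling the array
c1 = c0.tocsr()                      # duplicates are summed during the conversion

# Both routes should agree on the accumulated values.
same = bool((dense == c1.toarray()).all())
```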

In [419]: cl=c0.tolil()
In [420]: cl.data
Out[420]: array([[], [4, 7, 14, -1, 12, 1]], dtype=object)
In [421]: cl.rows
Out[421]: array([[], [3, 4, 5, 14, 15, 16]], dtype=object)

lil_matrix does not accept the (data,(i,j)) input directly, so you have to go through coo if lil is your target.

http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.sparse.coo_matrix.html

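By default when converting to CSR or CSC format, duplicate (i,j) entries will be summed together. This facilitates efficient construction of finite element matrices and the like. (see example)

To insert this into an existing lil, use an intermediate csr: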

In [443]: L=sparse.lil_matrix((3,17),dtype=S.dtype)
In [444]: L[-1,:]=sparse.csr_matrix((S,(np.zeros(S.shape),j)))
In [445]: L.A
Out[445]: 
array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  4,  7, 14,  0,  0,  0,  0,  0,  0,  0,  0, -1, 12,  1]])

This statement is faster than the one using csr_matrix:

L[-1,:]=sparse.coo_matrix((S,(np.zeros(S.shape),j)))

If you are really worried about speed, look at L.__setitem__. At first glance it appears to normally convert the sparse matrix to an array:

L[-1,:]=sparse.coo_matrix((S,(np.zeros(S.shape),j))).A

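That dense assignment takes the same time. With a small test case like this, the overhead of creating the intermediate matrix may swamp any time spent adding the duplicate indices.

In general, inserting or appending values to an existing sparse matrix is slow, whether or not you do this summation. Where possible, it is best to create the data, i and j arrays for the whole matrix first, and then build the sparse matrix.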
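
To compare the three assignments on your own machine, a rough timing sketch could look like the following. The helper names via_csr, via_coo and via_dense are hypothetical, the explicit shape=(1, 17) is an assumption added for safety, and the timings will vary with the SciPy version and the problem size.

```python
import timeit

import numpy as np
from scipy import sparse

S = np.array([2, 3, 10, -1, 12, 1, 2, 4, 4])
j = np.array([3, 4, 5, 14, 15, 16, 3, 4, 5])
rows = np.zeros(S.shape, dtype=int)  # every value goes into row 0 of the 1x17 helper


def via_csr():
    # Assign through an intermediate csr matrix.
    L = sparse.lil_matrix((3, 17), dtype=S.dtype)
    L[-1, :] = sparse.csr_matrix((S, (rows, j)), shape=(1, 17))
    return L


def via_coo():
    # Assign through an intermediate coo matrix.
    L = sparse.lil_matrix((3, 17), dtype=S.dtype)
    L[-1, :] = sparse.coo_matrix((S, (rows, j)), shape=(1, 17))
    return L


def via_dense():
    # Convert the coo matrix to a dense array before assigning.
    L = sparse.lil_matrix((3, 17), dtype=S.dtype)
    L[-1, :] = sparse.coo_matrix((S, (rows, j)), shape=(1, 17)).toarray()
    return L


for fn in (via_csr, via_coo, via_dense):
    seconds = timeit.timeit(fn, number=200)
    print(f"{fn.__name__}: {seconds / 200 * 1e6:.1f} microseconds per call")
```

All three variants should leave the same values in the last row; only the time spent building and converting the intermediate differs.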
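
A minimal sketch of that advice, assuming a 3x17 matrix whose first two rows hold a couple of made-up entries (the values 5 and 6 and their positions are purely illustrative): collect every triplet first, build one coo matrix, and convert to lil only at the end.

```python
import numpy as np
from scipy import sparse

S = np.array([2, 3, 10, -1, 12, 1, 2, 4, 4])
j = np.array([3, 4, 5, 14, 15, 16, 3, 4, 5])

# Illustrative entries for rows 0 and 1, followed by S in row 2 (the last row).
data = np.concatenate([np.array([5, 6]), S])
row = np.concatenate([np.array([0, 1]), np.full(S.shape, 2)])
col = np.concatenate([np.array([0, 8]), j])

# Build once from the full triplet arrays; the conversion to lil sums
# the duplicate (row, col) entries coming from S and j.
M = sparse.coo_matrix((data, (row, col)), shape=(3, 17)).tolil()
```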

Answer 1: 0, 2015-09-16 21:17:25
Answer 2: 0, accepted, 2015-09-16 21:27:14