Summing over rows in scipy.sparse.csr_matrix

Question

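I have a large csr_matrix and I want to sum over groups of rows to obtain a new csr_matrix with the same number of columns but fewer rows. (Context: the matrix is a document-term matrix obtained from sklearn CountVectorizer, and I want to be able to quickly combine documents according to codes associated with those documents.)

For a minimal example, this is my matrix: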

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse import vstack

row = np.array([0, 4, 1, 3, 2])
col = np.array([0, 2, 2, 0, 1])
dat = np.array([1, 2, 3, 4, 5])
A = csr_matrix((dat, (row, col)), shape=(5, 5))
print(A.toarray())

[[1 0 0 0 0]
 [0 0 3 0 0]
 [0 5 0 0 0]
 [4 0 0 0 0]
 [0 0 2 0 0]]

Now I want a new matrix B in which rows (1, 4) and (2, 3, 5) are combined by summing, which would look like this:

[[5 0 0 0 0]
 [0 5 5 0 0]]

It should again be in sparse format (because the real data I am working with is large). I tried to sum over slices of the matrix and then stack them:

idx1 = [1, 4]
idx2 = [2, 3, 5]
A_sub1 = A[idx1, :].sum(axis=1)
A_sub2 = A[idx2, :].sum(axis=1)
B = vstack((A_sub1, A_sub2))

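However, this gives me the summed values only for the non-zero columns in each slice, so I cannot combine it with the other slices, because the number of columns differs between the summed slices.

I feel like there must be an easy way to do this, but I could not find any discussion of it online or in the documentation. What am I missing?

Thank you for your help.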

Answer 1

Note that you can do this by carefully constructing another matrix. Here is how it would work for a dense matrix:

>>> S = np.array([[1, 0, 0, 1, 0,], [0, 1, 1, 0, 1]])
>>> np.dot(S, A.toarray())
array([[5, 0, 0, 0, 0],
       [0, 5, 5, 0, 0]])
>>>

The sparse version is only slightly more complicated. The information about which rows should be summed together is encoded in row:

col = list(range(5))
row = [0, 1, 1, 0, 1]    # row[i] is the output row that input row i is added to
dat = [1, 1, 1, 1, 1]
S = csr_matrix((dat, (row, col)), shape=(2, 5))
result = S * A           # for csr_matrix, * is matrix multiplication
# check that the result is another sparse matrix
print(type(result))
# check that the values are the ones we want
print(result.toarray())

Output (the exact module path in the class name depends on the SciPy version):

<class 'scipy.sparse.csr.csr_matrix'>
[[5 0 0 0 0]
 [0 5 5 0 0]]

You can handle more rows in your output by including higher values in row and extending the shape of S accordingly.
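To make the generalisation to more output rows concrete, here is a minimal sketch that wraps the idea in a helper. The function name sum_rows_by_group and the labels argument are hypothetical and not part of SciPy; it assumes labels are 0-based integers, one per input row.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sum_rows_by_group(A, labels):
    """Sum the rows of sparse matrix A that share a group label.

    labels[i] is the 0-based output row that input row i is added to.
    (Hypothetical helper for illustration; not a SciPy function.)
    """
    labels = np.asarray(labels)
    n_rows = A.shape[0]
    n_groups = int(labels.max()) + 1
    # S[g, i] = 1 when input row i belongs to group g
    S = csr_matrix(
        (np.ones(n_rows, dtype=A.dtype), (labels, np.arange(n_rows))),
        shape=(n_groups, n_rows),
    )
    # Sparse-sparse matrix product; the result stays sparse
    return S @ A


# The matrix from the question
row = np.array([0, 4, 1, 3, 2])
col = np.array([0, 2, 2, 0, 1])
dat = np.array([1, 2, 3, 4, 5])
A = csr_matrix((dat, (row, col)), shape=(5, 5))

# Rows 1 and 4 go to output row 0; rows 2, 3 and 5 go to output row 1
B = sum_rows_by_group(A, [0, 1, 1, 0, 1])
print(B.toarray())
```

With document codes that are not already integers, one option would be to map them to integer labels first, for example with np.unique(codes, return_inverse=True).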

Answer 2

The indices should be:

idx1 = [0, 3]       # rows 1 and 4
idx2 = [1, 2, 4]    # rows 2,3 and 5

Then you need to keep A_sub1 and A_sub2 in sparse format and use axis=0:

A_sub1 = csr_matrix(A[idx1, :].sum(axis=0))
A_sub2 = csr_matrix(A[idx2, :].sum(axis=0))
B = vstack((A_sub1, A_sub2))
B.toarray()
array([[5, 0, 0, 0, 0],
       [0, 5, 5, 0, 0]])

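Note that I think the A[idx, :].sum(axis=0) operations involve a conversion from sparse matrices, so @Mr_E's answer is probably better.

Alternatively, it works when you use axis=0 and np.vstack (as opposed to scipy.sparse.vstack):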

A_sub1 = A[idx1, :].sum(axis=0)
A_sub2 = A[idx2, :].sum(axis=0)
np.vstack((A_sub1, A_sub2))

Giving:

matrix([[5, 0, 0, 0, 0],
        [0, 5, 5, 0, 0]])


Answer 1
4 (accepted), 2015-04-14 14:46:23

Answer 2
1, 2015-04-14 14:45:44
