Concatenating specific rows of a sparse matrix in scipy

Question

I have a large sparse matrix (using scipy.sparse) with I rows and U columns, where U is much greater than I. I also have a list of U random numbers in the range 0:I, each one a row index into the original matrix. I would like to create a new U * U sparse matrix in which the row for user u holds all U values of row i of the original sparse matrix, where i is the u-th number in the list. For example, if the original matrix is a 3*5 matrix:

0,0,2,1,0
0,0,3,4,1
1,1,0,2,0

and the list of random numbers is [0,0,2,1,2]

The resulting matrix should be:

0,0,2,1,0
0,0,2,1,0
1,1,0,2,0
0,0,3,4,1
1,1,0,2,0

I am using this code now, which is very slow:

for u in range(U):
    i = random_indices[u]
    if u == 0:
        output_sparse_matrix = original_sparse_matrix[i, :]
    else:
        output_sparse_matrix = vstack((output_sparse_matrix,
                                       original_sparse_matrix[i, :]))

Any suggestions on how this can be done quicker?

Update: I used Jérôme Richard's suggestion, but inside a loop, since I got an out-of-memory error. This is the solution that worked:

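import numpy as np
from scipy.sparse import vstack

# Index the rows in 10 chunks instead of all at once, stacking as we go
bins = np.array_split(random_indices, 10)
output_sparse_matrix = original_sparse_matrix[bins[0]]

for chunk in bins[1:]:  # `chunk` avoids shadowing the built-in `bin`
    output_sparse_matrix = vstack((output_sparse_matrix, original_sparse_matrix[chunk]))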
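The loop above still calls vstack once per chunk, which copies the accumulated result each time. A possible variant, sketched below, collects the chunk results in a list and stacks them once at the end. The helper name take_rows_in_chunks and its n_chunks parameter are hypothetical, not part of scipy, and the sketch assumes a non-empty index list. Whether it lowers peak memory enough depends on the data.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack


def take_rows_in_chunks(matrix, indices, n_chunks=10):
    """Hypothetical helper: select rows chunk by chunk, then stack once."""
    chunks = np.array_split(np.asarray(indices), n_chunks)
    # Skip empty chunks, which occur when n_chunks exceeds len(indices).
    blocks = [matrix[chunk] for chunk in chunks if len(chunk) > 0]
    # A single vstack call copies each block only once.
    return vstack(blocks, format="csr")


# Example data taken from the question.
original_sparse_matrix = csr_matrix(np.array([[0, 0, 2, 1, 0],
                                              [0, 0, 3, 4, 1],
                                              [1, 1, 0, 2, 0]]))
random_indices = [0, 0, 2, 1, 2]

output_sparse_matrix = take_rows_in_chunks(original_sparse_matrix,
                                           random_indices, n_chunks=2)
print(output_sparse_matrix.toarray())
```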

Answer 1

vstack creates a new matrix on every iteration. This is the main source of the slowdown, since each call copies everything stacked so far and the complexity of the algorithm is O(U^3). You can instead append the new rows to a Python list and then vstack the whole list once. Alternatively, a better approach is to use the following NumPy-style indexing expression:

original_sparse_matrix[random_indices, :]
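As a small illustration of the two suggestions in this answer, the sketch below builds the example matrix from the question, stacks a list of rows with a single vstack call, and compares the result with direct indexing. The variable names rows, stacked, and indexed are only illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

# Example data taken from the question.
original_sparse_matrix = csr_matrix(np.array([[0, 0, 2, 1, 0],
                                              [0, 0, 3, 4, 1],
                                              [1, 1, 0, 2, 0]]))
random_indices = [0, 0, 2, 1, 2]

# Suggestion 1: collect the rows in a Python list, then call vstack once.
rows = [original_sparse_matrix[i, :] for i in random_indices]
stacked = vstack(rows, format="csr")

# Suggestion 2: select all rows in one indexing expression.
indexed = original_sparse_matrix[random_indices, :]

# Both approaches should produce the same matrix.
print(np.array_equal(stacked.toarray(), indexed.toarray()))
```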

Answer 2

This may not be faster, but you can try using fancy indexing:

output_sparse_matrix = input_sparse_matrix[random_indices]

Provided random_indices is a list, the above should give the desired result.

Applying this to your original example:

from scipy.sparse import csr_matrix

a = csr_matrix([[0, 0, 2, 1, 0],
                [0, 0, 3, 4, 1],
                [1, 1, 0, 2, 0]])

indices = [0, 0, 2, 1, 2]

output_matrix = a[indices]

print(output_matrix.todense())
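For reference, the following sketch checks that result against the matrix expected in the question, and also tries an integer NumPy array as the index in place of a list. The names from_list, from_array, and expected are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix

a = csr_matrix(np.array([[0, 0, 2, 1, 0],
                         [0, 0, 3, 4, 1],
                         [1, 1, 0, 2, 0]]))

# Index once with a Python list and once with an integer NumPy array.
from_list = a[[0, 0, 2, 1, 2]]
from_array = a[np.array([0, 0, 2, 1, 2])]

# The result matrix given in the question.
expected = np.array([[0, 0, 2, 1, 0],
                     [0, 0, 2, 1, 0],
                     [1, 1, 0, 2, 0],
                     [0, 0, 3, 4, 1],
                     [1, 1, 0, 2, 0]])

print(np.array_equal(from_list.toarray(), expected))
print(np.array_equal(from_array.toarray(), expected))
```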

Answer 1
1, accepted, 2021-04-10 16:44:39

Answer 2
1, 2021-04-10 17:00:47