(Python SciPy) How to flatten a csr_matrix and append it to another csr_matrix?

Question

I represent each XML document as a feature matrix in csr_matrix format. With around 3000 XML documents, I now have a list of csr_matrices. I want to flatten each of these matrices into a feature vector, then combine all of the feature vectors into a single csr_matrix representing all the XML documents, where each row is a document and each column is a feature.

One way to achieve this is with the following code:

X = csr_matrix([a.toarray().ravel().tolist() for a in ls])

where ls is the list of csr_matrices. However, this is highly inefficient because every matrix is converted to a dense array and then to a Python list: with 3000 documents, it simply crashes!

In other words, my question is: how can I flatten each csr_matrix in the list ls without converting it to a dense array, and how can I append the flattened csr_matrices to another csr_matrix?

Please note that I am using Python with SciPy.

Thanks in advance!

Answer 1

Why use csr_matrix for each XML document? It may be better to use lil_matrix, which supports the reshape method. Here is an example:

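import numpy as np
from scipy import sparse

N, M, K = 100, 200, 300
# K random sparse N x M matrices stand in for the per-document feature matrices
matrixs = [sparse.rand(N, M, format="csr") for i in range(K)]
# convert each to LIL, which supports reshape, and flatten it to a 1 x (N*M) row
matrixs2 = [m.tolil().reshape((1, N*M)) for m in matrixs]
# stack the rows into one K x (N*M) matrix and convert back to CSR
m1 = sparse.vstack(matrixs2).tocsr()

# test with dense array
#m2 = np.vstack([m.toarray().reshape(-1) for m in matrixs])
#np.allclose(m1.toarray(), m2)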
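If you would rather avoid the intermediate LIL conversion, the same flattening can be done with index arithmetic on the COO representation. This is only a sketch, not part of the original answer: the helper name flatten_and_stack is made up for illustration, and it assumes every matrix in the list has the same shape. An entry at (r, c) in an N x M matrix lands at column r*M + c of the flattened row, which matches the row-major order that ravel() produces.

```python
import numpy as np
from scipy import sparse


def flatten_and_stack(matrices):
    """Flatten equally shaped sparse matrices (row-major) into rows of one CSR matrix.

    Hypothetical helper for illustration; assumes a non-empty list of
    sparse matrices that all share the same shape.
    """
    n_rows, n_cols = matrices[0].shape
    doc_idx, feat_idx, values = [], [], []
    for i, m in enumerate(matrices):
        if m.shape != (n_rows, n_cols):
            raise ValueError("all matrices must share the same shape")
        coo = m.tocoo()
        # Row-major position of entry (r, c) in the flattened vector.
        # Cast to int64 first so r * n_cols cannot overflow 32-bit indices.
        flat = coo.row.astype(np.int64) * n_cols + coo.col
        doc_idx.append(np.full(coo.nnz, i, dtype=np.int64))
        feat_idx.append(flat)
        values.append(coo.data)
    stacked = sparse.coo_matrix(
        (np.concatenate(values),
         (np.concatenate(doc_idx), np.concatenate(feat_idx))),
        shape=(len(matrices), n_rows * n_cols),
    )
    return stacked.tocsr()


# Tiny demo: two 2 x 3 "documents" become two rows of length 6.
matrices = [
    sparse.csr_matrix(np.array([[0, 1, 0], [2, 0, 3]])),
    sparse.csr_matrix(np.array([[4, 0, 0], [0, 0, 5]])),
]
X = flatten_and_stack(matrices)
print(X.toarray())
```

Only the nonzero entries are ever touched, so memory use scales with the total number of nonzeros rather than with K*N*M.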

Answer 1: score 4, accepted, 2013-03-22 07:53:24
