SciPy, NumPy and scikit-learn: creating a sparse matrix

Question

I'm currently trying to classify text. My dataset is too big, and as suggested here, I need to use a sparse matrix. My question now is: what is the right way to add an element to a sparse matrix? Say, for example, I have a matrix X, which is my input:

import numpy as np
X = np.random.randint(2, size=(6, 100))

Now this matrix X looks like an ndarray of ndarrays (or something like that).

If I do:

from scipy.sparse import csr_matrix
X2 = csr_matrix(X)
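
To see what the conversion actually produces, here is a small sketch (assuming NumPy and SciPy are installed). X is a plain 2-D ndarray of shape (6, 100), and the resulting csr_matrix stores only the non-zero entries in three 1-D arrays: data, indices and indptr.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Same kind of input as in the question: a 6 x 100 array of random 0/1 values.
X = np.random.randint(2, size=(6, 100))
print(X.ndim, X.shape)  # 2 (6, 100): a single 2-D ndarray

X2 = csr_matrix(X)
# CSR keeps only the non-zero entries:
#   data    - the non-zero values, row by row
#   indices - the column index of each value in data
#   indptr  - row i's entries are data[indptr[i]:indptr[i + 1]]
print(X2.shape, X2.nnz)
print(X2.data[:5], X2.indices[:5], X2.indptr)
```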

I have the sparse matrix, but how can I add another element to it? For example, given this dense element: [1,0,0,0,1,1,1,0,...,0,1,0], how do I convert it to a sparse vector and add it to the sparse input matrix?

(By the way, I'm very new to Python, SciPy, NumPy, scikit-learn... everything.)

Answer 1

Scikit-learn has great documentation, with great tutorials that you really should read before trying to reinvent this yourself. This one is the first to read: it explains how to classify text, step by step. And this one is a detailed example of text classification using a sparse representation.
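
For illustration only: in scikit-learn's text workflow you usually don't build the sparse matrix by hand at all; a vectorizer turns raw documents straight into a scipy.sparse matrix. The sketch below assumes TfidfVectorizer and a made-up three-document corpus; the tutorials' exact pipeline may differ.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus; in practice these would be your real documents.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock prices fell sharply today",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per document

# No dense (n_documents x vocabulary_size) array is ever created.
print(issparse(X), X.shape, X.nnz)
```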

Pay extra attention to the parts where they talk about sparse representations, in this section. In general, if you want to use an SVM with a linear kernel and you have a large amount of data, LinearSVC (which is based on liblinear) is a better choice than SVC with kernel='linear'.
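
A minimal sketch of that combination, assuming made-up labelled documents: LinearSVC is fitted directly on the sparse TF-IDF matrix, with no conversion to a dense array. New documents must go through the same fitted vectorizer so that their columns line up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical labelled toy data: 1 = about animals, 0 = about finance.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock prices fell sharply today",
    "markets and prices rose today",
]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse CSR matrix
clf = LinearSVC()                   # liblinear-based linear SVM
clf.fit(X, labels)                  # accepts the sparse matrix directly

# Transform (not fit_transform) new text with the already-fitted vectorizer.
X_new = vectorizer.transform(["a cat and a dog are pets"])
print(clf.predict(X), clf.predict(X_new))
```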

Regarding your question: I'm sure there are many ways to concatenate two sparse matrices (by the way, "concatenate sparse matrices" is what you should search for on Google to find other ways of doing it). Here is one, but you'll have to convert from csr_matrix to coo_matrix, which is another type of sparse matrix: Is there an efficient way of concatenating scipy.sparse matrices?
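
One concrete way to do it is scipy.sparse.vstack, which accepts CSR blocks directly; its format argument picks the output format. This is a sketch, not the linked answer's code. Because the question's row is elided ("..."), the sketch assumes the new row has 100 columns (to match X) and contains only the 1s visible in the question, with the elided middle taken as zeros.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

X = np.random.randint(2, size=(6, 100))
X2 = csr_matrix(X)

# Assumed new dense sample: the visible 1s from [1,0,0,0,1,1,1,0,...,0,1,0],
# with the elided middle part assumed to be all zeros.
new_row = np.zeros(100, dtype=X.dtype)
new_row[[0, 4, 5, 6, 98]] = 1

# Make the 1-D row a 1 x 100 sparse matrix, then stack it below X2.
new_sparse = csr_matrix(new_row.reshape(1, -1))
X3 = vstack([X2, new_sparse], format="csr")

print(X3.shape)  # (7, 100)
```

Each call to vstack builds a new matrix, so when adding many rows it is cheaper to collect them in a list and stack them once than to append one row at a time.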

EDIT: When concatenating two matrices (or a matrix and an array, i.e. a one-row matrix), the general idea is to concatenate X1.data and X2.data and adjust their indices and indptr arrays (or row and col in the case of coo_matrix) so that they point to the correct places. Some sparse representations are better for specific operations and more complex for others; you should read about csr_matrix and see whether it is the best representation for your case. But I really urge you to start from the tutorials I posted above.
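
To make the data/indices/indptr idea concrete, here is a sketch of stacking one CSR matrix below another by joining their raw arrays. csr_append_rows is a hypothetical helper written for illustration, not a SciPy function; it assumes both inputs are csr_matrix objects with the same number of columns. In real code scipy.sparse.vstack does this for you.

```python
import numpy as np
from scipy.sparse import csr_matrix


def csr_append_rows(A, B):
    """Hypothetical helper: stack CSR matrix B below CSR matrix A."""
    if A.shape[1] != B.shape[1]:
        raise ValueError("column counts differ")
    # Stored values and their column indices are simply appended;
    # column positions do not change when rows are stacked vertically.
    data = np.concatenate([A.data, B.data])
    indices = np.concatenate([A.indices, B.indices])
    # B's row pointers are shifted by the number of entries stored in A.
    # B.indptr[0] is 0, so it is dropped to avoid repeating A's last boundary.
    indptr = np.concatenate([A.indptr, B.indptr[1:] + len(A.data)])
    return csr_matrix((data, indices, indptr),
                      shape=(A.shape[0] + B.shape[0], A.shape[1]))


X = np.random.randint(2, size=(6, 100))
row = np.random.randint(2, size=(1, 100))
X3 = csr_append_rows(csr_matrix(X), csr_matrix(row))
print(X3.shape)  # (7, 100)
```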

Answer 1 (accepted, 13 votes), 2012-12-06 11:14:09