Input data type for the sklearn SVD fit_transform function

Question

I have already-processed document data in a CSV file, which I read into a pandas DataFrame:

+----------+------+-------+
| document | term | count |
+----------+------+-------+
| 1        | 126  | 1     |
| 1        | 80   | 1     |
| 1        | 1221 | 2     |
| 2        | 2332 | 1     |
+----------+------+-------+

So each row consists of a document_id, a term, and the term frequency.

I don't have the original documents, only this processed data. I want to apply SVD with sklearn, but I cannot figure out how to prepare this DataFrame for SVD fit_transform(), which expects:

X : {array-like, sparse matrix}, shape (n_samples, n_features)

Answer 1

You can convert this CSV to libsvm format, with one line per document:

<label> <index1>:<value1> <index2>:<value2> ...
.
.
.

So your example data will look like this:

0 80:1 126:1 1221:2
0 2332:1
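
The conversion itself is not shown above, so here is a minimal sketch of one way to do it. It assumes the DataFrame columns are named document, term, and count as in the question; the helper name dataframe_to_libsvm_lines and the dummy label 0 are illustrative choices, not part of sklearn or pandas.

```python
import pandas as pd

def dataframe_to_libsvm_lines(df, label=0):
    """Return one libsvm-format line per document, term indices ascending."""
    lines = []
    # Sorting first keeps the term indices in ascending order within each document.
    ordered = df.sort_values(["document", "term"])
    for _, group in ordered.groupby("document"):
        features = " ".join(
            f"{int(term)}:{int(count)}"
            for term, count in zip(group["term"], group["count"])
        )
        # The label is a placeholder: SVD is unsupervised and ignores it.
        lines.append(f"{label} {features}")
    return lines

# Sample rows copied from the question.
df = pd.DataFrame({
    "document": [1, 1, 1, 2],
    "term": [126, 80, 1221, 2332],
    "count": [1, 1, 2, 1],
})

lines = dataframe_to_libsvm_lines(df)
print("\n".join(lines))
```

Writing "\n".join(lines) to a file then gives the input that load_svmlight_file expects.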

Then read this file using sklearn.datasets.load_svmlight_file:

from sklearn.datasets import load_svmlight_file
X, y = load_svmlight_file('your_libsvm_format_file.libsvm')
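
To see what this call returns without creating a file, the sketch below feeds the two example lines through an in-memory binary buffer. The buffer is only a stand-in for the real file; the assumption here is that a binary file-like object is acceptable in place of a path.

```python
import io
from sklearn.datasets import load_svmlight_file

# The two example lines from above, as bytes in an in-memory buffer.
buffer = io.BytesIO(b"0 80:1 126:1 1221:2\n0 2332:1\n")

X, y = load_svmlight_file(buffer)

# X is a scipy sparse matrix with one row per document;
# y holds the (dummy) labels from the first column.
print(type(X).__name__, X.shape[0], X.nnz)
print(y)
```

The number of columns in X is driven by the largest term index in the file, so the matrix can be very wide while still holding only a few stored values.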

Then fit the SVD on the loaded matrix:

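from sklearn.decomposition import TruncatedSVD  # sklearn.decomposition has no class named SVD
# TruncatedSVD works directly on the sparse matrix returned by load_svmlight_file
svd = TruncatedSVD()
X_transformed = svd.fit_transform(X)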
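
Since fit_transform() accepts a sparse matrix, another option is to skip the intermediate file and build the matrix straight from the DataFrame. The following is a minimal sketch of that alternative, not part of the original answer. It assumes the column names document, term, and count from the question, uses sklearn's TruncatedSVD as the SVD implementation, and picks n_components=1 only because the sample has two documents.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Sample rows copied from the question.
df = pd.DataFrame({
    "document": [1, 1, 1, 2],
    "term": [126, 80, 1221, 2332],
    "count": [1, 1, 2, 1],
})

# Map raw document and term ids to contiguous row and column positions.
doc_codes, doc_ids = pd.factorize(df["document"])
term_codes, term_ids = pd.factorize(df["term"])

# Build the (n_documents, n_terms) sparse count matrix.
# Repeated (document, term) pairs, if any, are summed by this constructor.
X = csr_matrix(
    (df["count"].to_numpy(dtype=float), (doc_codes, term_codes)),
    shape=(len(doc_ids), len(term_ids)),
)

svd = TruncatedSVD(n_components=1, random_state=0)
X_transformed = svd.fit_transform(X)
print(X.shape, X_transformed.shape)
```

Here doc_ids and term_ids keep the original ids, so row i of X_transformed corresponds to doc_ids[i]. Compared with the libsvm route, the matrix only has as many columns as there are distinct terms.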

Answer 1
1 | accepted | 2016-09-08 14:57:59