
Input data type for sklearn SVD fit_transform function

I have already-processed document data in a CSV file, which I read into a pandas DataFrame:

+----------+------+-------+
| document | term | count |
+----------+------+-------+
| 1        | 126  | 1     |
| 1        | 80   | 1     |
| 1        | 1221 | 2     |
| 2        | 2332 | 1     |
+----------+------+-------+

So it consists of a document ID, a term ID, and the term frequency.

I don't have the original documents, only this processed data. I want to apply SVD with sklearn, but I cannot figure out how to prepare this DataFrame for SVD fit_transform(), which expects:

X : {array-like, sparse matrix}, shape (n_samples, n_features)

You can convert this CSV to libsvm format, with one line per document:

<label> <index1>:<value1> <index2>:<value2> ...
.
.
.

So your example data will look like this (the label is a placeholder, since SVD does not use it):

0 80:1 126:1 1221:2
0 2332:1
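
The conversion itself is not shown above, so here is a minimal sketch of one way to do it with pandas. It assumes the DataFrame columns are named document, term and count as in the question's table; the helper names dataframe_to_libsvm_lines and write_libsvm are made up for illustration and are not part of pandas or sklearn. The label is fixed at 0 because the decomposition ignores it.

```python
import pandas as pd


def dataframe_to_libsvm_lines(df, label=0):
    """Return one libsvm-formatted line per document (hypothetical helper)."""
    lines = []
    # Sort so documents come out in order and term indices ascend on each line.
    ordered = df.sort_values(["document", "term"])
    for _, group in ordered.groupby("document", sort=True):
        pairs = " ".join(
            f"{term}:{count}" for term, count in zip(group["term"], group["count"])
        )
        lines.append(f"{label} {pairs}")
    return lines


def write_libsvm(df, path):
    """Write the lines to a file that load_svmlight_file can read (hypothetical helper)."""
    with open(path, "w") as handle:
        handle.write("\n".join(dataframe_to_libsvm_lines(df)) + "\n")


# The four sample rows from the question.
df = pd.DataFrame({
    "document": [1, 1, 1, 2],
    "term": [126, 80, 1221, 2332],
    "count": [1, 1, 2, 1],
})
lines = dataframe_to_libsvm_lines(df)
```

Calling write_libsvm(df, 'your_libsvm_format_file.libsvm') would then produce the file used in the next step. Note that the matrix loaded from such a file gets one row per line, so a document ID with no rows in the CSV is simply skipped rather than producing an empty row.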

Then read this file using sklearn.datasets.load_svmlight_file:

from sklearn.datasets import load_svmlight_file
X, y = load_svmlight_file('your_libsvm_format_file.libsvm')

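Then fit the decomposition. Note that sklearn.decomposition has no class named SVD; the class that accepts sparse input is TruncatedSVD:

from sklearn.decomposition import TruncatedSVD
# n_components defaults to 2; set it to the number of dimensions you want to keep
svd = TruncatedSVD()
X_transformed = svd.fit_transform(X)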
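
As an alternative to the answer's approach, the intermediate file can probably be skipped by building the sparse matrix directly from the DataFrame. The sketch below is not part of the original answer. It assumes the same document, term and count column names, and it uses pd.factorize to map the raw IDs to contiguous row and column indices, so the matrix columns follow order of first appearance rather than the original term IDs.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# The four sample rows from the question.
df = pd.DataFrame({
    "document": [1, 1, 1, 2],
    "term": [126, 80, 1221, 2332],
    "count": [1, 1, 2, 1],
})

# Map raw IDs to 0-based contiguous indices; doc_ids and term_ids keep the originals.
row_idx, doc_ids = pd.factorize(df["document"])
col_idx, term_ids = pd.factorize(df["term"])

# One row per document, one column per distinct term, values are the counts.
X = csr_matrix(
    (df["count"].to_numpy(), (row_idx, col_idx)),
    shape=(len(doc_ids), len(term_ids)),
)

svd = TruncatedSVD(n_components=2, random_state=0)
X_transformed = svd.fit_transform(X)
```

Keeping doc_ids and term_ids around makes it possible to trace each row of X_transformed back to its document ID and each matrix column back to its term ID.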
