Input data type for sklearn SVD fit_transform function
I have already-processed document data in a CSV file, which I read into a pandas DataFrame:
+----------+------+-------+
| document | term | count |
+----------+------+-------+
|        1 |  126 |     1 |
|        1 |   80 |     1 |
|        1 | 1221 |     2 |
|        2 | 2332 |     1 |
So it consists of a document_id, a term, and the term frequency.
I don't have the original documents, only this processed data, and I want to apply SVD with sklearn. However, I cannot figure out how to prepare this DataFrame for SVD fit_transform(), which expects:
X : {array-like, sparse matrix}, shape (n_samples, n_features)
You can convert this CSV to libsvm format:
<label> <index1>:<value1> <index2>:<value2> ...
.
.
.
So, your example data will look like:
0 80:1 126:1 1221:2
0 2332:1
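The conversion itself can be sketched with pandas. This is only an illustration under a few assumptions: the column names `document`, `term`, and `count` are taken from the table in the question, the helper `to_libsvm_lines` is a hypothetical name, and the label `0` is a placeholder because the data has no target values.

```python
import pandas as pd

# Sample rows copied from the question; the column names are assumed.
df = pd.DataFrame({
    "document": [1, 1, 1, 2],
    "term": [126, 80, 1221, 2332],
    "count": [1, 1, 2, 1],
})

def to_libsvm_lines(frame, label=0):
    """Hypothetical helper: one libsvm line per document, term indices ascending."""
    lines = []
    ordered = frame.sort_values(["document", "term"])
    for _, group in ordered.groupby("document"):
        pairs = " ".join(
            f"{term}:{count}" for term, count in zip(group["term"], group["count"])
        )
        lines.append(f"{label} {pairs}")
    return lines

libsvm_lines = to_libsvm_lines(df)
print("\n".join(libsvm_lines))
```

Writing these lines to a text file, one per line, should give a file that the loader in the next step can read. The sort matters because the libsvm format expects feature indices in ascending order within each line.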
Then read this file using sklearn.datasets.load_svmlight_file:
from sklearn.datasets import load_svmlight_file
X, y = load_svmlight_file('your_libsvm_format_file.libsvm')
Then fit the SVD and transform X:
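from sklearn.decomposition import TruncatedSVD

# sklearn.decomposition has no class named SVD; TruncatedSVD accepts sparse input
svd = TruncatedSVD(n_components=2)
X_transformed = svd.fit_transform(X)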
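As an alternative to the intermediate file, the sparse matrix can probably be built straight from the DataFrame. The sketch below is not part of the answer above; it swaps in `scipy.sparse.coo_matrix` for the libsvm round trip, assumes the column names from the question, and uses the term ids directly as column indices.

```python
import pandas as pd
from scipy.sparse import coo_matrix
from sklearn.decomposition import TruncatedSVD

# Sample rows copied from the question; the column names are assumed.
df = pd.DataFrame({
    "document": [1, 1, 1, 2],
    "term": [126, 80, 1221, 2332],
    "count": [1, 1, 2, 1],
})

# Map document ids to consecutive row indices 0..n_docs-1.
rows, doc_ids = pd.factorize(df["document"])
# Term ids are used as column indices as-is (assumption: they are non-negative ints).
cols = df["term"].to_numpy()
vals = df["count"].to_numpy(dtype=float)

# Document-term matrix of shape (n_samples, n_features), stored sparsely.
X = coo_matrix(
    (vals, (rows, cols)),
    shape=(len(doc_ids), cols.max() + 1),
).tocsr()

svd = TruncatedSVD(n_components=2, random_state=0)
X_transformed = svd.fit_transform(X)
print(X.shape, X_transformed.shape)
```

Here `doc_ids` keeps the original document ids in row order, so row `i` of `X_transformed` corresponds to `doc_ids[i]`.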