
scikit-learn preparation

Original post 2015-06-09 18:44:48 · python / machine-learning / scikit-learn

I am trying to use the scikit-learn package for semi-supervised classification. I have a file with classes, instances and features, but I am not sure how to prepare this file for scikit-learn. Could you give some guidelines for file preparation? The tutorial only provides instructions for loading prepared data sets from machine learning repositories. Thank you!

1 Answer

Scikit-learn directly supports some learning-oriented input formats, notably SVMLight. In general, though, its input is a numpy array (when dense), which can be produced from a wide range of data sources using other tools from the SciPy stack, notably scipy.io and, more pertinently for a text file with columns, the Pandas IO tools. You can likely use pandas.read_csv, then pull the target class column out of the frame and drop it from the feature set.
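
A rough sketch of that last step is below. It assumes a comma-separated layout that the question does not actually show: the column names `instance`, `f1`, `f2` and `class` and the three sample rows are made up for illustration, and an empty `class` field is taken to mean "unlabeled". Adjust the names and separator to match the real file.

```python
import io

import pandas as pd

# Hypothetical file contents (not from the question): one row per instance,
# numeric feature columns, and a "class" column that is empty when unlabeled.
raw = io.StringIO(
    "instance,f1,f2,class\n"
    "a,0.1,1.0,spam\n"
    "b,0.4,0.8,ham\n"
    "c,0.3,0.9,\n"
)

# Use the instance identifier as the row index so it is not treated as a feature.
df = pd.read_csv(raw, index_col="instance")

# Pull the target column out of the frame; what remains is the feature set.
labels = df.pop("class")
X = df.to_numpy(dtype=float)

# Encode class names as integers. pandas.factorize assigns -1 to missing
# values, so unlabeled rows end up with the code -1.
y, classes = pd.factorize(labels)

print(X.shape)        # (n_instances, n_features)
print(y)              # integer class codes, -1 for unlabeled rows
print(list(classes))  # class name for each non-negative code
```

`X` and `y` can then be passed to an estimator's `fit` method. The `-1` code is convenient here because the estimators in `sklearn.semi_supervised` treat a label of `-1` as "unlabeled"; if the file marks unlabeled rows some other way, map that marker to `-1` before fitting.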