Create train and test variables from loaded arff file

I want to perform multilabel classification. I have a dataset in arff format, which I load. However, I don't know how to convert the imported data into X and y so that I can apply sklearn's train_test_split.

How can I get X and y?

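import pandas as pd
from scipy.io import arff
from sklearn.model_selection import train_test_split

# Load the arff file into a DataFrame
data, meta = arff.loadarff('../yeast-train.arff')
df = pd.DataFrame(data)

# Get X, y
X, y = ???  # <--- what goes here?

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)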

OK. It's multilabel data in which the features are in the columns Att1, Att2, Att3, .... Att20 and the targets are in the columns Class1, Class2, .... Class14.

So you need to use those columns to get X and y. Do it like this:

# Fill the .... with all other column names
feature_cols = ['Att1', 'Att2', 'Att3', 'Att4', 'Att5' ....   'Att20']
target_cols = ['Class1', 'Class2', 'Class3', 'Class4', ....   'Class14']

X, y = df[feature_cols], df[target_cols]
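
If typing out every column name is tedious, the lists can also be built from the column prefixes. The following is only a minimal sketch, not part of the original answer: it assumes every feature column starts with "Att" and every target column starts with "Class", and it uses a tiny made-up in-memory arff (ARFF_TEXT is hypothetical) in place of the real file. It also decodes the label columns, because loadarff returns nominal values as bytes such as b'0' and b'1'.

```python
import io

import pandas as pd
from scipy.io import arff
from sklearn.model_selection import train_test_split

# Hypothetical toy data that mimics the Att*/Class* naming scheme.
ARFF_TEXT = """@relation toy
@attribute Att1 numeric
@attribute Att2 numeric
@attribute Class1 {0,1}
@attribute Class2 {0,1}
@data
0.1,0.2,0,1
0.3,0.4,1,0
0.5,0.6,1,1
0.7,0.8,0,0
0.9,1.0,1,0
"""

data, meta = arff.loadarff(io.StringIO(ARFF_TEXT))
df = pd.DataFrame(data)

# Assumption: the prefixes cleanly separate features from targets.
feature_cols = [c for c in df.columns if c.startswith("Att")]
target_cols = [c for c in df.columns if c.startswith("Class")]

X = df[feature_cols].to_numpy(dtype=float)

# Nominal attributes arrive as bytes; decode them, then cast to int.
y = (
    df[target_cols]
    .apply(lambda col: col.str.decode("utf-8"))
    .astype(int)
    .to_numpy()
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```

With the real file, replacing io.StringIO(ARFF_TEXT) with the path '../yeast-train.arff' should work the same way, provided the column names follow the pattern described above.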
