How to split data by using train_test_split in Python Numpy into train, test, and validation data sets? The split should not be random

I want to split data category-wise into train, test, and validation sets. For example, suppose the dataset has 3 categories: positive, negative, and neutral. The positive category should be split into train, test, and validation sets, and the same goes for the other two categories. The split ratio is 80% of the data for training and 20% for testing. From the 80% training data, 10% is then split off for validation. Most importantly, the split should not be random.

You can use the stratify parameter to do this.

For example, with the Iris dataset:

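from sklearn import datasets
from sklearn.model_selection import train_test_split  # replaces the removed sklearn.cross_validation module

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

# stratify=y keeps the class proportions the same in the train and test parts
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)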
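To also get a validation set, one option is to call train_test_split twice: first hold out 20% for testing, then take 10% of the remaining training data for validation. The following is only a sketch on the Iris data. The fixed random_state=0 is an assumption added here so that the same split comes out on every run; the rows are still shuffled, just reproducibly. Note that stratify only works with shuffling enabled: scikit-learn raises an error if stratify is combined with shuffle=False.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Step 1: 80% train+validation, 20% test, stratified by class.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 2: 10% of the remaining training data becomes the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.1, stratify=y_trainval, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```

Because both calls stratify on the labels, each of the three classes should appear in the same proportion in the train, validation, and test parts.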

You can read more here: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
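If "not random" means that no shuffling should happen at all, a stratified train_test_split will not do, since it always shuffles. One possible alternative is to slice each category by position with plain NumPy. The helper below, split_by_category, is a hypothetical name introduced for illustration (it is not part of NumPy or scikit-learn), and the label counts are made-up sample data. Within each category, the first rows go to train, the next to validation, and the last to test, in their original order.

```python
import numpy as np

def split_by_category(y, test_frac=0.2, val_frac=0.1):
    """Return index arrays (train, val, test) built without any shuffling."""
    y = np.asarray(y)
    train_idx, val_idx, test_idx = [], [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)  # rows of this category, in original order
        n_test = int(round(len(idx) * test_frac))
        # Validation is a fraction of what remains after removing the test rows.
        n_val = int(round((len(idx) - n_test) * val_frac))
        n_train = len(idx) - n_test - n_val
        train_idx.extend(idx[:n_train])
        val_idx.extend(idx[n_train:n_train + n_val])
        test_idx.extend(idx[n_train + n_val:])
    return np.array(train_idx), np.array(val_idx), np.array(test_idx)

# Made-up labels, only to demonstrate the helper.
labels = np.array(["positive"] * 50 + ["negative"] * 30 + ["neutral"] * 20)
train_idx, val_idx, test_idx = split_by_category(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

The returned arrays are row indices, so the actual data would be selected with something like X[train_idx] and y[train_idx]. Keep in mind that a purely positional split can be biased if the rows are ordered in some meaningful way, for example by time.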

