
Splitting of training and validation dataset

I need to split my training data into training and validation sets (80-20) in such a way that the split is not random but always the same.

Presently I use this code:

from sklearn.model_selection import train_test_split
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2)

but the split sub-datasets are always random and never the same. I want the split to be random, but the same result should be produced when I run the code again (something like np.random.seed).

Is there a way to do that?

train_test_split() has a random_state argument. If you assign an integer value to it, the result will always be the same:

from sklearn.model_selection import train_test_split
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2, random_state=1)
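
As a minimal sketch of the reproducibility (the toy arrays below are made up for illustration), two separate calls with the same random_state return identical splits:

import numpy as np
from sklearn.model_selection import train_test_split

# Toy data for illustration only: 10 samples, 3 features, binary labels
X = np.arange(30).reshape(10, 3)
Y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Two independent calls with the same random_state...
X_train_a, X_val_a, Y_train_a, Y_val_a = train_test_split(X, Y, test_size=0.2, random_state=1)
X_train_b, X_val_b, Y_train_b, Y_val_b = train_test_split(X, Y, test_size=0.2, random_state=1)

# ...produce exactly the same validation set every time
print(np.array_equal(X_val_a, X_val_b))  # True

The split itself is still shuffled; fixing random_state only fixes the seed of that shuffle, so reruns of the script reproduce the same partition.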

