Splitting of training and validation dataset
I need to split my training data into training and validation sets (80-20) in such a way that the split sub-datasets are not random but always the same.
Presently I use this code:
from sklearn.model_selection import train_test_split
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2)
but the split sub-datasets are always random and never the same. I want the split to be random, but the same values should come back when I run the code again (something like np.random.seed). Is there a way to do that?
train_test_split() has a random_state argument. If you assign an integer value to it, the result will always be the same:
from sklearn.model_selection import train_test_split
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2, random_state=1)
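To see that this is reproducible, two calls with the same random_state can be compared side by side. A minimal sketch (the toy X and Y arrays are assumptions for illustration, not the asker's data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the real X and Y
X = np.arange(20).reshape(10, 2)
Y = np.arange(10)

# Two separate calls with the same random_state
X_train1, X_val1, Y_train1, Y_val1 = train_test_split(X, Y, test_size=0.2, random_state=1)
X_train2, X_val2, Y_train2, Y_val2 = train_test_split(X, Y, test_size=0.2, random_state=1)

# The splits are shuffled, yet identical across runs
assert np.array_equal(X_train1, X_train2)
assert np.array_equal(X_val1, X_val2)
assert np.array_equal(Y_train1, Y_train2)
assert np.array_equal(Y_val1, Y_val2)
```

Any fixed integer works for random_state; changing it gives a different (but again repeatable) shuffle.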