
Splitting of training and validation dataset

I need to split my training data into training and validation sets (80-20) in such a way that the split is not random but always the same.

Presently I use this code:

from sklearn.model_selection import train_test_split
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2)

but the split sub-datasets are always random and never the same. I want the split to be random, but the same result should be produced when I run the code again (something like np.random.seed).

Is there a way to do that?

train_test_split() has a random_state argument. If you assign an integer value to it, the result will always be the same:

from sklearn.model_selection import train_test_split
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2, random_state=1)
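
As a minimal sketch of the reproducibility (the toy arrays below are made up for illustration), two separate calls with the same random_state return identical splits:

import numpy as np
from sklearn.model_selection import train_test_split

# Toy data for illustration only: 10 samples, 3 features, binary labels
X = np.arange(30).reshape(10, 3)
Y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Two independent calls with the same random_state...
X_train_a, X_val_a, Y_train_a, Y_val_a = train_test_split(X, Y, test_size=0.2, random_state=1)
X_train_b, X_val_b, Y_train_b, Y_val_b = train_test_split(X, Y, test_size=0.2, random_state=1)

# ...produce exactly the same validation set every time
print(np.array_equal(X_val_a, X_val_b))  # True

The split itself is still shuffled; fixing random_state only fixes the seed of that shuffle, so reruns of the script reproduce the same partition.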

