
How can I split this dataset into train, validation, and test set?

I have defined a dataset with my own data, following the instructions from https://www.tensorflow.org/tutorials/load_data/images, as below:

list_ds = tf.data.Dataset.list_files(str(data_dir/'*/*'))

I have looked through the methods of tf.data.Dataset, but couldn't figure out how to split this dataset into three parts (train, validation, test) like tfds.Split does.

How can I split this dataset into three parts? I want the train/validation/test sets to be 80%, 10%, and 10% of list_ds, respectively.

This can be achieved in multiple ways:

1) Put your train, validation, and test data into three separate folders and call tf.data.Dataset.list_files(...) three times with the appropriate file pattern for each.
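A minimal sketch of approach 1. The temporary directory tree built here is only scaffolding so the snippet runs on its own; in practice the patterns would point at your real, pre-sorted image folders, and the `train`/`val`/`test` folder names are assumptions, not something fixed by the API:

```python
import pathlib
import tempfile

import tensorflow as tf

# Scaffolding: create a throwaway tree with one dummy image per split.
# Replace `root` with the parent of your real train/val/test folders.
root = pathlib.Path(tempfile.mkdtemp())
for split in ("train", "val", "test"):
    class_dir = root / split / "class_a"
    class_dir.mkdir(parents=True)
    (class_dir / "img0.jpg").touch()

# One list_files call per split, mirroring the original single-call pattern.
train_ds = tf.data.Dataset.list_files(str(root / "train" / "*/*"))
val_ds = tf.data.Dataset.list_files(str(root / "val" / "*/*"))
test_ds = tf.data.Dataset.list_files(str(root / "test" / "*/*"))
```

This approach makes the split permanent and reproducible on disk, at the cost of having to move files around up front.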

2) Make use of Dataset.skip() and Dataset.take(). You will have to count the actual number of entries to skip/take yourself, based on your dataset size.
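A minimal sketch of approach 2, using tf.data.Dataset.range(100) as a stand-in for list_ds so the sizes are known. The shuffle call, seed, and 80/10/10 arithmetic are illustrative choices; note that reshuffle_each_iteration=False is needed so that the three splits see the same shuffle order and stay disjoint:

```python
import tensorflow as tf

# Stand-in for list_ds; with list_files you may need to count entries
# yourself if cardinality() reports UNKNOWN_CARDINALITY.
list_ds = tf.data.Dataset.range(100)
dataset_size = int(list_ds.cardinality().numpy())

train_size = int(0.8 * dataset_size)  # 80%
val_size = int(0.1 * dataset_size)    # 10%; the remainder becomes the test set

# Shuffle once, deterministically, so skip/take carve up one fixed order.
list_ds = list_ds.shuffle(
    dataset_size, seed=42, reshuffle_each_iteration=False
)

train_ds = list_ds.take(train_size)
val_ds = list_ds.skip(train_size).take(val_size)
test_ds = list_ds.skip(train_size + val_size)
```

This keeps everything in one folder, but the split only stays stable as long as the seed and the file ordering do.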

More information about dataset operations can be found in the TF docs: https://www.tensorflow.org/guide/data

Hope this helped!

