How do I split this dataset into training, validation, and test sets?

Question

I have defined a dataset with my own data, following the instructions at https://www.tensorflow.org/tutorials/load_data/images, as below:

list_ds = tf.data.Dataset.list_files(str(data_dir/'*/*'))

I have looked through the methods of tf.data.Dataset, but couldn't figure out how to split this dataset into three parts (train, validation, test) the way tfds.Split does.

How can I split this dataset into three parts? I would like the train/validation/test sets to be 80%, 10%, and 10% of list_ds, respectively.

Answer 1

This can be achieved in multiple ways:

1) Put your train, test and validation data into three separate folders and call tf.data.Dataset.list_files(...) three times, once with each folder's file pattern.

2) Make use of Dataset.skip() and Dataset.take(). You will have to work out the number of entries to skip/take yourself, based on your dataset size.
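For option 2, here is a minimal sketch of how the skip()/take() arithmetic could look. It assumes TF 2.3 or newer (for Dataset.cardinality()) and eager execution; the helper name split_dataset, the default fractions and the seed are my own choices, not anything from the TF API. One thing to watch: list_files shuffles the file names by default, so the order can differ between passes and the three parts may overlap. The sketch shuffles once with a fixed seed and reshuffle_each_iteration=False so the order stays the same.

```python
import tensorflow as tf


def split_dataset(ds, train_frac=0.8, val_frac=0.1, seed=42):
    """Split a finite dataset into (train, val, test) with skip()/take().

    The function name, default fractions and seed are illustrative only.
    """
    n = int(ds.cardinality().numpy())
    if n == tf.data.INFINITE_CARDINALITY:
        raise ValueError("cannot split an infinite dataset")
    if n == tf.data.UNKNOWN_CARDINALITY:
        # Size is not known up front: count once by iterating.
        n = sum(1 for _ in ds)
    if n == 0:
        raise ValueError("cannot split an empty dataset")

    # Shuffle once in a fixed order so the three parts stay disjoint
    # every time they are iterated.
    ds = ds.shuffle(n, seed=seed, reshuffle_each_iteration=False)

    n_train = int(train_frac * n)
    n_val = int(val_frac * n)

    train_ds = ds.take(n_train)
    val_ds = ds.skip(n_train).take(n_val)
    test_ds = ds.skip(n_train + n_val)  # whatever is left over
    return train_ds, val_ds, test_ds
```

With the dataset from the question, you would build list_ds with shuffle=False, i.e. tf.data.Dataset.list_files(str(data_dir/'*/*'), shuffle=False), and then call train_ds, val_ds, test_ds = split_dataset(list_ds). Because the sizes are rounded down, the test set picks up any remainder.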

More information about dataset operations can be found in the TF docs: https://www.tensorflow.org/guide/data
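For option 1, a rough sketch of the three list_files calls might look like the following. The folder layout root/<split>/<class>/<file> and the split names train, val and test are assumptions for illustration, as is the helper name list_split_files; adjust the pattern to however your folders are actually arranged.

```python
import pathlib

import tensorflow as tf


def list_split_files(root, split):
    """List files under root/<split>/<class>/<file> (assumed layout)."""
    pattern = str(pathlib.Path(root) / split / "*" / "*")
    # Shuffle only the training files; keep val/test in a stable order.
    return tf.data.Dataset.list_files(pattern, shuffle=(split == "train"))


def list_all_splits(root):
    """Return a dict with one file-name dataset per split folder."""
    return {s: list_split_files(root, s) for s in ("train", "val", "test")}
```

Here the split sizes are decided by how many files you put in each folder, so no counting is needed in code. Note that list_files raises an error if a pattern matches no files.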

Hope this helped!

Solution 1
0 2019-12-04 15:34:25
