How can I split this dataset into train, validation, and test sets?
I have defined a dataset with my own data, following the instructions at https://www.tensorflow.org/tutorials/load_data/images, as below:
list_ds = tf.data.Dataset.list_files(str(data_dir/'*/*'))
I have looked through the methods of tf.data.Dataset, but couldn't figure out how to split this dataset into three parts (train, validation, test) like tfds.Split.
How can I split this dataset into three parts? I would like the train, validation, and test sets to be 80%, 10%, and 10% of list_ds, respectively.
This can be achieved in multiple ways:
1) Put your train, validation, and test data into three separate folders and call tf.data.Dataset.list_files(...) three times, once with each folder's file path.
2) Make use of Dataset.skip() and Dataset.take(). You will have to work out the number of entries to skip/take yourself, based on your dataset size.
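As a rough illustration of option 2, here is a minimal sketch, assuming TensorFlow 2.3 or later with eager execution and a finite dataset. The helper name split_dataset and its parameters (train_frac, val_frac, seed) are made up for this example and are not part of the TensorFlow API. One thing to watch for: tf.data.Dataset.list_files shuffles the file names by default, so the sketch shuffles once with reshuffle_each_iteration=False to keep the three parts from overlapping.

```python
import tensorflow as tf


def split_dataset(ds, train_frac=0.8, val_frac=0.1, seed=42):
    """Split a finite dataset into (train, val, test) using skip() and take().

    Hypothetical helper for illustration; not a TensorFlow function.
    """
    n = int(ds.cardinality().numpy())
    if n < 0:
        # Cardinality is unknown: count the elements by iterating once.
        # (This assumes the dataset is finite.)
        n = sum(1 for _ in ds)

    # Shuffle once with a fixed order. Without reshuffle_each_iteration=False
    # the order would change on every pass and the splits could overlap.
    ds = ds.shuffle(n, seed=seed, reshuffle_each_iteration=False)

    train_size = int(train_frac * n)
    val_size = int(val_frac * n)

    train_ds = ds.take(train_size)
    val_ds = ds.skip(train_size).take(val_size)
    test_ds = ds.skip(train_size + val_size)  # whatever remains
    return train_ds, val_ds, test_ds


# Possible usage with the dataset from the question (data_dir as defined there):
# list_ds = tf.data.Dataset.list_files(str(data_dir/'*/*'), shuffle=False)
# train_ds, val_ds, test_ds = split_dataset(list_ds)
```

Passing shuffle=False to list_files in the usage lines leaves the single shuffle inside the helper as the only source of randomness, so the split stays the same from epoch to epoch.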
More information about dataset operations can be found in the TF docs: https://www.tensorflow.org/guide/data
Hope this helped!