
How can I split this dataset into train, validation, and test set?

I have defined a dataset from my own data, following the instructions at https://www.tensorflow.org/tutorials/load_data/images, as below:

list_ds = tf.data.Dataset.list_files(str(data_dir/'*/*'))

I have looked through the methods of tf.data.Dataset, but couldn't figure out how to split this dataset into three parts (train, validation, test) the way tfds.Split does.

How can I split this dataset into three parts? I would like the train, validation, and test sets to be 80%, 10%, and 10% of list_ds, respectively.

This can be achieved in multiple ways:

1) Put your train, validation, and test data into three separate folders and call tf.data.Dataset.list_files(...) three times, once with each folder's file pattern.

2) Use Dataset.skip() and Dataset.take(). Both take an element count rather than a fraction, so you have to work out the number of entries to skip/take yourself from the size of your dataset.
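As a rough illustration of option 2, here is a minimal sketch of an 80/10/10 split with skip() and take(). The helper name split_dataset, its parameters, and the Dataset.range(100) stand-in for list_ds are assumptions made for this example, not part of the TensorFlow API. It also assumes you already know the number of elements (for example, by counting the files on disk).

```python
import tensorflow as tf


def split_dataset(ds, ds_size, train_frac=0.8, val_frac=0.1, seed=42):
    """Split `ds` into (train, val, test); test receives the remainder.

    Hypothetical helper for illustration only.
    """
    # Shuffle once and keep that order fixed across iterations. Without
    # reshuffle_each_iteration=False the order would change every time the
    # dataset is iterated, and the three subsets could overlap.
    ds = ds.shuffle(ds_size, seed=seed, reshuffle_each_iteration=False)

    # skip()/take() work on element counts, so convert fractions to counts.
    train_size = int(train_frac * ds_size)
    val_size = int(val_frac * ds_size)

    train_ds = ds.take(train_size)
    val_ds = ds.skip(train_size).take(val_size)
    test_ds = ds.skip(train_size + val_size)
    return train_ds, val_ds, test_ds


# Stand-in for list_ds: 100 integers instead of 100 file paths.
example_ds = tf.data.Dataset.range(100)
train_ds, val_ds, test_ds = split_dataset(example_ds, ds_size=100)
```

Note that tf.data.Dataset.list_files shuffles file names by default. If you apply this approach to list_ds, it is probably safest to create it with shuffle=False and let a single fixed shuffle like the one above decide the order.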

More information about dataset operations can be found in the TensorFlow docs: https://www.tensorflow.org/guide/data

Hope this helped!
