How to split images into test and train set using my own data in TensorFlow

I am a little confused here... I just spent the last hour reading about how to split my dataset into test/train in TensorFlow. I was following this tutorial to import my images: https://www.tensorflow.org/tutorials/load_data/images. Apparently one can split into train/test with sklearn: model_selection.train_test_split.

But my question is: when do I split my dataset into train/test? I have already done this with my dataset (see below), now what? How do I split it? Do I have to do it before loading the files as a tf.data.Dataset?

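import os
import pathlib

import numpy as np
import tensorflow as tf

# assumed setup (not shown in the original post): images live under ./train
cwd = os.getcwd()
data_dir = pathlib.Path(cwd) / 'train'

# determine names of classes
CLASS_NAMES = np.array([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])
print(CLASS_NAMES)

# count images
image_count = len(list(data_dir.glob('*/*.png')))
print(image_count)

# load the files as a tf.data.Dataset
list_ds = tf.data.Dataset.list_files(str(cwd + '/train/' + '*/*'))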

Also, my data structure looks like the following. No test folder, no val folder. I would need to take 20% for test from that train set.

train
 |__ class 1
 |__ class 2
 |__ class 3

You can use tf.keras.preprocessing.image.ImageDataGenerator:

image_generator = tf.keras.preprocessing.image.ImageDataGenerator(validation_split=0.2)
train_data_gen = image_generator.flow_from_directory(directory='train',
                                                     subset='training')
val_data_gen = image_generator.flow_from_directory(directory='train',
                                                   subset='validation')

Note that you'll probably need to set other data-related parameters for your generator, such as the target image size, the batch size, and the class mode.
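To make the data-related parameters concrete, here is a minimal sketch that can be run without the real images. It assumes a throwaway folder of tiny black PNGs standing in for the 'train' directory; the folder contents, image size, and batch size are made up for illustration, so replace them with values that match your data.

```python
import tempfile
from pathlib import Path

import tensorflow as tf

# Throwaway stand-in for the real 'train' folder: 3 classes x 10 tiny PNGs.
tmp = tempfile.mkdtemp()
png = tf.io.encode_png(tf.zeros((8, 8, 3), dtype=tf.uint8))
for class_name in ("class 1", "class 2", "class 3"):
    (Path(tmp) / class_name).mkdir()
    for i in range(10):
        tf.io.write_file(str(Path(tmp) / class_name / f"img_{i}.png"), png)

image_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255,     # scale pixel values to [0, 1]
    validation_split=0.2,  # reserve 20% of each class for validation
)

# Options shared by both subsets so the two generators stay consistent.
flow_kwargs = dict(
    directory=tmp,
    target_size=(8, 8),        # resize every image to this height and width
    batch_size=4,
    class_mode="categorical",  # one-hot labels inferred from folder names
)
train_data_gen = image_generator.flow_from_directory(subset="training", **flow_kwargs)
val_data_gen = image_generator.flow_from_directory(subset="validation", **flow_kwargs)

print(train_data_gen.samples, val_data_gen.samples)
```

Both generators come from the same ImageDataGenerator instance, which is what keeps the training and validation subsets complementary.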

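UPDATE: You can obtain two slices of your dataset via skip() and take(). This only gives a clean split if the elements come out in the same order on every iteration (for example, after shuffling with reshuffle_each_iteration=False); otherwise the two slices can overlap across epochs: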

val_data = data.take(val_data_size)
train_data = data.skip(val_data_size)
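As a small illustration of the take()/skip() idea, the sketch below uses a toy dataset of ten integers in place of real images. The dataset size, the 20% fraction, and the seed are assumptions chosen for the demo; the shuffle uses reshuffle_each_iteration=False so that both slices see the same order.

```python
import tensorflow as tf

# Toy dataset standing in for the real one (assumption: 10 elements).
data_size = 10
data = tf.data.Dataset.range(data_size)

# Shuffle once; reshuffle_each_iteration=False keeps the order fixed,
# so take() and skip() always cut the same sequence.
data = data.shuffle(data_size, seed=42, reshuffle_each_iteration=False)

val_data_size = int(data_size * 0.2)
val_data = data.take(val_data_size)    # first 20% of the shuffled order
train_data = data.skip(val_data_size)  # everything after that

val_items = sorted(int(x) for x in val_data.as_numpy_iterator())
train_items = sorted(int(x) for x in train_data.as_numpy_iterator())

# The two slices should share no element and together cover the dataset.
print(set(val_items).isdisjoint(train_items))
```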

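If you have all your data in the same folder and want to split it into training/validation sets using tf.data, do the following. Note that list_files returns the file names in a shuffled order by default, and that order can change between iterations, so list the files unshuffled and shuffle once with reshuffle_each_iteration=False:

# list files in a fixed order, then shuffle once so the split stays stable across epochs
list_ds = tf.data.Dataset.list_files(str(cwd + '/train/' + '*/*'), shuffle=False)
image_count = len(list(data_dir.glob('*/*.png')))
list_ds = list_ds.shuffle(image_count, reshuffle_each_iteration=False)

val_size = int(image_count * 0.2)
train_set = list_ds.skip(val_size)  # remaining 80% for training
val_set = list_ds.take(val_size)    # first 20% for validation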
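On the question of whether the split has to happen before the files are loaded: one option is to split the file paths in plain Python first and then build a separate tf.data.Dataset from each list. A minimal sketch of that idea, assuming one sub-folder per class as in the question; the helper name split_per_class and the demo file names are made up for illustration.

```python
import random
import tempfile
from pathlib import Path


def split_per_class(root, val_fraction=0.2, seed=0):
    """Return (train_paths, val_paths), holding out val_fraction of each class folder."""
    rng = random.Random(seed)
    train_paths, val_paths = [], []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        files = sorted(str(f) for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)  # shuffle within the class before cutting
        n_val = int(len(files) * val_fraction)
        val_paths.extend(files[:n_val])
        train_paths.extend(files[n_val:])
    return train_paths, val_paths


# Demo on a throwaway tree: 3 class folders with 10 empty files each.
with tempfile.TemporaryDirectory() as tmp:
    for class_name in ("class 1", "class 2", "class 3"):
        (Path(tmp) / class_name).mkdir()
        for i in range(10):
            (Path(tmp) / class_name / f"img_{i}.png").touch()
    train_paths, val_paths = split_per_class(tmp, val_fraction=0.2)

print(len(train_paths), len(val_paths))  # prints: 24 6
```

Each list could then be passed to tf.data.Dataset.from_tensor_slices to build the training and validation pipelines separately. Splitting per class keeps the class proportions the same in both subsets.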
