
How can I split the dataset obtained from image_dataset_from_directory into data and labels?

I'm trying to build a CNN in TensorFlow with Python. I've loaded my images into a dataset as follows:

dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "train_data", shuffle=True, image_size=(578, 260),
    batch_size=BATCH_SIZE)

However, if I want to use train_test_split or fit_resample on this dataset, I need to separate it into data and labels. I'm new to TensorFlow and don't know how to do this. I would really appreciate any help.

You can use the subset parameter, together with validation_split, to separate your data into training and validation sets.

import tensorflow as tf
import pathlib

dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)


train_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  image_size=(256, 256),
  seed=1,
  batch_size=32)

val_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=1,
  image_size=(256, 256),
  batch_size=32)

for x, y in train_ds.take(1):
  print('Image --> ', x.shape, 'Label --> ',  y.shape)

Output:

Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
Image -->  (32, 256, 256, 3) Label -->  (32,)

As for your labels, according to the docs:

Either "inferred" (labels are generated from the directory structure), None (no labels), or a list/tuple of integer labels of the same size as the number of image files found in the directory. Labels should be sorted according to the alphanumeric order of the image file paths (obtained via os.walk(directory) in Python).

So just try iterating over train_ds and check that the labels are there. You can also use the label_mode parameter to choose how the labels are encoded and class_names to explicitly list your classes.
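If you do need plain arrays for train_test_split or fit_resample, one possible approach is to iterate over the dataset once and collect the batches. The following is only a sketch: dataset_to_arrays is a hypothetical helper name, and demo_ds is a small synthetic stand-in for train_ds so that the example runs without image files on disk. Images and labels are collected in the same pass because a shuffled dataset may yield a different order each time it is iterated.

```python
import numpy as np
import tensorflow as tf


def dataset_to_arrays(ds):
    """Collect a batched (images, labels) tf.data.Dataset into two NumPy arrays.

    Hypothetical helper, not part of TensorFlow. Everything is loaded into
    memory, so this only suits datasets that fit in RAM.
    """
    image_batches, label_batches = [], []
    for images, labels in ds:  # single pass keeps images and labels aligned
        image_batches.append(images.numpy())
        label_batches.append(labels.numpy())
    return np.concatenate(image_batches), np.concatenate(label_batches)


# Synthetic stand-in for train_ds (assumption): 10 random 8x8 RGB "images"
# with integer labels, batched the way image_dataset_from_directory batches.
fake_images = np.random.rand(10, 8, 8, 3).astype("float32")
fake_labels = np.arange(10) % 2
demo_ds = tf.data.Dataset.from_tensor_slices((fake_images, fake_labels)).batch(4)

X, y = dataset_to_arrays(demo_ds)
print(X.shape, y.shape)  # (10, 8, 8, 3) (10,)
```

With the real dataset you would call dataset_to_arrays(train_ds) instead. Note that fit_resample typically expects a 2-D feature array, so X may need to be reshaped with X.reshape(len(X), -1) first and reshaped back afterwards.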

If your classes are imbalanced, you can use the class_weight parameter of model.fit(*). For more information, check out this post.
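As a rough illustration of the class-weighting idea, the sketch below derives a weight per class from integer labels using the common "balanced" heuristic (total / (n_classes * count)). The helper name balanced_class_weights and the label list are made up for this example; in practice the labels would come from your own dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class index to total / (n_classes * class_count).

    Hypothetical helper: rarer classes get proportionally larger weights.
    """
    counts = Counter(int(label) for label in labels)
    total = sum(counts.values())
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


# Made-up imbalanced labels: class 0 appears 6 times, classes 1 and 2 twice each.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
weights = balanced_class_weights(labels)
print(weights)

# Assuming `model` is an already compiled Keras model, the dictionary
# could then be passed to training like this:
# model.fit(train_ds, validation_data=val_ds, epochs=5, class_weight=weights)
```

The dictionary keys must be the integer class indices, which for inferred labels follow the alphanumeric order of the class directory names.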
