How do I split a dataset obtained from image_dataset_from_directory into data and labels?

Question

I'm trying to build a CNN in TensorFlow with Python. I've loaded my images into a dataset as follows:

dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "train_data", shuffle=True, image_size=(578, 260),
    batch_size=BATCH_SIZE)

However, if I want to use train_test_split or fit_resample on this dataset, I need to separate it into data and labels. I'm new to TensorFlow and don't know how to do this. I would really appreciate any help.

Answer 1

You can use the validation_split and subset parameters to separate your data into training and validation sets.

import tensorflow as tf
import pathlib

dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)


train_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  image_size=(256, 256),
  seed=1,
  batch_size=32)

val_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=1,
  image_size=(256, 256),
  batch_size=32)

for x, y in train_ds.take(1):
  print('Image --> ', x.shape, 'Label --> ',  y.shape)

Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
Image -->  (32, 256, 256, 3) Label -->  (32,)

As for your labels, according to the docs:

Either "inferred" (labels are generated from the directory structure), None (no labels), or a list/tuple of integer labels of the same size as the number of image files found in the directory. Labels should be sorted according to the alphanumeric order of the image file paths (obtained via os.walk(directory) in Python).

So just try iterating over train_ds and see if they are there. You can also use the label_mode parameter to specify the kind of labels you have, and class_names to explicitly list your classes.
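If you need the images and labels as two separate arrays (for example, for train_test_split), one option is to iterate over the dataset once and concatenate the batches. The following is a minimal sketch, not part of the original answer: dataset_to_arrays is a hypothetical helper, and fake_batches is a NumPy stand-in for the real dataset so that the snippet runs without any image files. It assumes the whole dataset fits in memory.

```python
import numpy as np

def dataset_to_arrays(batches):
    """Concatenate an iterable of (images, labels) batches into two arrays."""
    image_batches, label_batches = [], []
    # Collect both parts in a single pass so each image stays paired with its
    # label, even if the dataset is reshuffled between passes.
    for images, labels in batches:
        image_batches.append(np.asarray(images))
        label_batches.append(np.asarray(labels))
    return np.concatenate(image_batches), np.concatenate(label_batches)

# Stand-in for a batched dataset: one batch of 4 and one of 2 tiny RGB "images".
fake_batches = [
    (np.zeros((4, 8, 8, 3), dtype=np.float32), np.array([0, 1, 2, 1])),
    (np.ones((2, 8, 8, 3), dtype=np.float32), np.array([2, 0])),
]

X, y = dataset_to_arrays(fake_batches)
print(X.shape, y.shape)  # (6, 8, 8, 3) (6,)

# With the dataset from the answer, the call would look like:
# X, y = dataset_to_arrays(train_ds.as_numpy_iterator())
```

The resulting X and y can then be passed to train_test_split. Resamplers such as fit_resample typically expect 2-D input, so X may first need to be flattened with X.reshape(len(X), -1).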

If your classes are imbalanced, you can use the class_weight parameter of model.fit(*). For more information, check out this post.
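As a rough illustration of the class-weight approach, the dictionary passed to model.fit can be computed from the integer labels. This sketch is not from the original answer: balanced_class_weights is a hypothetical helper, y_example is made-up data, and the weighting formula (n_samples / (n_classes * class_count)) is just one common heuristic. It assumes integer labels, which is the default label_mode.

```python
import numpy as np

def balanced_class_weights(labels):
    """Map each class index to n_samples / (n_classes * class_count)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    weights = labels.size / (classes.size * counts)
    # Keras expects a plain dict of {class_index: weight}.
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Made-up labels: class 0 appears three times, class 1 only once.
y_example = np.array([0, 0, 0, 1])
class_weight = balanced_class_weights(y_example)
print(class_weight)  # the rarer class 1 receives the larger weight

# The dictionary would then be passed to training, for example:
# model.fit(train_ds, validation_data=val_ds, epochs=10, class_weight=class_weight)
```

With real data, the labels could be gathered by iterating over train_ds once and concatenating the label batches.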

Answer 1: 3, accepted, 2021-11-05 06:18:38
