How can I split the dataset obtained from image_dataset_from_directory into data and labels?
I'm trying to build a CNN in TensorFlow with Python. I've loaded my images into a dataset as follows:
dataset = tf.keras.preprocessing.image_dataset_from_directory(
"train_data", shuffle=True, image_size=(578, 260),
batch_size=BATCH_SIZE)
However, if I want to use train_test_split or fit_resample on this dataset, I need to separate it into data and labels. I'm new to TensorFlow and don't know how to do this. I would really appreciate any help.
You can use the subset parameter, together with validation_split, to separate your data into training and validation subsets.
import pathlib

import tensorflow as tf

# Download and extract the example flower photos dataset.
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)

# Use the same validation_split and seed in both calls so the subsets do not overlap.
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    image_size=(256, 256),
    seed=1,
    batch_size=32)

val_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=1,
    image_size=(256, 256),
    batch_size=32)

# Each element of the dataset is one batch of (images, labels).
for x, y in train_ds.take(1):
    print('Image --> ', x.shape, 'Label --> ', y.shape)
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
Image --> (32, 256, 256, 3) Label --> (32,)
As for your labels, according to the docs:
Either "inferred" (labels are generated from the directory structure), None (no labels), or a list/tuple of integer labels of the same size as the number of image files found in the directory. Labels should be sorted according to the alphanumeric order of the image file paths (obtained via os.walk(directory) in Python).
So just try iterating over train_ds and see if they are there. You can also use the label_mode parameter to specify the kind of labels you have, and class_names to explicitly list your classes.
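If you do need plain arrays for something like train_test_split or fit_resample, one possible approach is to walk the batches once and concatenate them. The sketch below is only an illustration: split_batches is a hypothetical helper name, and the synthetic batches stand in for what train_ds.as_numpy_iterator() would yield, so that the snippet runs without downloading any images.

```python
import numpy as np


def split_batches(batches):
    """Concatenate an iterable of (images, labels) batches into two arrays.

    Hypothetical helper: `batches` can be any iterable of pairs, for example
    the iterator returned by `train_ds.as_numpy_iterator()`.
    """
    images, labels = [], []
    # Collect images and labels in the same pass so they stay aligned.
    for x, y in batches:
        images.append(np.asarray(x))
        labels.append(np.asarray(y))
    return np.concatenate(images, axis=0), np.concatenate(labels, axis=0)


# Synthetic stand-in for a batched image dataset: three batches of 8x8 RGB
# "images" with integer labels (the last batch is smaller, as often happens).
rng = np.random.default_rng(0)
fake_batches = [
    (rng.random((4, 8, 8, 3), dtype=np.float32), np.array([0, 1, 2, 1])),
    (rng.random((4, 8, 8, 3), dtype=np.float32), np.array([2, 0, 0, 1])),
    (rng.random((2, 8, 8, 3), dtype=np.float32), np.array([1, 2])),
]

X, y = split_batches(fake_batches)
print(X.shape, y.shape)
```

With the dataset from the answer, the call would be split_batches(train_ds.as_numpy_iterator()). Two caveats: this loads every image into memory, and because the dataset is shuffled by default, images and labels should be gathered in a single pass (as above) rather than in two separate loops, otherwise they may no longer line up.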
If your classes are imbalanced, you can use the class_weight parameter of model.fit(*). For more information, check out this post.
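As a rough illustration of what such a weight dictionary could look like, the sketch below derives one from integer labels using the common "balanced" heuristic, weight = n_samples / (n_classes * class_count). The function name balanced_class_weight and the example labels are made up for this sketch; other weighting schemes are equally valid.

```python
import numpy as np


def balanced_class_weight(labels):
    """Return {class_index: weight} using n_samples / (n_classes * count).

    Hypothetical helper; expects non-negative integer class labels.
    """
    labels = np.asarray(labels)
    counts = np.bincount(labels)          # number of samples per class index
    n_classes = np.count_nonzero(counts)  # only classes that actually occur
    return {
        i: float(len(labels) / (n_classes * c))
        for i, c in enumerate(counts)
        if c > 0
    }


# Made-up labels: class 0 is three times as frequent as class 1.
example_labels = [0, 0, 0, 1]
weights = balanced_class_weight(example_labels)
print(weights)
```

The resulting dictionary maps each class index to a weight, which is the shape model.fit expects for class_weight, for example model.fit(train_ds, class_weight=weights), where model is assumed to be an already compiled Keras model.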