How can I explore and modify the dataset created by tf.keras.preprocessing.image_dataset_from_directory()?

Here's how I used the function:

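import tensorflow as tf

# main_directory holds one sub-directory per class (layout shown below)
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    main_directory,
    labels='inferred',       # label = name of the sub-directory
    image_size=(299, 299),
    validation_split=0.1,
    subset='training',       # keep the 90% training portion
    seed=123                 # use the same seed for the 'validation' subset
)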
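For the minimum goal (labels and image counts), a dataset returned by this function carries a class_names attribute listing the sub-directories in label order, and the function also prints a "Found N files belonging to M classes." line when it runs. The sketch below counts images per label by iterating over the batches; count_images_per_class and print_counts are hypothetical helper names, and it assumes the default label_mode='int'. Iterating decodes every image, so it can be slow on a large dataset, and with subset='training' it only counts the training portion.

```python
import collections

import tensorflow as tf


def count_images_per_class(dataset):
    """Count images per class in a dataset from image_dataset_from_directory.

    Assumes the default label_mode='int', i.e. batches of (images, int labels).
    """
    counts = collections.Counter()
    # Each element is a batch: (images, integer labels)
    for _, labels in dataset:
        counts.update(labels.numpy().tolist())
    # class_names[i] is the sub-directory that produced label i
    return {name: counts[i] for i, name in enumerate(dataset.class_names)}


def print_counts(counts):
    # Print a two-column table: label, number of images
    print(f"{'label':<10}number of images")
    for name, n in counts.items():
        print(f"{name:<10}{n}")
```

Usage: print_counts(count_images_per_class(dataset)).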

I'd like to explore the created dataset much like in this example, particularly the part where it is converted to a pandas DataFrame. At a minimum, though, I want to check the labels and the number of images under each one, to confirm that the dataset was built as expected (each sub-directory becomes the label of the images inside it).

To be clear, main_directory is set up like this:

main_directory
- class_a
  - 000.jpg
  - ...
- class_b
  - 100.jpg
  - ...

And I'd like the dataset to display its info in something like this form:

label     number of images
class_a   100
class_b   100

Additionally, is it possible to remove labels and their corresponding images from a dataset? The idea is to drop a label if it has fewer than a certain number of images, or based on some other metric. This can of course be done outside this function by other means, but I'd like to know whether it is possible, and if so, how.
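Staying inside tf.data, one possible approach is to unbatch the dataset, filter out examples whose label is not in a chosen set, and remap the surviving labels to 0..k-1 so the model's output layer can be sized to k classes. This is a sketch under assumptions: keep_classes is a hypothetical helper, keep_ids would come from something like the per-class counts (e.g. ids of classes with at least N images), and the input is a batched dataset of (image, int label) pairs as produced with label_mode='int'.

```python
import tensorflow as tf


def keep_classes(dataset, keep_ids):
    """Keep only examples whose integer label is in keep_ids and remap the
    surviving labels to their position in keep_ids (0..len(keep_ids)-1).

    Assumes a batched dataset of (image, int label) pairs.
    """
    keep = tf.constant(keep_ids, dtype=tf.int32)

    def is_kept(image, label):
        # True if this example's label is one of the kept ids
        return tf.reduce_any(tf.equal(tf.cast(label, tf.int32), keep))

    def remap(image, label):
        # Position of the label inside keep_ids becomes the new label
        matches = tf.cast(tf.equal(keep, tf.cast(label, tf.int32)), tf.int32)
        return image, tf.argmax(matches, output_type=tf.int32)

    # filter() works on single examples, so unbatch first
    return dataset.unbatch().filter(is_kept).map(remap)
```

The result is unbatched, so call .batch(...) again before training; note that the new dataset no longer has the class_names attribute, so keep your own list of the retained names.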

EDIT: For additional context, the end goal of all this is to train a pre-trained model like this one on local images divided into folders named after their classes. If there is a better way that meets this goal, including one that doesn't use that function, it's welcome all the same. Thanks!

I think it would be much easier to use glob2 to get all your filenames, process them however you want, and then write a simple loading function to replace image_dataset_from_directory. (The standard-library glob module handles a pattern like this just as well.)

Get all your files:

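import os
import glob2

# Run from inside main_directory; os.path.join uses the right separator for the OS
files = glob2.glob(os.path.join('class_*', '*.jpg'))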

Then manipulate this list of filenames as desired.
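For example, the per-class counts the question asks for, and the "drop classes with too few images" step, can both be done on the plain list of paths. A minimal sketch, assuming each path's parent directory is its class name; class_of, count_per_class and drop_small_classes are hypothetical helper names.

```python
import collections
import os


def class_of(path):
    # The parent directory name is the class, e.g. 'class_a/000.jpg' -> 'class_a'
    return os.path.basename(os.path.dirname(path))


def count_per_class(files):
    # Mapping: class name -> number of files in that class
    return collections.Counter(class_of(f) for f in files)


def drop_small_classes(files, min_images):
    """Keep only the files whose class has at least min_images images."""
    counts = count_per_class(files)
    return [f for f in files if counts[class_of(f)] >= min_images]
```

If you prefer pandas for exploring, pd.Series([class_of(f) for f in files]).value_counts() gives the same counts as a Series.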

Then, write a function to load the images:

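import os
import tensorflow as tf

def load(file_path):
    # Read the file and decode it as a 3-channel JPEG
    img = tf.io.read_file(file_path)
    img = tf.image.decode_jpeg(img, channels=3)
    # Convert to float32 in [0, 1], then resize to the model's input size
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img, size=(299, 299))
    # The class is the parent directory name, e.g. 'class_a/000.jpg' -> 'class_a'
    label = tf.strings.split(file_path, os.sep)[-2]
    # Binary label: 1 for class_a, 0 for any other class
    label = tf.cast(tf.equal(label, 'class_a'), tf.int32)
    return img, label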
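The tf.equal(label, 'class_a') line only separates class_a from everything else, so it fits a two-class setup. For more classes, one option is a lookup table from class name to integer id. This is a sketch: CLASS_NAMES (including 'class_c') and label_from_path are hypothetical, and in practice the class list would be built from your file list.

```python
import os

import tensorflow as tf

# Hypothetical class list; in practice build it from the file list, e.g.
# sorted({os.path.basename(os.path.dirname(f)) for f in files})
CLASS_NAMES = ['class_a', 'class_b', 'class_c']

# Map each class name to an integer id; unknown names map to -1
label_table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(CLASS_NAMES),
        values=tf.range(len(CLASS_NAMES), dtype=tf.int32),
    ),
    default_value=-1,
)


def label_from_path(file_path):
    # Parent directory name, e.g. 'class_b/100.jpg' -> 'class_b'
    class_name = tf.strings.split(file_path, os.sep)[-2]
    return label_table.lookup(class_name)
```

Inside load, the two label lines would then become label = label_from_path(file_path), paired with a sparse categorical loss such as tf.keras.losses.SparseCategoricalCrossentropy.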

Then create your dataset for training:

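train_ds = (tf.data.Dataset.from_tensor_slices(files)
            .shuffle(len(files), seed=123)  # files are grouped by class, so shuffle first
            .map(load, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(4)
            .prefetch(tf.data.AUTOTUNE))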
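This replaces the validation_split=0.1 / subset='training' pair from the original call only if you split the file list yourself. A minimal sketch of a reproducible, non-stratified split; train_val_split is a hypothetical helper, and the 0.1 fraction and seed 123 simply mirror the question's arguments.

```python
import random


def train_val_split(files, val_fraction=0.1, seed=123):
    """Shuffle a copy of files reproducibly and split off a validation part."""
    files = list(files)
    random.Random(seed).shuffle(files)
    n_val = int(len(files) * val_fraction)
    # Returns (training files, validation files)
    return files[n_val:], files[:n_val]
```

With train_files, val_files = train_val_split(files), build train_ds from train_files as above and a val_ds from val_files the same way (no shuffle needed), then pass validation_data=val_ds to model.fit. Because the split is not stratified, small classes may end up with few or no validation images.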

Then train:

model.fit(train_ds)
