How can I explore and modify the created dataset from tf.keras.preprocessing.image_dataset_from_directory()?
Here's how I used the function:
dataset = tf.keras.preprocessing.image_dataset_from_directory(
main_directory,
labels='inferred',
image_size=(299, 299),
validation_split=0.1,
subset='training',
seed=123
)
I'd like to explore the created dataset much like in this example, particularly the part where it was converted to a pandas dataframe. But my minimum goal is to check the labels and the number of files attached to each, just to verify that it created the dataset as expected (each sub-directory being the label of the images inside it).
To be clear, the main_directory is set up like this:
main_directory
- class_a
- 000.jpg
- ...
- class_b
- 100.jpg
- ...
And I'd like to see the dataset display its info with something like this:
label number of images
class_a 100
class_b 100
Additionally, is it possible to remove labels and their corresponding images from a dataset? The idea is to drop them if the number of images for a label is below a certain threshold, or by some other metric. It can of course be done outside this function through other means, but I'd like to know if it is indeed possible, and if so, how.
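Done outside the function, the drop-small-classes idea amounts to filtering the file list by per-class counts. A sketch under that assumption (the helper name and default threshold are mine):

```python
from collections import Counter
from pathlib import Path

def drop_small_classes(file_paths, min_images=50):
    """Keep only files whose class (parent directory name) appears
    at least min_images times in the list."""
    counts = Counter(Path(p).parent.name for p in file_paths)
    return [p for p in file_paths
            if counts[Path(p).parent.name] >= min_images]
```

The filtered list can then be fed to whatever loader builds the dataset.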
EDIT: For additional context, the end goal of all of this is to train a pre-trained model like this one with local images divided into folders named after their classes. If there is a better way that meets this end goal without using that function, it's welcome all the same. Thanks!
I think it would be much easier to use glob2 to get all your filenames, process them as you want to, then make a simple loading function to replace image_dataset_from_directory.
Get all your files:
import glob2

files = glob2.glob('class_*\\*.jpg')  # Windows-style separator; use 'class_*/*.jpg' on POSIX
Then manipulate this list of filenames as desired.
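For instance, the validation_split=0.1 and seed=123 arguments from the question can be reproduced on the filename list with a seeded shuffle and slice (the function name is mine; 0.1 is just the fraction used in the question):

```python
import random

def split_files(files, val_fraction=0.1, seed=123):
    """Shuffle a copy of the file list with a fixed seed, then split it
    into (train, validation) lists."""
    files = sorted(files)               # deterministic starting order
    random.Random(seed).shuffle(files)
    n_val = int(len(files) * val_fraction)
    return files[n_val:], files[:n_val]
```

Because the seed fixes the shuffle, repeated runs produce the same split, mirroring what the seed argument does in image_dataset_from_directory.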
Then, make a function to load the images:
import os
import tensorflow as tf

def load(file_path):
    img = tf.io.read_file(file_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)  # scale to [0, 1]
    img = tf.image.resize(img, size=(299, 299))
    # The first path component is the class directory;
    # this yields 1 for class_a and 0 otherwise (binary labels)
    label = tf.strings.split(file_path, os.sep)[0]
    label = tf.cast(tf.equal(label, 'class_a'), tf.int32)
    return img, label
Then create your dataset for training:
train_ds = tf.data.Dataset.from_tensor_slices(files).map(load).batch(4)
Then train:
model.fit(train_ds)