How to filter a dataset by tensor shape in TensorFlow

Question

I have loaded a dataset with tfds.load and want to discard certain images that interfere with training or are of no use to me (for example, images that are too small).

I could not find any information on this specific problem, so I went with what seemed like the best fit: calling .filter(predicate) on the dataset. Unfortunately, the input to the predicate has the indeterminate shape (None, None, 3), and as expected the comparison raises an error saying that 'int' cannot be compared with 'NoneType'.

Is it even possible to solve this problem in TensorFlow, or should I not waste my time?

Pseudocode

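import tensorflow_datasets as tfds

ds_train = tfds.load('name', split='train')  # 'name' is a placeholder dataset name
ds_train = ds_train.map(lambda ds: ds['image'])
# Fails: the static shape is (None, None, 3), so image.shape[0] is None
ds_train = ds_train.filter(lambda image: image.shape[0] >= 256)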

Answer 1

When writing code with tf.data.Dataset, use tf.shape(tensor) rather than tensor.shape, because tf.data.Dataset runs the functions you pass to it in graph mode.
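To illustrate the difference, here is a minimal sketch, assuming TensorFlow 2.x. The spec, the function name dynamic_height and the list static_heights are hypothetical names introduced only for this example. The static shape is read while the function is being traced, whereas tf.shape is evaluated when the function actually runs.

```python
import tensorflow as tf

# Hypothetical spec for illustration: images with unknown height and width.
spec = tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8)

static_heights = []  # filled in as a Python side effect during tracing


@tf.function(input_signature=[spec])
def dynamic_height(image):
    # Static shape: known at trace time; unknown dimensions are None.
    static_heights.append(image.shape[0])
    # Dynamic shape: a tensor whose value is computed at execution time.
    return tf.shape(image)[0]


height = dynamic_height(tf.zeros((300, 200, 3), dtype=tf.uint8))
print(static_heights)  # static height seen while tracing
print(int(height))     # dynamic height of the actual input
```

Comparing the static height with an integer is what triggers the error in the question, because that value is None while the graph is being built.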

Quoting the documentation of tf.shape:

tf.shape and Tensor.shape should be identical in eager mode. Within tf.function or within a compat.v1 context, not all dimensions may be known until execution time. Hence when defining custom layers and models for graph mode, prefer the dynamic tf.shape(x) over the static x.shape.

ds_train = ds_train.filter(lambda image: tf.shape(image)[0] >= 256)
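A self-contained way to try this out, as a sketch only: the synthetic generator below is an assumption standing in for the images returned by tfds.load, and image_generator and kept_heights are hypothetical names. It assumes TensorFlow 2.4 or later, where Dataset.from_generator accepts output_signature.

```python
import numpy as np
import tensorflow as tf


def image_generator():
    # Synthetic stand-ins for decoded images of varying height.
    for height in (100, 256, 300):
        yield np.zeros((height, 64, 3), dtype=np.uint8)


# The signature leaves height and width unknown, like the (None, None, 3)
# shape reported in the question.
ds_train = tf.data.Dataset.from_generator(
    image_generator,
    output_signature=tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8),
)

# The predicate runs in graph mode, so it uses the dynamic shape.
ds_train = ds_train.filter(lambda image: tf.shape(image)[0] >= 256)

# When iterating eagerly, each element has a fully known shape.
kept_heights = [int(image.shape[0]) for image in ds_train]
print(kept_heights)
```

To require a minimum width as well, the same predicate could combine two comparisons with tf.logical_and, using tf.shape(image)[1] for the width.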

Solution 1
0 Accepted 2021-05-12 14:05:40
