Are we losing data when we use .next() or .take() on a tf.keras.preprocessing.image_dataset_from_directory object?

I create a test dataset like this:

import tensorflow as tf

# Create test_dataset
test_dataset = \
  tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir,
                                                      labels='inferred', 
                                                      label_mode='int', 
                                                      class_names=None,
                                                      seed=42, 
                                                      )
# Explore the first batch
for images, labels in test_dataset.take(1):
  print(labels)

It returns:

tf.Tensor([5 3 8 3 8 5 7 6 3 8 4 2 4 5 5 4 0 1 0 5 5 2 6 0 7 9 9 0 4 9 6 4], shape=(32,), dtype=int32)

If I re-run the last part as below:

for images, labels in test_dataset.take(1):
  print(labels)

It returns something different from the first time:

tf.Tensor([0 6 2 5 5 7 5 2 7 4 0 5 0 4 6 5 8 7 7 3 5 1 1 9 5 2 6 6 6 6 2 0], shape=(32,), dtype=int32)

If I recreate test_dataset and explore it as below:

# Create test_dataset
test_dataset = \
  tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir,
                                                      labels='inferred', 
                                                      label_mode='int', 
                                                      class_names=None,
                                                      seed=42, 
                                                      )
# Explore the first batch
for images, labels in test_dataset.take(1):
  print(labels)

It returns the same as the first time:

tf.Tensor([5 3 8 3 8 5 7 6 3 8 4 2 4 5 5 4 0 1 0 5 5 2 6 0 7 9 9 0 4 9 6 4], shape=(32,), dtype=int32)

So I conclude that when I use the take method, the batch is popped out and lost, and is no longer available for modeling, validation, and so on.

My questions are:

  • Am I right? Is the first batch lost if I run test_dataset.take(1)?
  • If the answer to the above question is yes, is there any way to avoid losing a batch when exploring batches of a tf.keras.preprocessing.image_dataset_from_directory object?

That's not about losing the batch. The function tf.keras.preprocessing.image_dataset_from_directory has a shuffle argument whose default value is True. As a result, the dataset is reshuffled on each iteration.
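As a minimal sketch of why nothing is lost, the snippet below uses a toy unshuffled tf.data.Dataset in place of the image dataset from the question. The names ds, first, second, and everything are only for illustration. It suggests that take(1) returns a new dataset and does not consume the original, while an explicit Python iterator advances only its own position.

```python
import tensorflow as tf

# Toy stand-in for the image dataset: 10 integers in batches of 4, no shuffling.
ds = tf.data.Dataset.range(10).batch(4)

# take(1) returns a new dataset; iterating it does not remove anything from ds.
first = [batch.numpy().tolist() for batch in ds.take(1)]
second = [batch.numpy().tolist() for batch in ds.take(1)]
everything = [batch.numpy().tolist() for batch in ds]

print(first)       # [[0, 1, 2, 3]]
print(second)      # [[0, 1, 2, 3]]
print(everything)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# An explicit iterator keeps its own position, but only for that iterator.
it = iter(ds)
print(next(it).numpy().tolist())        # [0, 1, 2, 3]
print(next(it).numpy().tolist())        # [4, 5, 6, 7]
print(next(iter(ds)).numpy().tolist())  # [0, 1, 2, 3]: a fresh iterator starts over
```

Without shuffling, repeated take(1) calls return the same first batch, so the differing batches in the question come from the shuffle step and not from data being removed.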

If we dive into the source code:

  if shuffle:
    # Shuffle locally at each iteration
    dataset = dataset.shuffle(buffer_size=batch_size * 8, seed=seed)
  dataset = dataset.batch(batch_size)

Under the hood, as you can see, it creates a tf.data.Dataset object, which has a shuffle method. The shuffle method has an argument reshuffle_each_iteration, which is True by default. With the second take call you iterate over the dataset again, which causes it to be shuffled again.
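The effect of reshuffle_each_iteration can be sketched on a toy dataset of 100 integers. The helper one_pass and the variable names are hypothetical and exist only for this illustration. This is a small experiment, not the library's internal code.

```python
import tensorflow as tf

def one_pass(ds):
    """Return the elements of one full iteration over ds as a list of ints."""
    return [int(x) for x in ds]

# Default behaviour: a new order is drawn every time the dataset is iterated.
reshuffled = tf.data.Dataset.range(100).shuffle(buffer_size=100, seed=42)

# Same shuffle, but the order drawn on the first pass is reused afterwards.
fixed = tf.data.Dataset.range(100).shuffle(
    buffer_size=100, seed=42, reshuffle_each_iteration=False)

a, b = one_pass(reshuffled), one_pass(reshuffled)
c, d = one_pass(fixed), one_pass(fixed)

# Both passes contain every element exactly once; only the order changes.
print(sorted(a) == sorted(b) == list(range(100)))
# Expected to differ between passes when reshuffling is on.
print(a == b)
# Expected to be identical when reshuffle_each_iteration=False.
print(c == d)
```

Note that image_dataset_from_directory does not expose reshuffle_each_iteration in the source excerpt above, so this knob is only available when you build the shuffle step yourself.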

If you set shuffle=False for the dataset, the data will be sorted in alphanumeric order and the order won't change between iterations.
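To try this without a real image folder, here is a hedged sketch that writes a few tiny black PNG files to a temporary directory and loads them. The helper make_fake_image_dir, the class names "a" and "b", and the batch size of 4 are all assumptions made for this illustration. It checks two things: with shuffle=False, repeated take(1) calls return the same labels, and with shuffling left on, a full pass after take(1) still sees every image.

```python
import os
import tempfile

import tensorflow as tf

def make_fake_image_dir(root, classes=("a", "b"), per_class=4):
    """Hypothetical helper: write per_class tiny black PNGs into root/<class>/."""
    for name in classes:
        os.makedirs(os.path.join(root, name))
        for i in range(per_class):
            image = tf.zeros([8, 8, 3], dtype=tf.uint8)
            path = os.path.join(root, name, f"{i}.png")
            tf.io.write_file(path, tf.io.encode_png(image))

with tempfile.TemporaryDirectory() as root:
    make_fake_image_dir(root)

    # shuffle=False: files are read in a fixed order on every iteration.
    ordered = tf.keras.preprocessing.image_dataset_from_directory(
        root, labels='inferred', label_mode='int', batch_size=4, shuffle=False)
    first = [labels.numpy().tolist() for _, labels in ordered.take(1)]
    again = [labels.numpy().tolist() for _, labels in ordered.take(1)]
    total = sum(int(labels.shape[0]) for _, labels in ordered)

    # shuffle=True (the default): the order varies, but no image disappears.
    shuffled = tf.keras.preprocessing.image_dataset_from_directory(
        root, labels='inferred', label_mode='int', batch_size=4, seed=42)
    for _, labels in shuffled.take(1):
        pass  # look at one batch, as in the question
    total_shuffled = sum(int(labels.shape[0]) for _, labels in shuffled)

print(first == again)          # same first batch on both calls
print(total, total_shuffled)   # number of images seen in a full pass
```

For a test set, shuffle=False is usually the more convenient choice, since predictions then line up with a stable label order.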
