steps_per_epoch and validation_steps for infinite Dataset in Keras Model
I have a huge dataset of CSV files with a total volume of around 200 GB, and I don't know the total number of records in it. I'm using make_csv_dataset to create a PrefetchDataset generator.
I'm running into a problem: TensorFlow complains that steps_per_epoch and validation_steps must be specified for an infinite dataset.
How can I specify steps_per_epoch and validation_steps?
Can I pass these parameters as a percentage of the total dataset size?
Can I somehow avoid these parameters altogether, since I want my whole dataset to be iterated over in every epoch?
I think this SO thread answers the case where the total number of records is known in advance.
Here is a screenshot from the documentation, but I'm not understanding it properly. What does the last line mean?
I see no other option than iterating through your entire dataset once to count the number of steps.
import tensorflow as tf

# num_epochs=1 makes the dataset finite, so the loop terminates.
ds = tf.data.experimental.make_csv_dataset('myfile.csv', batch_size=16, num_epochs=1)

for ix, _ in enumerate(ds, 1):
    pass

print('The total number of steps is', ix)
Don't forget the num_epochs argument: by default, make_csv_dataset repeats the data indefinitely, and the loop would never terminate.
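Once the total step count is known, the "percentage of total dataset size" question can be handled by splitting that count yourself. A minimal sketch, assuming you want a fixed validation fraction (the split_steps helper and the 0.2 ratio are hypothetical choices, not anything Keras provides):

```python
import math

def split_steps(total_steps, val_fraction=0.2):
    # Split a known total step count into training and validation steps.
    # val_fraction is a user-chosen split ratio, applied to step counts.
    validation_steps = math.ceil(total_steps * val_fraction)
    steps_per_epoch = total_steps - validation_steps
    return steps_per_epoch, validation_steps

steps_per_epoch, validation_steps = split_steps(1000, val_fraction=0.2)
print(steps_per_epoch, validation_steps)  # 800 200
```

These values can then be passed to model.fit(..., steps_per_epoch=steps_per_epoch, validation_steps=validation_steps), with the train and validation datasets repeated so that each epoch consumes exactly the intended number of batches.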