
tensorflow 2.0, model.fit() : Your input ran out of data

I am absolutely new to TensorFlow and Keras, and I am trying to find my way around by trying out some code that I found online.

In particular, I am using fashion-MNIST, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each of them is a 28x28 grayscale image.

I am following this tutorial " https://towardsdatascience.com/building-your-first-neural-network-in-tensorflow-2-tensorflow-for-hackers-part-i-e1e2f1dfe7a0 ", and I have no problem until the definition of:

history = model.fit(
    train_dataset.repeat(),
    epochs=10,
    steps_per_epoch=500,
    validation_data=val_dataset.repeat(),
    validation_steps=2)

As far as I understand, I need to use train_dataset.repeat() as the input dataset, because otherwise I won't have enough training examples for those hyperparameter values (epochs, steps_per_epoch).

My question is: how can I avoid having to use .repeat()? How do I need to change the hyperparameters?

I am copying the code here, for simplicity:

import tensorflow as tf

def preprocess(x, y):
    # Scale pixel values to [0, 1] and cast labels to float
    x = tf.cast(x, tf.float32) / 255.0
    y = tf.cast(y, tf.float32)
    return x, y

def create_dataset(xs, ys, n_classes=10):
    # One-hot encode the labels, then build a shuffled, batched pipeline
    ys = tf.one_hot(ys, depth=n_classes)
    return tf.data.Dataset.from_tensor_slices((xs, ys)) \
        .map(preprocess) \
        .shuffle(len(ys)) \
        .batch(128)


model.compile(optimizer='adam',
              loss=tf.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history1 = model.fit(train_dataset.repeat(),
                     epochs=10,
                     steps_per_epoch=500,
                     validation_data=val_dataset.repeat(),
                     validation_steps=2)

Thanks!

If you don't want to use .repeat(), you need to have your model pass through your entire dataset exactly once per epoch.

In order to do that, you need to calculate how many steps it takes for your model to pass through the entire dataset; the calculation is simple:

steps_per_epoch = num_train_samples // batch_size  # number of samples, not batches

So with a training set of 60,000 samples and a batch_size of 128, you need 468 steps per epoch.

By setting this parameter this way, you make sure that you do not exceed the size of your dataset.
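As a quick sanity check of that arithmetic (plain Python, no TensorFlow needed; the 60,000/128 figures are the ones from this thread), floor division drops the final partial batch, while rounding up would count it as an extra step:

```python
import math

num_samples = 60000   # fashion-MNIST training set size
batch_size = 128

# Floor division: only full batches count as steps
steps_floor = num_samples // batch_size
print(steps_floor)    # 468

# Rounding up: the final partial batch (96 samples) counts as a step too
steps_ceil = math.ceil(num_samples / batch_size)
print(steps_ceil)     # 469
```

Which of the two you want depends on whether your input pipeline drops the remainder (e.g. batching with drop_remainder=True) or yields a last, smaller batch.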

I encountered the same problem, and here is what I found. From the documentation of tf.keras.Model.fit: "If x is a tf.data dataset, and 'steps_per_epoch' is None, the epoch will run until the input dataset is exhausted."

In other words, we don't need to specify steps_per_epoch if we use a tf.data.Dataset as the training data; TensorFlow will figure out how many steps there are. Meanwhile, TensorFlow automatically re-iterates the dataset when the next epoch begins, so you can specify any number of epochs.
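That "run until the dataset is exhausted" behaviour is easy to mimic in plain Python (no TensorFlow required): iterating over a finite batched dataset once per epoch gives you the step count automatically. Note that the final partial batch counts as a step too:

```python
def batches(samples, batch_size):
    """Yield successive batches from a list, including a final partial batch."""
    for i in range(0, len(samples), batch_size):
        yield samples[i:i + batch_size]

dataset = list(range(60000))   # stand-in for the 60,000 training samples
steps = sum(1 for _ in batches(dataset, 128))
print(steps)   # 469: 468 full batches plus one partial batch of 96 samples
```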

When passing an infinitely repeating dataset (e.g. dataset.repeat()), you must specify the steps_per_epoch argument.
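The interaction between .repeat() and steps_per_epoch can also be sketched with a plain-Python infinite generator (itertools stands in for the tf.data pipeline here): without a step limit the stream never ends, so each "epoch" must consume exactly steps_per_epoch batches from it:

```python
import itertools

def repeating_batches(samples, batch_size):
    """Endlessly cycle over the data in batches, like dataset.repeat()."""
    while True:
        for i in range(0, len(samples), batch_size):
            yield samples[i:i + batch_size]

stream = repeating_batches(list(range(60000)), 128)
steps_per_epoch = 500

# Two "epochs" of 500 steps each: 1000 batches drawn from the endless stream
consumed = list(itertools.islice(stream, 2 * steps_per_epoch))
print(len(consumed))   # 1000
```

This is why the question's original code works at all: with steps_per_epoch=500 each epoch deliberately draws slightly more than one full pass (469 batches) from the repeated data.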
