Extremely slow training speed with Keras when using steps_per_epoch argument
I've noticed a tremendous degradation in training speed when I specify the steps_per_epoch argument in the model.fit(..) method. When I set steps_per_epoch to None (or omit it), the epoch's ETA stays at 2 seconds:
9120/60000 [===>..........................] - ETA: 2s - loss: 0.7055 - acc: 0.7535
When I add the steps_per_epoch argument, the ETA jumps to 5 hours and training becomes extremely slow:
5/60000 [..............................] - ETA: 5:50:00 - loss: 1.9749 - acc: 0.3437
Here is a reproducible script:
import tensorflow as tf
from tensorflow import keras
import time

print(tf.__version__)

def get_model():
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
train_images = train_images / 255.0

model = get_model()

# Very quick - about 2 seconds
start = time.time()
model.fit(train_images, train_labels, epochs=1)
end = time.time()
print("{} seconds".format(end - start))

model = get_model()

# Very slow - about 5 hours
start = time.time()
model.fit(train_images, train_labels, epochs=1, steps_per_epoch=len(train_images))
end = time.time()
print("{} seconds".format(end - start))
I've also tried with pure Keras and the problem persisted. I'm using TensorFlow 1.12.0, Python 3, and Ubuntu 18.04.1 LTS.
Why does the steps_per_epoch argument cause such a significant speed degradation, and how can I avoid it? Thanks!
Notice that you're calling fit with an array of data; you're not using fit_generator or any generator at all.
In that case there is no point in passing steps_per_epoch unless you have something unconventional in mind.
The default batch size in fit is 32, which means you're training with 60000 // 32 = 1875 steps per epoch.
If you pass 1875 as steps_per_epoch, you'll train the same number of batches as with the default None. If you pass 60000 steps, you're multiplying one epoch by 32. (Judging by the huge difference in speed, the default batch size probably also changes in this case.)
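A minimal sketch of the fix, assuming you really want to pass steps_per_epoch explicitly: it should be the number of batches per epoch, not the number of samples.

```python
import math

batch_size = 32      # Keras' default batch size for fit
num_samples = 60000  # Fashion-MNIST training set size

# steps_per_epoch counts batches, not samples.
steps = math.ceil(num_samples / batch_size)
print(steps)  # 1875

# Equivalent to the default behaviour:
# model.fit(train_images, train_labels, epochs=1, steps_per_epoch=steps)
```

With steps_per_epoch=len(train_images) you were asking for 60000 batches of 32 samples each per epoch, i.e. 32 times the work of one normal epoch.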
The total shown in the progress bar when fitting without steps is the total number of images; notice how the count of completed items grows in multiples of 32. The total shown when you use steps is the number of steps; notice how the count of completed steps grows one by one.