Extremely slow training speed with Keras when using steps_per_epoch argument
I've noticed a tremendous degradation in training speed when I specify the steps_per_epoch argument in the model.fit(..) method. When I set steps_per_epoch to None (or omit it), the epoch's ETA stays at 2 seconds:
9120/60000 [===>..........................] - ETA: 2s - loss: 0.7055 - acc: 0.7535
When I add the steps_per_epoch argument, the ETA jumps to 5 hours and training becomes extremely slow:
5/60000 [..............................] - ETA: 5:50:00 - loss: 1.9749 - acc: 0.3437
Here is a reproducible script:
import tensorflow as tf
from tensorflow import keras
import time

print(tf.__version__)

def get_model():
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
train_images = train_images / 255.0

model = get_model()

# Very quick - about 2 seconds
start = time.time()
model.fit(train_images, train_labels, epochs=1)
end = time.time()
print("{} seconds".format(end - start))

model = get_model()

# Very slow - about 5 hours
start = time.time()
model.fit(train_images, train_labels, epochs=1, steps_per_epoch=len(train_images))
end = time.time()
print("{} seconds".format(end - start))
I've also tried with pure Keras and the problem persisted. I'm using TensorFlow 1.12.0, Python 3, and Ubuntu 18.04.1 LTS.
Why does the steps_per_epoch argument cause such a significant speed degradation, and how can I avoid it? Thanks!
Notice that you're calling fit with an array of data; you're not using fit_generator or any generator at all.
In that case there is no point in passing steps_per_epoch unless you have something unconventional in mind.
The default batch size in fit is 32, which means you're training with 60000 // 32 = 1875 steps per epoch.
If you pass 1875 as steps_per_epoch, you'll train the same number of batches as with the default None. If you pass 60000 steps, you're multiplying one epoch by 32. (Judging by the huge difference in speed, the default batch size probably also changes in this case.)
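A minimal sketch of the fix, assuming you really want to pass steps_per_epoch explicitly: it should be the number of batches per epoch, not the number of samples.

```python
import math

batch_size = 32      # Keras' default batch size for fit
num_samples = 60000  # Fashion-MNIST training set size

# steps_per_epoch counts batches, not samples.
steps = math.ceil(num_samples / batch_size)
print(steps)  # 1875

# Equivalent to the default behaviour:
# model.fit(train_images, train_labels, epochs=1, steps_per_epoch=steps)
```

With steps_per_epoch=len(train_images) you were asking for 60000 batches of 32 samples each per epoch, i.e. 32 times the work of one normal epoch.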
The total shown in the progress bar when fitting without steps is the total number of images; notice how the count of completed items grows in multiples of 32. The total shown when you use steps is the number of steps; notice how the count of completed steps grows one by one.