Keras validation very slow when using model.fit_generator
When I fine-tune ResNet-50 on my dataset in Keras (TensorFlow backend), I find it odd that after each epoch, validation is slower than training. I don't know why. Is it because my GPU does not have enough memory? My GPU is a K2200, which has 4 GB of memory. Am I misunderstanding the parameters' meaning?
I have 35946 training pictures, so I use:
samples_per_epoch=35946,
I have 8986 validation pictures, so I use:
nb_val_samples=8986,
The following is part of my code:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    featurewise_center=False,             # set input mean to 0 over the dataset
    samplewise_center=False,              # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,   # divide each input by its std
    zca_whitening=False,                  # apply ZCA whitening
    rotation_range=20,                    # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,                # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,               # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,                 # randomly flip images
    vertical_flip=False,
    zoom_range=0.1,
    channel_shift_range=0.,
    fill_mode='nearest',
    cval=0.,
)
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
    'data/train',
    batch_size=batch_size,
    class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
    'data/val',
    batch_size=batch_size,
    class_mode='categorical')
model.fit_generator(train_generator,
                    # steps_per_epoch=X_train.shape[0] // batch_size,
                    samples_per_epoch=35946,
                    epochs=epochs,
                    validation_data=validation_generator,
                    verbose=1,
                    nb_val_samples=8986,
                    callbacks=[earlyStopping, saveBestModel, tensorboard])
@Yanning As you mentioned in your comment, the first epoch is slow because the ImageDataGenerator is reading data from disk into RAM. This part is very slow. Once the data has been read into RAM, it is just a matter of transferring it from RAM to the GPU.
Therefore, if your dataset is not huge and can fit into your RAM, you can try to build a single numpy file out of the whole dataset and read that data once at the beginning. This will save a lot of disk seek time.
Please check out this post for a comparison of the time taken by different operations:
Latency Numbers Every Programmer Should Know
Main memory reference                       100 ns
Read 1 MB sequentially from memory      250,000 ns
Read 1 MB sequentially from SSD       1,000,000 ns
Read 1 MB sequentially from disk     20,000,000 ns
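The caching idea above can be sketched as follows. This is a minimal, hypothetical example: random arrays stand in for decoded images (real code would first load the JPEGs, e.g. with `keras.preprocessing.image.load_img`), and the file name and shapes are illustrative, not from the question.

```python
import os
import tempfile

import numpy as np

# Hypothetical sketch: cache the whole dataset as one .npy file so each
# epoch does a single sequential read instead of thousands of disk seeks.
# Random arrays stand in for decoded images here.
images = np.random.rand(100, 64, 64, 3).astype("float32")

cache_path = os.path.join(tempfile.mkdtemp(), "train_images.npy")
np.save(cache_path, images)   # one-time sequential write to disk

cached = np.load(cache_path)  # fast sequential read at startup
assert cached.shape == (100, 64, 64, 3)
assert cached.dtype == np.float32
```

Per the latency table, one sequential 1 MB read from disk costs roughly the same as reading 80 MB sequentially from RAM, which is why collapsing many small image files into one array file helps so much.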
I think the answer lies in the choice of arguments for the fit_generator function. I was having the same issue, and fixed it by using the following arguments in the fit_generator call.
steps_per_epoch=training_samples_count // batch_size,
validation_steps=validation_samples_count // batch_size,
Note that I have specified steps for both validation and training, and this makes validation blazing fast.
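For concreteness, here is a minimal sketch of how those step counts work out with the sample counts from the question and an assumed batch_size of 32 (the batch size is not given in the question). The fit_generator call itself is shown as a comment, since it needs a compiled model and generators to run.

```python
# Assumed batch size; the sample counts come from the question above.
batch_size = 32
training_samples_count = 35946
validation_samples_count = 8986

# Each phase runs exactly this many batches per epoch, so validation
# stops after ~280 batches instead of looping over extra samples.
steps_per_epoch = training_samples_count // batch_size
validation_steps = validation_samples_count // batch_size
print(steps_per_epoch, validation_steps)  # 1123 280

# model.fit_generator(train_generator,
#                     steps_per_epoch=steps_per_epoch,
#                     epochs=epochs,
#                     validation_data=validation_generator,
#                     validation_steps=validation_steps)
```

With explicit steps, Keras does not have to guess how many batches make up one pass over the data, which bounds the time spent in the validation phase.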