Resume training with multi_gpu_model in Keras
I'm training a modified InceptionV3 model with multi_gpu_model in Keras, and I use model.save to save the whole model.
Then I closed and restarted the IDE and used load_model to reinstantiate the model.
The problem is that I am not able to resume the training exactly where I left off.
Here is the code:
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch = num_images/batch_size, epochs = num_epochs)
model.save('my_model.h5')
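As an aside, the Keras documentation specifies steps_per_epoch as an integer, while num_images/batch_size is a float under Python 3. A minimal sketch of the fix (the num_images and batch_size values below are illustrative, not from the question):

```python
import math

num_images = 10000  # illustrative values; substitute your dataset's actual counts
batch_size = 32

# Round up so the final partial batch is not dropped.
steps_per_epoch = math.ceil(num_images / batch_size)
print(steps_per_epoch)  # 313
```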
Before the IDE closed, the loss was around 0.8. After restarting the IDE, reloading the model, and re-running the code above, the loss jumped to 1.5.
But according to the Keras FAQ, model.save should save the whole model (architecture + weights + optimizer state), and load_model should return a compiled model that is identical to the previous one.
So I don't understand why the loss becomes larger after resuming the training.
EDIT: If I don't use multi_gpu_model and just use the ordinary model, I'm able to resume exactly where I left off.
When you call multi_gpu_model(...), Keras automatically sets the weights of your model to some default values (at least in version 2.2.0, which I am currently using). That's why you were not able to resume the training at the point where you saved it.
I just solved the issue by replacing the weights of the parallel model with the weights from the sequential model:
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.layers[-2].set_weights(model.get_weights()) # you can check the index of the sequential model with parallel_model.summary()
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch = num_images/batch_size, epochs = num_epochs)
I hope this helps.