
Resume training with multi_gpu_model in Keras

I'm training a modified InceptionV3 model with multi_gpu_model in Keras, and I use model.save to save the whole model.

Then I closed and restarted the IDE and used load_model to reinstantiate the model.

The problem is that I am not able to resume the training exactly where I left off.

Here is the code:

parallel_model = multi_gpu_model(model, gpus=2)

parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch = num_images/batch_size, epochs = num_epochs)

model.save('my_model.h5')

Before the IDE closed, the loss was around 0.8.

After restarting the IDE, reloading the model, and re-running the above code, the loss became 1.5.

But, according to the Keras FAQ, model.save should save the whole model (architecture + weights + optimizer state), and load_model should return a compiled model that is identical to the previous one.

So I don't understand why the loss becomes larger after resuming the training.

EDIT: If I don't use multi_gpu_model and just use the ordinary model, I'm able to resume exactly where I left off.

When you call multi_gpu_model(...), Keras automatically sets the weights of your model to some default values (at least in version 2.2.0, which I am currently using). That's why you were not able to resume the training at the same point as it was when you saved it.

I just solved the issue by replacing the weights of the parallel model with the weights from the sequential model:

parallel_model = multi_gpu_model(model, gpus=2)

parallel_model.layers[-2].set_weights(model.get_weights()) # you can check the index of the sequential model with parallel_model.summary()

parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch = num_images/batch_size, epochs = num_epochs)
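You can sanity-check that the set_weights call above actually synchronized the two models by comparing the weight lists element-wise. Here is a minimal, self-contained sketch of that comparison (no GPUs or Keras needed to run it; the dummy arrays below stand in for model.get_weights() and parallel_model.layers[-2].get_weights()):

```python
import numpy as np

def weights_match(w1, w2):
    """Return True if two lists of weight arrays are element-wise identical."""
    return (len(w1) == len(w2)
            and all(np.array_equal(a, b) for a, b in zip(w1, w2)))

# Dummy stand-ins for the two weight lists you would compare after
# parallel_model.layers[-2].set_weights(model.get_weights()):
template = [np.ones((3, 3)), np.zeros(3)]
parallel = [np.ones((3, 3)), np.zeros(3)]

print(weights_match(template, parallel))  # True
```

In the real code you would call it as `weights_match(model.get_weights(), parallel_model.layers[-2].get_weights())` right after the set_weights line.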

I hope this helps.

@saul19am When you compile it, you can only load the weights and the model structure, but you still lose the optimizer state. I think this can help.
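To see the difference the comment is pointing at, here is a minimal sketch of the full save/load round trip (an assumption-laden toy: it uses tf.keras with a tiny Dense model and a throwaway file name, none of which come from the question). model.save plus load_model restores architecture, weights, and optimizer state, whereas save_weights/load_weights would restore only the weights:

```python
import os
import tempfile
import numpy as np
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense

# Tiny placeholder model standing in for the InceptionV3 template model.
model = Sequential([Dense(4, input_shape=(8,), activation='relu'),
                    Dense(1)])
model.compile(optimizer='rmsprop', loss='mse')
model.fit(np.random.rand(32, 8), np.random.rand(32, 1),
          epochs=2, verbose=0)

# Full save: architecture + weights + optimizer state.
path = os.path.join(tempfile.mkdtemp(), 'tiny.h5')
model.save(path)
restored = load_model(path)  # compiled, ready to resume training

same = all(np.array_equal(a, b)
           for a, b in zip(model.get_weights(), restored.get_weights()))
print(same)  # True
```

In the multi-GPU setup from the answer, the point is to save and reload the template model this way, then copy its weights into the freshly built parallel model before continuing to train.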

