
Resume training with multi_gpu_model in Keras

I'm training a modified InceptionV3 model with multi_gpu_model in Keras, and I use model.save to save the whole model.

Then I closed and restarted the IDE and used load_model to reinstantiate the model.

The problem is that I am not able to resume the training exactly where I left off.

Here is the code:

parallel_model = multi_gpu_model(model, gpus=2)

parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch = num_images/batch_size, epochs = num_epochs)

model.save('my_model.h5')

Before the IDE closed, the loss was around 0.8.

After restarting the IDE, reloading the model, and re-running the above code, the loss became 1.5.

But, according to the Keras FAQ, model.save should save the whole model (architecture + weights + optimizer state), and load_model should return a compiled model that is identical to the previous one.

So I don't understand why the loss becomes larger after resuming the training.

EDIT: If I don't use multi_gpu_model and just use the ordinary model, I'm able to resume exactly where I left off.

When you call multi_gpu_model(...), Keras automatically sets the weights of your model to some default values (at least in version 2.2.0, which I am currently using). That's why you were not able to resume the training at the same point as it was when you saved it.

I just solved the issue by replacing the weights of the parallel model with the weights from the sequential model:

parallel_model = multi_gpu_model(model, gpus=2)

parallel_model.layers[-2].set_weights(model.get_weights()) # you can check the index of the sequential model with parallel_model.summary()

parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch = num_images/batch_size, epochs = num_epochs)
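You can sanity-check that the set_weights call above actually synchronized the two models by comparing the weight lists element-wise. Here is a minimal, self-contained sketch of that comparison (no GPUs or Keras needed to run it; the dummy arrays below stand in for model.get_weights() and parallel_model.layers[-2].get_weights()):

```python
import numpy as np

def weights_match(w1, w2):
    """Return True if two lists of weight arrays are element-wise identical."""
    return (len(w1) == len(w2)
            and all(np.array_equal(a, b) for a, b in zip(w1, w2)))

# Dummy stand-ins for the two weight lists you would compare after
# parallel_model.layers[-2].set_weights(model.get_weights()):
template = [np.ones((3, 3)), np.zeros(3)]
parallel = [np.ones((3, 3)), np.zeros(3)]

print(weights_match(template, parallel))  # True
```

In the real code you would call it as `weights_match(model.get_weights(), parallel_model.layers[-2].get_weights())` right after the set_weights line.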

I hope this helps.

@saul19am When you compile it, you can only load the weights and the model structure, but you still lose the optimizer state. I think this can help.
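To see the difference the comment is pointing at, here is a minimal sketch of the full save/load round trip (an assumption-laden toy: it uses tf.keras with a tiny Dense model and a throwaway file name, none of which come from the question). model.save plus load_model restores architecture, weights, and optimizer state, whereas save_weights/load_weights would restore only the weights:

```python
import os
import tempfile
import numpy as np
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense

# Tiny placeholder model standing in for the InceptionV3 template model.
model = Sequential([Dense(4, input_shape=(8,), activation='relu'),
                    Dense(1)])
model.compile(optimizer='rmsprop', loss='mse')
model.fit(np.random.rand(32, 8), np.random.rand(32, 1),
          epochs=2, verbose=0)

# Full save: architecture + weights + optimizer state.
path = os.path.join(tempfile.mkdtemp(), 'tiny.h5')
model.save(path)
restored = load_model(path)  # compiled, ready to resume training

same = all(np.array_equal(a, b)
           for a, b in zip(model.get_weights(), restored.get_weights()))
print(same)  # True
```

In the multi-GPU setup from the answer, the point is to save and reload the template model this way, then copy its weights into the freshly built parallel model before continuing to train.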

