

Custom loss function slows training of multi gpu model considerably

I'm training variational autoencoders on protein structures using Keras' multi_gpu_model. When switching from normal AEs to VAEs, my model takes >3x longer per epoch to train.

I identified the problem to be the loss function: changing it back to a built-in mse results in the same speed as seen before.

I'm using more or less the same vae_loss implementation as seen in many tutorials:

from keras import backend as K

# z_mean, z_log_var, recon_loss, beta and kl_loss_scaling are taken from the
# enclosing scope where the encoder/decoder are built.
def vae_loss(y_true, y_pred):
    reconstruction_loss = recon_loss(y_true, y_pred)
    kl_loss = beta * K.mean(
        1 + K.flatten(z_log_var) - K.square(K.flatten(z_mean)) - K.exp(K.flatten(z_log_var)), axis=-1)
    kl_loss /= kl_loss_scaling  # divide kl_loss by size of output dimension
    total_loss = K.mean(reconstruction_loss + kl_loss)
    return total_loss
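
For context, the loss is attached to the parallel model in the usual way, roughly like this; vae, n_gpus, x_train, batch_size and epochs are placeholders for my actual model and training setup:

from keras.utils import multi_gpu_model

# Wrap the single-GPU model and compile the wrapped copy with the custom loss.
parallel_vae = multi_gpu_model(vae, gpus=n_gpus)
parallel_vae.compile(optimizer='adam', loss=vae_loss)

# Autoencoder training: inputs are also the reconstruction targets.
parallel_vae.fit(x_train, x_train, batch_size=batch_size, epochs=epochs)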

When monitoring GPU usage I noticed that the GPUs are well utilized and then drop to zero utilization after each epoch. The batch size is adjusted for the number of GPUs, and the exact same setup with mse as the loss works fine. It seems like the GPUs are waiting for the loss to be computed and hence have a considerable amount of downtime. (The effect is more noticeable with smaller batch sizes, so increasing this parameter is somewhat of a workaround, but I think this is far from optimal.)

Is this unavoidable since this loss is more expensive to compute, or is there something I can adjust to achieve better performance?

The root cause might be how Keras handles custom loss functions. If you use a predefined loss in Keras, it works just fine. One thing you can try is to rewrite the loss function as a Lambda layer and change the model to have multiple outputs: one is your original output and the other is the model loss, as sketched below.
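
A minimal sketch of that idea, assuming the usual VAE building blocks from the question (inputs, decoded, z_mean, z_log_var, recon_loss, beta, kl_loss_scaling) are already defined in the enclosing scope, recon_loss returns one value per sample, and n_gpus, x_train, batch_size and epochs are placeholders for your actual setup:

import numpy as np
from keras import backend as K
from keras.layers import Lambda
from keras.models import Model
from keras.utils import multi_gpu_model

def vae_loss_layer(args):
    # Same computation as vae_loss, but done inside the graph so each GPU
    # replica computes the loss for its own slice of the batch.
    # K.flatten is dropped so the result stays per-sample.
    y_true, y_pred, z_mean, z_log_var = args
    reconstruction_loss = recon_loss(y_true, y_pred)
    kl_loss = beta * K.mean(
        1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    kl_loss /= kl_loss_scaling
    return K.expand_dims(reconstruction_loss + kl_loss, axis=-1)

# Expose the per-sample loss as a second model output.
loss_out = Lambda(vae_loss_layer, name='vae_loss')(
    [inputs, decoded, z_mean, z_log_var])

vae = Model(inputs, [decoded, loss_out])
parallel_vae = multi_gpu_model(vae, gpus=n_gpus)

# The "loss" for the second output just passes the precomputed value through;
# the reconstruction output gets weight 0 so it is not counted twice.
parallel_vae.compile(optimizer='adam',
                     loss=['mse', lambda y_true, y_pred: y_pred],
                     loss_weights=[0.0, 1.0])

# fit() still needs a dummy target for the extra loss output.
parallel_vae.fit(x_train, [x_train, np.zeros((len(x_train), 1))],
                 batch_size=batch_size, epochs=epochs)

This keeps the expensive part of the loss inside the replicated graph on the GPUs, so the top-level loss passed to compile has almost nothing left to do after each batch.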
