
Keras training gets stuck in LSTM

Original post: 2018-09-24 17:50:38 | Tags: python, tensorflow, keras, deep-learning

I'm trying to run an LSTM model in Keras, but it gets stuck during training.

In each epoch, the model takes around 3-4 seconds to reach step 49x/500, and then it gets stuck. After roughly 7xx seconds, training resumes, completes the remaining few steps, and finishes ONE epoch.

Then the cycle repeats: it trains very fast, then freezes.

What could be the reason?

The code I am running is the example on p. 213 of the book Deep Learning with Python by Francois Chollet. If the code or my hardware had a problem, shouldn't every epoch be consistently slow? Instead, it trains very fast at the beginning of each epoch but gets stuck at the end.

I have tried updating the GPU driver, running conda update --all, and assigning a different GPU to run the model (I have 2 GPUs).

I'm sure my GPUs are fine because I have no problem running other models.

1 answer

This is normal. At the end of each epoch, Keras uses your validation data to compute the validation loss and metrics, and this of course takes time. Maybe your validation set is somehow bigger than your training set?

It looks like it freezes, but it is actually computing on the validation set, so there is nothing to worry about.
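
One rough way to check this explanation is to estimate how long the end-of-epoch validation pass should take. The sketch below is only an illustration. It assumes training is driven by a generator with an explicit `validation_steps` argument, which the question does not show. The helper names and all numbers are hypothetical, and the `val_to_train_cost` ratio is a guess, since a validation step has no backward pass and is usually cheaper than a training step.

```python
import math


def validation_batches(num_val_samples: int, batch_size: int) -> int:
    """Number of batches needed to cover the validation set exactly once."""
    return math.ceil(num_val_samples / batch_size)


def estimated_validation_seconds(validation_steps: int,
                                 train_steps: int,
                                 train_seconds: float,
                                 val_to_train_cost: float = 1.0) -> float:
    """Crude estimate: scale the observed per-step training time by the
    number of validation steps (val_to_train_cost is an assumed ratio)."""
    seconds_per_train_step = train_seconds / train_steps
    return validation_steps * seconds_per_train_step * val_to_train_cost


# Hypothetical values, not taken from the question or the book.
train_steps = 500          # steps per epoch
train_seconds = 4.0        # observed time for the training part of an epoch
num_val_samples = 100_000  # size of the validation set
batch_size = 128

# One pass over the validation data, counted in batches.
steps_per_pass = validation_batches(num_val_samples, batch_size)

# A possible mistake: passing the sample count where a batch count is expected.
steps_if_samples_used = num_val_samples

for label, steps in [("batches", steps_per_pass),
                     ("samples used as steps", steps_if_samples_used)]:
    seconds = estimated_validation_seconds(steps, train_steps, train_seconds)
    print(f"{label}: {steps} validation steps, roughly {seconds:.0f} s")
```

If the estimate for your own settings is far larger than the training part of the epoch, the apparent freeze is most likely the validation pass, and it is worth checking how many validation steps are actually requested.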