How to revert keras model to previous epoch weights after train_on_batch nan update

Original 2019-02-24 21:29:47 6 1 python/ tensorflow/ keras/ deep-learning/ nan

I'm having trouble resetting my Keras model to the weights it had in the previous epoch after I hit a train_on_batch update that turns some of the weights into NaN.

I have tried to save the model weights after each training step and then to load the "good" (non-nan) weights back into the keras model after a nan training update. This seems to work fine - when I print the result of model.get_weights() after loading the old weights file into the model, the resulting weights contain no nans (and predict using them also gives a non-nan output).
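The save-and-restore approach described above could be sketched roughly as follows, keeping the last good weights in memory rather than in a file. This is a minimal sketch and not the asker's actual code: `has_nan` and `guarded_train_step` are hypothetical helper names, and `model` is assumed to be a compiled Keras model.

```python
import numpy as np


def has_nan(weights):
    """Return True if any array in a list of weight arrays contains a NaN."""
    return any(np.isnan(np.asarray(w)).any() for w in weights)


def guarded_train_step(model, x_batch, y_batch):
    """Run one train_on_batch step and roll back the layer weights on NaN.

    `model` is assumed to be a compiled Keras model (hypothetical helper,
    not part of the Keras API). Returns (loss, rolled_back).
    """
    # In-memory copy of the last known good layer weights.
    good_weights = model.get_weights()
    loss = model.train_on_batch(x_batch, y_batch)
    # train_on_batch may return a scalar or a list (loss plus metrics).
    if has_nan(model.get_weights()) or np.isnan(np.asarray(loss)).any():
        # Restores the layer weights only; optimizer state is not touched.
        model.set_weights(good_weights)
        return loss, True
    return loss, False
```

Note that this restores only what `model.get_weights()` returns, which is exactly the situation described in the question.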

However, now when I try to train_on_batch again, this time using a new batch, I get a nan update again immediately. I've tried with multiple randomly chosen batches and the nan update happens each time.

Is there something (maybe a parameter) that changes in the model or optimizer configuration when a nan train_on_batch update occurs that needs to be reset for training to continue once I change out the weights?
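One thing that may be worth checking here, though the post does not confirm it as the cause: `model.get_weights()` returns only the layer weights, while a stateful optimizer (for example one that keeps moment estimates) holds its own variables separately. If those have become NaN, restoring the layer weights alone would not clear them. The sketch below snapshots both; it assumes the optimizer exposes `get_weights()` and `set_weights()`, as the Keras 2.2.4 optimizers do, and the helper names are hypothetical.

```python
import numpy as np


def snapshot_state(model):
    """Copy the layer weights and the optimizer's own weights."""
    return {
        "model": model.get_weights(),
        # Assumption: this list is empty until the optimizer has created
        # its variables, i.e. before the first training update.
        "optimizer": model.optimizer.get_weights(),
    }


def state_is_finite(state):
    """Return True if no array in the snapshot contains NaN or inf."""
    arrays = list(state["model"]) + list(state["optimizer"])
    return all(np.isfinite(np.asarray(a)).all() for a in arrays)


def restore_state(model, state):
    """Put both the layer weights and the optimizer weights back."""
    model.set_weights(state["model"])
    # An empty snapshot cannot be restored; in that case the optimizer
    # state is left as it is.
    if state["optimizer"]:
        model.optimizer.set_weights(state["optimizer"])
```

Calling `state_is_finite(snapshot_state(model))` after a bad update would show whether the optimizer variables were affected as well as the layer weights.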

I would also like to avoid using model.save() and load_model() in the solution.

(Keras 2.2.4, TensorFlow 1.12.0)

Any thoughts are appreciated!

1 answer

Since you haven't posted your code or weights, I can't tell you much, but I suspect this problem is due to dropout or regularisation. If you are using either technique, set the dropout rate or the regularisation strength to suit your network: a high dropout rate in a small network will lead to this sort of problem, and the same goes for regularisation. For saving and reverting models, use checkpoints.

Keras train_on_batch() does not train the model vs fit()

What does train_on_batch() do in keras model?

Why does Keras' train_on_batch produce zero loss and accuracy at the second epoch?

Using Checkpoint saving with train_on_batch in Keras

How to save weights of keras model for each epoch?

How to choose a batch for train_on_batch?

TensorFlow Keras: tf.keras.Model train_on_batch vs make_train_function - Why is one slower than the other?

Problem in sample_weight in Keras when trying train_on_batch for a model with multiple outputs

weights of keras model are nan

How do I use “reduceLROnplateau” callback method with “train_on_batch” in keras

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 0 2019-02-24 21:33:42
