
How to revert keras model to previous epoch weights after train_on_batch nan update

I'm having trouble resetting my keras model to the weights it had in the previous epoch after a train_on_batch update turns some of the weights into NaNs.

I have tried to save the model weights after each training step and then to load the "good" (non-nan) weights back into the keras model after a nan training update. This seems to work fine - when I print the result of model.get_weights() after loading the old weights file into the model, the resulting weights contain no nans (and predict using them also gives a non-nan output).
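For reference, a minimal sketch of the save-and-roll-back approach described above, keeping the last good weights in memory rather than in a file. The helper names `has_nan` and `guarded_step` are hypothetical, and `model` is assumed to be any object exposing `train_on_batch`, `get_weights` and `set_weights` the way a Keras model does.

```python
import numpy as np


def has_nan(weights):
    """Return True if any array in a list of weight arrays contains a NaN."""
    return any(np.isnan(np.asarray(w)).any() for w in weights)


def guarded_step(model, x, y, last_good):
    """Run one train_on_batch step and roll back if it produced NaNs.

    `model` is assumed to expose train_on_batch / get_weights / set_weights
    (a Keras Model has all three). `last_good` is the list of weight arrays
    from the most recent step that contained no NaNs.

    Returns (last_good, step_was_ok).
    """
    model.train_on_batch(x, y)
    current = model.get_weights()
    if has_nan(current):
        # Restores the layer weights only; nothing else in the model changes.
        model.set_weights(last_good)
        return last_good, False
    # Copy so later in-place changes cannot corrupt the snapshot.
    return [np.copy(w) for w in current], True
```

Usage would be along the lines of `last_good = model.get_weights()` before the loop, then `last_good, ok = guarded_step(model, x_batch, y_batch, last_good)` for each batch.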

However, now when I try to train_on_batch again, this time using a new batch, I get a nan update again immediately. I've tried with multiple randomly chosen batches and the nan update happens each time.

Is there something (maybe a parameter) that changes in the model or optimizer configuration when a nan train_on_batch update occurs that needs to be reset for training to continue once I change out the weights?

I would also like to avoid using model.save() and load_model() in the solution.

(keras 2.2.4, tensorflow 1.12.0)

Any thoughts are appreciated!

Since you haven't posted your code or weights, I can't tell you much, but I suspect the problem may be due to dropout or regularization. If you are using either technique, set the dropout rate or the regularization strength appropriately for your network: a high dropout rate in a small network can lead to this sort of problem, and the same goes for regularization. For saving and reverting models, use checkpoints.
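One more thing that may be worth checking, offered as an assumption since no code was posted: optimizers such as Adam keep their own running state separately from the layer weights, and `model.set_weights()` does not touch it. If that state picked up NaNs during the bad step, the next update could come out as NaN even though the restored weights are clean. Below is a rough sketch of an in-memory checkpoint that covers both. The helper names are hypothetical, and it assumes the optimizer exposes `get_weights()` / `set_weights()`, as `keras.optimizers.Optimizer` does in Keras 2.2.x (newer optimizer classes may differ).

```python
import numpy as np


def take_snapshot(model):
    """Copy both the layer weights and the optimizer's internal state."""
    return {
        "model": [np.copy(w) for w in model.get_weights()],
        "optimizer": [np.copy(w) for w in model.optimizer.get_weights()],
    }


def snapshot_has_nan(snapshot):
    """Return True if any array in the snapshot contains a NaN."""
    arrays = snapshot["model"] + snapshot["optimizer"]
    return any(np.isnan(np.asarray(a)).any() for a in arrays)


def restore_snapshot(model, snapshot):
    """Put back the layer weights and, if any was captured, the optimizer state."""
    model.set_weights(snapshot["model"])
    # The optimizer's state list can be empty if the snapshot was taken
    # before the first training step, so skip it when nothing was captured.
    if snapshot["optimizer"]:
        model.optimizer.set_weights(snapshot["optimizer"])
```

The idea would be to call `take_snapshot(model)` after each good step, and after a step whose snapshot fails `snapshot_has_nan`, call `restore_snapshot(model, last_good_snapshot)` before trying the next batch. This avoids `model.save()` and `load_model()` entirely.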
