
Keras: How to save model and continue training?

I have a model that I've trained for 40 epochs. I kept a checkpoint for each epoch, and I have also saved the model with model.save(). The code for training is:

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint

# vec_size, x and y come from the data preparation (not shown)
n_units = 1000
model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
# define the checkpoint
filepath = "word2vec-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(x, y, epochs=40, batch_size=50, callbacks=callbacks_list)

However, when I load the model and try training it again, it starts all over as if it hadn't been trained before. The loss doesn't start from where the last training left off.

What confuses me is that when I load the model, redefine the model structure, and use load_weights(), model.predict() works well. Thus, I believe the model weights are loaded:

model = Sequential()
model.add(LSTM(n_units, input_shape=(None, vec_size), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(n_units))
model.add(Dropout(0.2))
model.add(Dense(vec_size, activation='linear'))
filename = "word2vec-39-0.0027.hdf5"
model.load_weights(filename)
model.compile(loss='mean_squared_error', optimizer='adam')

However, when I continue training with this, the loss is as high as it was at the initial stage:

filepath="word2vec-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(x, y, epochs=40, batch_size=50, callbacks=callbacks_list)

I searched and found some examples of saving and loading models here and here. However, none of them work.


Update 1

I looked at this question, tried it, and it works:

from keras.models import load_model

model.save('partly_trained.h5')
del model
model = load_model('partly_trained.h5')  # assign the loaded model back

But when I close Python and reopen it, then run load_model again, it fails. The loss is as high as the initial state.


Update 2

I tried Yu-Yang's example code and it works. However, when I use my code again, it still fails.

This is the result from the original training. The second epoch should start with loss = 3.1***:

13700/13846 [============================>.] - ETA: 0s - loss: 3.0519
13750/13846 [============================>.] - ETA: 0s - loss: 3.0511
13800/13846 [============================>.] - ETA: 0s - loss: 3.0512
Epoch 00000: loss improved from inf to 3.05101, saving model to LPT-00-3.0510.h5

13846/13846 [==============================] - 81s - loss: 3.0510    
Epoch 2/60

   50/13846 [..............................] - ETA: 80s - loss: 3.1754
  100/13846 [..............................] - ETA: 78s - loss: 3.1174
  150/13846 [..............................] - ETA: 78s - loss: 3.0745

I closed Python, reopened it, loaded the model with model = load_model("LPT-00-3.0510.h5"), then trained with:

filepath="LPT-{epoch:02d}-{loss:.4f}.h5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(x, y, epochs=60, batch_size=50, callbacks=callbacks_list)

The loss starts at 4.54:

Epoch 1/60
   50/13846 [..............................] - ETA: 162s - loss: 4.5451
   100/13846 [..............................] - ETA: 113s - loss: 4.3835


Answer

Since Keras and TensorFlow are now bundled, you can use the newer TensorFlow SavedModel format, which saves all model info, including the optimizer and its state (from the documentation):

You can save an entire model to a single artifact. It will include:

  • The model's architecture/config
  • The model's weight values (which were learned during training)
  • The model's compilation information (if compile() was called)
  • The optimizer and its state, if any (this enables you to restart training where you left)


So once your model is saved that way, you can load it and resume training: it will continue where it left off.
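
A minimal sketch of that save-and-resume round trip, assuming the same x, y, and model as in the question (the "partly_trained" path is an arbitrary example):

from tensorflow import keras

# First session: train, then save everything in one artifact.
# A path without an .h5 extension selects the TensorFlow SavedModel
# format; model.save() stores the architecture, weights, compile info,
# and the optimizer state (e.g. Adam's moment estimates and step count).
model.fit(x, y, epochs=20, batch_size=50)
model.save("partly_trained")

# Later, possibly in a brand-new Python session: no need to rebuild
# the architecture or call compile(); load_model() restores all of it.
model = keras.models.load_model("partly_trained")

# Training resumes where it left off, so the reported loss should
# continue from the previous run instead of jumping back up.
model.fit(x, y, epochs=20, batch_size=50)

By contrast, the load_weights() route in the question restores only the layer weights; the fresh compile() hands Adam zeroed moment estimates, which matches the symptom of the loss starting high again.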
