Saving Model Checkpoints in Tensorflow

Question

I am using Tensorflow 2.3 and trying to save a model checkpoint every n epochs. n can be anything, but for now I am trying 10.

Per this thread, I tried save_freq = 'epoch' and period = 10, which works, but since the period parameter is deprecated, I wanted to try an alternative approach.

from tensorflow.keras import callbacks  # assumed import; the original snippet omits it

HEIGHT = 256
WIDTH = 256
CHANNELS = 3
EPOCHS = 100
BATCH_SIZE = 1
SAVE_PERIOD = 10

n_monet_samples = 21

checkpoint_filepath = "./model_checkpoints/cyclegan_checkpoints.{epoch:03d}"
model_checkpoint_callback = callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_freq=SAVE_PERIOD * (n_monet_samples//BATCH_SIZE)
)

If I use save_freq=SAVE_PERIOD * (n_monet_samples//BATCH_SIZE) for the checkpoint callback definition, I get this error:

ValueError: Unrecognized save_freq: 210

I am not sure why, since per the Keras callback code, as long as save_freq is 'epoch' or an integer, it should be fine.

Please suggest.

Answer 1

I did not get any error when I ran the same code with the same Tensorflow version (2.3):

import os
import tensorflow as tf

checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
BATCH_SIZE = 1
SAVE_PERIOD = 10

n_monet_samples = 21

# Create a callback that saves the model's weights
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1, save_freq=SAVE_PERIOD * (n_monet_samples//BATCH_SIZE))

# Train the model with the new callback
model.fit(train_images, 
          train_labels,  
          epochs=20,
          validation_data=(test_images, test_labels),
          callbacks=[cp_callback])

Output:

Epoch 1/20
32/32 [==============================] - 0s 14ms/step - loss: 1.1152 - sparse_categorical_accuracy: 0.6890 - val_loss: 0.6934 - val_sparse_categorical_accuracy: 0.7940
Epoch 2/20
32/32 [==============================] - 0s 9ms/step - loss: 0.4154 - sparse_categorical_accuracy: 0.8840 - val_loss: 0.5317 - val_sparse_categorical_accuracy: 0.8330
Epoch 3/20
32/32 [==============================] - 0s 8ms/step - loss: 0.2787 - sparse_categorical_accuracy: 0.9270 - val_loss: 0.4854 - val_sparse_categorical_accuracy: 0.8400
Epoch 4/20
32/32 [==============================] - 0s 8ms/step - loss: 0.2230 - sparse_categorical_accuracy: 0.9420 - val_loss: 0.4525 - val_sparse_categorical_accuracy: 0.8590
Epoch 5/20
32/32 [==============================] - 0s 10ms/step - loss: 0.1549 - sparse_categorical_accuracy: 0.9620 - val_loss: 0.4275 - val_sparse_categorical_accuracy: 0.8650
Epoch 6/20
32/32 [==============================] - 0s 10ms/step - loss: 0.1110 - sparse_categorical_accuracy: 0.9770 - val_loss: 0.4251 - val_sparse_categorical_accuracy: 0.8630
Epoch 7/20
11/32 [=========>....................] - ETA: 0s - loss: 0.0936 - sparse_categorical_accuracy: 0.9886
Epoch 00007: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 14ms/step - loss: 0.0807 - sparse_categorical_accuracy: 0.9840 - val_loss: 0.4248 - val_sparse_categorical_accuracy: 0.8610
Epoch 8/20
32/32 [==============================] - 0s 10ms/step - loss: 0.0612 - sparse_categorical_accuracy: 0.9950 - val_loss: 0.4058 - val_sparse_categorical_accuracy: 0.8650
Epoch 9/20
32/32 [==============================] - 0s 8ms/step - loss: 0.0489 - sparse_categorical_accuracy: 0.9950 - val_loss: 0.4393 - val_sparse_categorical_accuracy: 0.8610
Epoch 10/20
32/32 [==============================] - 0s 6ms/step - loss: 0.0361 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4150 - val_sparse_categorical_accuracy: 0.8620
Epoch 11/20
32/32 [==============================] - 0s 10ms/step - loss: 0.0294 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4090 - val_sparse_categorical_accuracy: 0.8670
Epoch 12/20
32/32 [==============================] - 0s 7ms/step - loss: 0.0272 - sparse_categorical_accuracy: 0.9990 - val_loss: 0.4365 - val_sparse_categorical_accuracy: 0.8600
Epoch 13/20
32/32 [==============================] - 0s 8ms/step - loss: 0.0203 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4231 - val_sparse_categorical_accuracy: 0.8620
Epoch 14/20
 1/32 [..............................] - ETA: 0s - loss: 0.0115 - sparse_categorical_accuracy: 1.0000
Epoch 00014: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 9ms/step - loss: 0.0164 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4263 - val_sparse_categorical_accuracy: 0.8650
Epoch 15/20
32/32 [==============================] - 0s 7ms/step - loss: 0.0128 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4260 - val_sparse_categorical_accuracy: 0.8690
Epoch 16/20
32/32 [==============================] - 0s 7ms/step - loss: 0.0120 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4194 - val_sparse_categorical_accuracy: 0.8740
Epoch 17/20
32/32 [==============================] - 0s 9ms/step - loss: 0.0110 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4302 - val_sparse_categorical_accuracy: 0.8710
Epoch 18/20
32/32 [==============================] - 0s 6ms/step - loss: 0.0090 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4331 - val_sparse_categorical_accuracy: 0.8660
Epoch 19/20
32/32 [==============================] - 0s 7ms/step - loss: 0.0084 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4320 - val_sparse_categorical_accuracy: 0.8760
Epoch 20/20
16/32 [==============>...............] - ETA: 0s - loss: 0.0074 - sparse_categorical_accuracy: 1.0000
Epoch 00020: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 13ms/step - loss: 0.0072 - sparse_categorical_accuracy: 1.0000 - val_loss: 0.4280 - val_sparse_categorical_accuracy: 0.8750
<tensorflow.python.keras.callbacks.History at 0x7f90f0082cd0>

As you already know, save_freq accepts either 'epoch' or an integer.
When using 'epoch', the callback saves the model after each epoch.
When using an integer, the callback saves the model at the end of that many batches (training steps).
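Because only 'epoch' or an integer is accepted, one thing worth ruling out is the type of the value being passed. This is an assumption, not something the question confirms: if the callback validates with a strict isinstance(save_freq, int) check, and n_monet_samples came from NumPy or a tensor operation rather than a literal, the product can print as 210 without being a built-in Python int. The sketch below uses a hypothetical stand-in class IntLike and a hypothetical helper to_save_freq to show how such a value could be normalized with operator.index before it reaches the callback.

```python
import operator


class IntLike:
    """Hypothetical stand-in for an integer-like value (e.g. a NumPy integer)."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # Lets operator.index() convert this object to a built-in int.
        return self._value

    def __floordiv__(self, other):
        return IntLike(self._value // operator.index(other))

    def __rmul__(self, other):
        return IntLike(operator.index(other) * self._value)

    def __repr__(self):
        return str(self._value)


def to_save_freq(value):
    """Return 'epoch' unchanged; otherwise convert to a built-in int."""
    if value == "epoch":
        return value
    # operator.index accepts integer-like objects and rejects floats.
    return operator.index(value)


SAVE_PERIOD = 10
BATCH_SIZE = 1
n_monet_samples = IntLike(21)  # assumption: not a plain Python int

raw = SAVE_PERIOD * (n_monet_samples // BATCH_SIZE)
save_freq = to_save_freq(raw)

print(raw, isinstance(raw, int))              # 210 False
print(save_freq, isinstance(save_freq, int))  # 210 True
```

If the value is already a plain int, as in the snippet in this answer, the conversion is a no-op, which would be consistent with the code running without error here.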

Per the above definition of save_freq, a checkpoint is saved every 210 steps.
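To see where those 210-step intervals land, here is a minimal sketch in plain Python. It assumes batches are counted continuously across epoch boundaries, and checkpoint_schedule is a hypothetical helper written for illustration, not part of Keras.

```python
def checkpoint_schedule(save_freq, steps_per_epoch, epochs):
    """Return 1-based (epoch, step) pairs at which a save would occur."""
    total_batches = steps_per_epoch * epochs
    schedule = []
    # A save happens each time the running batch count hits a multiple of save_freq.
    for batch_count in range(save_freq, total_batches + 1, save_freq):
        epoch = (batch_count - 1) // steps_per_epoch + 1
        step = (batch_count - 1) % steps_per_epoch + 1
        schedule.append((epoch, step))
    return schedule


# The run in this answer: 32 steps per epoch, 20 epochs.
print(checkpoint_schedule(210, 32, 20))
# [(7, 18), (14, 4), (20, 22)]

# The setup in the question: 21 samples, batch size 1, so 21 steps per epoch.
print(checkpoint_schedule(210, 21, 100)[:3])
# [(10, 21), (20, 21), (30, 21)]
```

The epochs in the first result agree with the "saving model" lines in the log above (epochs 7, 14 and 20). In the second case 210 is an exact multiple of the 21 steps per epoch, so under the same assumption each save falls on the last step of every 10th epoch, which is what the question is after.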

See the ModelCheckpoint documentation for more details on its arguments.
