
Resume Training tf.keras Tensorboard


I'm running into some issues when I resume training my model and visualize the progress on TensorBoard.

[Screenshot: TensorBoard training visualization]

My question is: how do I resume training from the same step without manually specifying any epoch? If possible, I would like to simply load the saved model, have it read the global_step from the saved optimizer, and continue training from there.

I have provided some code below to reproduce a similar error.

import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.models import load_model

mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=10, callbacks=[TensorBoard()])
model.save('./final_model.h5', include_optimizer=True)

del model

model = load_model('./final_model.h5')
model.fit(x_train, y_train, epochs=10, callbacks=[TensorBoard()])

You can run TensorBoard with the following command:

tensorboard --logdir ./logs

You can set the initial_epoch argument of model.fit() to the epoch at which you want training to start. Keep in mind that the model trains until the epoch with index epochs is reached (not for a number of additional epochs given by epochs). In your example, if you want to train for 10 more epochs, it should be:

model.fit(x_train, y_train, initial_epoch=9, epochs=19, callbacks=[TensorBoard()])

This will let you visualize your plots on TensorBoard in the correct way. More information about these parameters can be found in the docs.
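If you would rather not hard-code initial_epoch, here is a minimal sketch of the idea from the question (deriving the step count from the saved optimizer). It assumes TF 2.x eager execution and that load_model restores the optimizer's iterations counter from the HDF5 file; depending on the TF version, the counter may only be populated once the optimizer weights are actually restored, so treat this as an assumption, not a guarantee:

import math

from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.models import load_model

model = load_model('./final_model.h5')

# the restored optimizer keeps its step counter: one step per trained batch
steps_taken = int(model.optimizer.iterations.numpy())

# Keras' default batch size is 32 and MNIST has 60000 training samples
steps_per_epoch = math.ceil(60000 / 32)
initial_epoch = steps_taken // steps_per_epoch

model.fit(x_train, y_train,
          initial_epoch=initial_epoch,
          epochs=initial_epoch + 10,
          callbacks=[TensorBoard()])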

Here is sample code in case anyone needs it. It implements the idea proposed by Abhinav Anand:

from glob import glob
from os.path import join

from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from tensorflow.keras.models import load_model

# fold_dir is the directory holding the checkpoints and TensorBoard logs
mca = ModelCheckpoint(join(fold_dir, 'model_{epoch:03d}.h5'),
                      monitor = 'loss',
                      save_best_only = False)
tb = TensorBoard(log_dir = join(fold_dir, 'logs'),
                 write_graph = True,
                 write_images = True)

# resume from the most recent checkpoint, if any exist
files = sorted(glob(join(fold_dir, 'model_???.h5')))
if files:
    model_file = files[-1]
    # the epoch number is encoded in the file name, e.g. model_012.h5 -> 12
    initial_epoch = int(model_file[-6:-3])
    print('Resuming using saved model %s.' % model_file)
    model = load_model(model_file)
else:
    model = nn.model()
    initial_epoch = 0

model.fit(x_train,
          y_train,
          epochs = 100,
          initial_epoch = initial_epoch,
          callbacks = [mca, tb])

Replace nn.model() with your own function that defines the model.
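For completeness, a hypothetical nn.model() could simply rebuild and compile the classifier from the question. The nn module name comes from the answer above; the body below is my own assumption, not part of the original answer:

import tensorflow as tf

def model():
    """Build and compile the MNIST classifier used in the question."""
    m = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    m.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
    return m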

It's very simple. Create checkpoints while training the model, and then use those checkpoints to resume training from where you left off.

import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models import load_model

mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=10, callbacks=[TensorBoard()])
model.save('./final_model.h5', include_optimizer=True)

model = load_model('./final_model.h5')

callbacks = list()

tensorboard = TensorBoard()
callbacks.append(tensorboard)

file_path = "model-{epoch:02d}-{loss:.4f}.hdf5"

# create a checkpoint callback and save according to your needs
# `period` is the number of epochs between saves during training
# set save_weights_only=False so the full model (including the optimizer) is saved
checkpoints = ModelCheckpoint(file_path, monitor='loss', verbose=1, period=1, save_weights_only=False)
callbacks.append(checkpoints)

model.fit(x_train, y_train, epochs=10, callbacks=callbacks)

After that, just load the checkpoint from where you want to resume training again:

model = load_model(checkpoint_of_choice)
model.fit(x_train, y_train, epochs=10, callbacks=callbacks)
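If the TensorBoard curves should also continue from the right epoch, you can combine this with the initial_epoch idea from the first answer. A sketch of that, where the example file name and the regular expression are my own assumptions based on the file_path pattern above:

import re

from tensorflow.keras.models import load_model

# hypothetical checkpoint produced by the "model-{epoch:02d}-{loss:.4f}.hdf5" pattern
checkpoint_of_choice = 'model-07-0.0412.hdf5'

# recover the epoch number from the checkpoint file name
match = re.match(r'model-(\d+)-', checkpoint_of_choice)
initial_epoch = int(match.group(1)) if match else 0

model = load_model(checkpoint_of_choice)
model.fit(x_train, y_train,
          initial_epoch=initial_epoch,
          epochs=initial_epoch + 10,
          callbacks=callbacks)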

And you are done.

Let me know if you have more questions about this.
