TensorFlow and Keras cannot load a .ckpt save

Question

I am using the ModelCheckpoint callback to save the best epoch of the model I am training. It saves without errors, but when I try to load it, I get this error:

2019-07-27 22:58:04.713951: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open C:\Users\Riley\PycharmProjects\myNN\cp.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

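I have tried using the absolute/full path, with no luck. I am sure I could use EarlyStopping instead, but I would still like to understand why I am getting this error. Here is my code: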

from __future__ import absolute_import, division, print_function

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import datetime
import statistics

(train_images, train_labels), (test_images, test_labels) = np.load("dataset.npy", allow_pickle=True)

train_images = train_images / 255
test_images = test_images / 255

train_labels = list(map(float, train_labels))
test_labels = list(map(float, test_labels))
train_labels = [i/10 for i in train_labels]
test_labels = [i/10 for i in test_labels]

'''
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(128, 128)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(1)
  ])

'''

start_time = datetime.datetime.now()

model = keras.Sequential([
    keras.layers.Conv2D(32, kernel_size=(5, 5), strides=(1, 1), activation='relu', input_shape=(128, 128, 1)),
    keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Conv2D(64, (5, 5), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1000, activation='relu'),
    keras.layers.Dense(1)

])

model.compile(loss='mean_absolute_error',
    optimizer=keras.optimizers.SGD(lr=0.01),
    metrics=['mean_absolute_error', 'mean_squared_error'])

train_images = train_images.reshape(328, 128, 128, 1)
test_images = test_images.reshape(82, 128, 128, 1)

model.fit(train_images, train_labels, epochs=100, callbacks=[keras.callbacks.ModelCheckpoint("cp.ckpt", monitor='mean_absolute_error', save_best_only=True, verbose=1)])

model.load_weights("cp.ckpt")

predictions = model.predict(test_images)

totalDifference = 0
for i in range(82):
    print("%s: %s" % (test_labels[i] * 10, predictions[i] * 10))
    totalDifference += abs(test_labels[i] - predictions[i])

avgDifference = totalDifference / 8.2

print("\n%s\n" % avgDifference)
print("Time Elapsed:")
print(datetime.datetime.now() - start_time)

Answer 1

TL;DR: You are saving the whole model but trying to load only the weights, and that is not how it works.

Explanation

Your call to fit:

model.fit(
    train_images,
    train_labels,
    epochs=100,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            "cp.ckpt", monitor="mean_absolute_error", save_best_only=True, verbose=1
        )
    ],
)

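Because save_weights_only=False is the default in ModelCheckpoint, you are saving the whole model to the .ckpt file.

By the way, the file should be named .hdf5 or .h5, since its format is Hierarchical Data Format 5. Windows is not extension-agnostic, so you may run into problems if tensorflow / keras relies on the extension on that OS.

On the other hand, you are loading only the model's weights, while the file contains the whole model: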

model.load_weights("cp.ckpt")

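TensorFlow's checkpoint mechanism (.ckpt) is different from Keras's (.hdf5), so watch out for which one you are actually using (there were plans to integrate the two more closely).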
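The "not an sstable (bad magic number)" message in the log suggests that the checkpoint reader was handed a file in some other format. One quick way to test that, sketched below, is to look at the first bytes of the file: a standard HDF5 file starts with a fixed 8-byte signature. The helper name looks_like_hdf5 and the throwaway demo files are assumptions made for illustration; they are not part of the question's code, and the demo does not need TensorFlow.

```python
import os
import tempfile

# First 8 bytes of a standard HDF5 file (the HDF5 format signature).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Demo on two throwaway files; these are stand-ins, not real Keras outputs.
demo_dir = tempfile.mkdtemp()
hdf5_like = os.path.join(demo_dir, "cp.ckpt")   # HDF5 content, .ckpt name
other = os.path.join(demo_dir, "other.bin")     # some unrelated content

with open(hdf5_like, "wb") as f:
    f.write(HDF5_SIGNATURE + b"\x00" * 16)
with open(other, "wb") as f:
    f.write(b"not hdf5")

print(looks_like_hdf5(hdf5_like))
print(looks_like_hdf5(other))
```

If looks_like_hdf5("cp.ckpt") returns True for the file from the question, the file is a Keras HDF5 save despite its .ckpt name, which would be consistent with the explanation above.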

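Solution

So either keep the callback as it is now (ideally saving to a file such as "model.hdf5") and load the whole model with keras.models.load_model("model.hdf5"), or add the save_weights_only=True argument to ModelCheckpoint: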

model.fit(
    train_images,
    train_labels,
    epochs=100,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            "weights.hdf5",
            monitor="mean_absolute_error",
            save_best_only=True,
            verbose=1,
            save_weights_only=True,  # Specify this
        )
    ],
)

Then you can call model.load_weights("weights.hdf5").
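The two consistent pairings described in this answer can be sketched side by side as follows. This is a minimal illustration under stated assumptions, not the asker's setup: it assumes TensorFlow 2.x, uses a tiny hypothetical model and random data in place of the question's CNN and dataset, and monitors "loss" for simplicity. The file names are also assumptions; "best.weights.h5" is used because newer Keras releases expect weights-only files to end in ".weights.h5".

```python
import os
import tempfile

import numpy as np
from tensorflow import keras


def build_model():
    # Tiny stand-in for the question's CNN; this architecture is hypothetical.
    return keras.Sequential([
        keras.Input(shape=(4,)),
        keras.layers.Dense(8, activation="relu"),
        keras.layers.Dense(1),
    ])


# Random stand-in data (not the asker's dataset).
rng = np.random.default_rng(0)
x = rng.random((32, 4)).astype("float32")
y = rng.random((32, 1)).astype("float32")

out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "best_model.h5")      # whole model
weights_path = os.path.join(out_dir, "best.weights.h5")  # weights only

model = build_model()
model.compile(loss="mean_absolute_error", optimizer="sgd")
model.fit(
    x, y, epochs=3, verbose=0,
    callbacks=[
        # Default save_weights_only=False: writes the whole model.
        keras.callbacks.ModelCheckpoint(
            model_path, monitor="loss", save_best_only=True),
        # save_weights_only=True: writes only the weights.
        keras.callbacks.ModelCheckpoint(
            weights_path, monitor="loss", save_best_only=True,
            save_weights_only=True),
    ],
)

# Pairing 1: whole-model file -> load_model (no need to rebuild the layers).
restored_model = keras.models.load_model(model_path, compile=False)

# Pairing 2: weights-only file -> rebuild the same layers, then load_weights.
fresh_model = build_model()
fresh_model.load_weights(weights_path)

pred_a = restored_model.predict(x, verbose=0)
pred_b = fresh_model.predict(x, verbose=0)
print(np.allclose(pred_a, pred_b, atol=1e-6))
```

Both callbacks monitor the same quantity, so they save at the same epochs and the two restored models should produce matching predictions. Mixing the pairings, as in the question, is what leads to the load error.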

Answer 2

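model.load_weights will not work here, for the reason given in the answer above. You can load the weights with the code below: build your model first, then restore the weights. I hope this code helps.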

import tensorflow as tf

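# dense_net() is the answerer's own model-building function (not shown here);
# it must rebuild the same architecture that was checkpointed.
model = dense_net()
ckpt = tf.train.Checkpoint(step=tf.Variable(1, dtype=tf.int64), net=model)
# tf.train.latest_checkpoint expects the checkpoint directory, not a .data shard file.
ckpt.restore(tf.train.latest_checkpoint("/kaggle/working/training_1"))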

Answer 3

# This example uses the TensorFlow 1.x graph-mode API (tf.Session, tf.train.Saver).
import tensorflow as tf

# Create some variables.
v1 = tf.Variable(tf.random_normal([784, 200], stddev=0.35), name="v1")
v2 = tf.Variable(tf.random_normal([784, 200], stddev=0.35), name="v2")

# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
  sess.run(init_op)
  # Do some work with the model.

  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print("Model saved in file: %s" % save_path)

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Do some work with the model


Answer 1: 5, accepted, 2019-07-31 22:27:39
Answer 2: 2, 2020-01-07 03:09:33
Answer 3: 0, 2019-07-28 03:20:55
