
Tensorflow save and load_model not working but save and load_weights does

I am using tensorflow version 2.8.0:

Over the past 5 years I've seen this question come up in multiple places (forums, GitHub issues, even a few posts here), but with no clear answer that worked for me... For some reason, in certain cases, a model loaded from a previous save produces drastically different results from the original model's evaluation. I haven't seen any well-documented investigation of this, so I figured I'd show my full code below (a simple illustration of the problem).

This is a transfer-learning application built on a pretrained tensorflow model. The model is first trained on train_data for 5 epochs, then fine-tuned (with more trainable parameters) for 5 more. Evaluating the model on test_data gives an accuracy of 0.5671. The model is then saved and loaded back in .h5 format (I also tried the tf SavedModel format; same result). The resulting loaded_model evaluates to an accuracy of 0.4535 on the same, unchanged test_data.
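For reference, a minimal sketch of the two save/load variants I tried (paths are illustrative; in tf.keras the path's extension selects the format):

import tensorflow as tf

# HDF5 format: selected by the .h5 extension
model.save("model.h5")
loaded_h5 = tf.keras.models.load_model("model.h5")

# TF SavedModel format: the default when the path has no .h5 extension
model.save("saved_model_dir")
loaded_sm = tf.keras.models.load_model("saved_model_dir")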

The results should be identical (0.5671)... So to investigate further, I decided to save the fine-tuned model's weights separately, build and compile the same model architecture as new_model, and load the saved weights into new_model. Evaluating new_model gives the correct result, an accuracy of 0.5671. ----- OK, so the weights must not be getting saved correctly, right? I extracted the weights from all three models (model, loaded_model, new_model) and compared their flattened values. They are all identical. I honestly don't know what's going on here, but I assume it's not random initialization, since the loaded_model evaluation performs nowhere near the fine-tuned model; I'd expect random initializations to land much closer together.

import tensorflow as tf
tf.random.set_seed(42)
import pandas as pd
import numpy as np

import os
import pathlib
data_dir = pathlib.Path("101_food_classes_10_percent/train")
class_names = np.array(sorted([item.name for item in data_dir.glob('*')]))

train_dir = './101_food_classes_10_percent/train/'
test_dir = './101_food_classes_10_percent/test/'

from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen=ImageDataGenerator()
train_data = datagen.flow_from_directory(directory = train_dir,
                                         target_size = (224,224),
                                         batch_size = 32,
                                         class_mode='categorical')

test_data = datagen.flow_from_directory(directory = test_dir,
                                        target_size = (224,224),
                                        batch_size = 32,
                                        class_mode='categorical')

from tensorflow.keras.layers.experimental import preprocessing
data_augmentation = tf.keras.Sequential([
    preprocessing.RandomFlip('horizontal'),
    preprocessing.RandomRotation(0.2),
    preprocessing.RandomZoom(0.2),
    preprocessing.RandomHeight(0.2),
    preprocessing.RandomWidth(0.2)
    # preprocessing.Rescaling(1/255.) is unnecessary for EfficientNet (rescaling is built in), but would be needed for models without built-in rescaling
], name = 'data_augmentation')
Found 7575 images belonging to 101 classes.
Found 25250 images belonging to 101 classes.
# Build headless model - Feature Extraction

# Setup base with frozen layers
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable=False

inputs = tf.keras.layers.Input(shape = (224,224,3))

x = data_augmentation(inputs)
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x) # Pool base_model's outputs into a feature vector

outputs = tf.keras.layers.Dense(len(class_names), activation='softmax')(x)

model = tf.keras.Model(inputs,outputs)
model.compile('Adam', 'categorical_crossentropy', metrics=['accuracy'])
model.summary()
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_4 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
data_augmentation (Sequentia (None, None, None, 3)     0         
_________________________________________________________________
efficientnetb0 (Functional)  (None, None, None, 1280)  4049571   
_________________________________________________________________
global_average_pooling2d_1 ( (None, 1280)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 101)               129381    
=================================================================
Total params: 4,178,952
Trainable params: 129,381
Non-trainable params: 4,049,571
_________________________________________________________________
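Note: checkpoint_callback below is defined elsewhere in the notebook and was omitted from this snippet; a typical definition (an assumption, shown only so the code runs) would be:

# Assumed definition (not in the original post): a standard ModelCheckpoint
# that saves only the weights at the end of every epoch.
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath='./101_food_classes_10_percent/checkpoint.ckpt',
    save_weights_only=True,
    save_freq='epoch')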
history = model.fit(train_data, validation_data=test_data,
          validation_steps=int(0.15*len(test_data)),
          epochs=5, callbacks = [checkpoint_callback])
Epoch 1/5
237/237 [==============================] - 63s 230ms/step - loss: 3.4712 - accuracy: 0.2482 - val_loss: 2.4446 - val_accuracy: 0.4497
Epoch 2/5
237/237 [==============================] - 52s 221ms/step - loss: 2.3575 - accuracy: 0.4561 - val_loss: 2.0051 - val_accuracy: 0.5093
Epoch 3/5
237/237 [==============================] - 51s 216ms/step - loss: 1.9838 - accuracy: 0.5265 - val_loss: 1.8313 - val_accuracy: 0.5360
Epoch 4/5
237/237 [==============================] - 51s 212ms/step - loss: 1.7497 - accuracy: 0.5761 - val_loss: 1.7417 - val_accuracy: 0.5461
Epoch 5/5
237/237 [==============================] - 53s 221ms/step - loss: 1.6035 - accuracy: 0.6141 - val_loss: 1.7012 - val_accuracy: 0.5601
model.evaluate(test_data)
790/790 [==============================] - 87s 110ms/step - loss: 1.7294 - accuracy: 0.5481
[1.7294203042984009, 0.5480791926383972]
# Fine tuning: unfreeze some layers, lower learning rate by 10x
base_model.trainable=True

# Refreeze every layer except the last 5, so fine-tuning only adjusts the finer features near the top of the model

for layer in base_model.layers[:-5]:
    layer.trainable=False

# recompile and lower learning rate by 10x

model.compile(tf.keras.optimizers.Adam(learning_rate=0.0001), 'categorical_crossentropy', metrics=['accuracy']) 

model.summary()
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_4 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
data_augmentation (Sequentia (None, None, None, 3)     0         
_________________________________________________________________
efficientnetb0 (Functional)  (None, None, None, 1280)  4049571   
_________________________________________________________________
global_average_pooling2d_1 ( (None, 1280)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 101)               129381    
=================================================================
Total params: 4,178,952
Trainable params: 910,821
Non-trainable params: 3,268,131
_________________________________________________________________
# Fine-tune for 5 more epochs, starting from the epoch the previous fit left off at:

fine_tune_epochs=10 # Total number of epochs we're after: 5 feature extraction, 5 fine tuning

history_fine_tune = model.fit(train_data,
                              validation_data = test_data,
                              validation_steps=int(0.15*len(test_data)),
                              epochs = fine_tune_epochs,
                              initial_epoch = history.epoch[-1])
Epoch 5/10
237/237 [==============================] - 59s 220ms/step - loss: 1.3571 - accuracy: 0.6543 - val_loss: 1.6403 - val_accuracy: 0.5567
Epoch 6/10
237/237 [==============================] - 51s 213ms/step - loss: 1.2478 - accuracy: 0.6688 - val_loss: 1.6805 - val_accuracy: 0.5596
Epoch 7/10
237/237 [==============================] - 46s 193ms/step - loss: 1.1424 - accuracy: 0.6964 - val_loss: 1.6352 - val_accuracy: 0.5736
Epoch 8/10
237/237 [==============================] - 45s 191ms/step - loss: 1.0902 - accuracy: 0.7065 - val_loss: 1.6494 - val_accuracy: 0.5657
Epoch 9/10
237/237 [==============================] - 46s 193ms/step - loss: 1.0229 - accuracy: 0.7275 - val_loss: 1.6348 - val_accuracy: 0.5633
Epoch 10/10
237/237 [==============================] - 45s 191ms/step - loss: 0.9704 - accuracy: 0.7434 - val_loss: 1.6990 - val_accuracy: 0.5670
model.evaluate(test_data)
790/790 [==============================] - 83s 105ms/step - loss: 1.6578 - accuracy: 0.5671
[1.657836675643921, 0.5670890808105469]
model.save("./101_food_classes_10_percent/big_modelh5")
loaded_model = tf.keras.models.load_model("./101_food_classes_10_percent/big_modelh5.h5")
loaded_model.summary()
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_4 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
data_augmentation (Sequentia (None, None, None, 3)     0         
_________________________________________________________________
efficientnetb0 (Functional)  (None, None, None, 1280)  4049571   
_________________________________________________________________
global_average_pooling2d_1 ( (None, 1280)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 101)               129381    
=================================================================
Total params: 4,178,952
Trainable params: 910,821
Non-trainable params: 3,268,131
_________________________________________________________________
loaded_model.evaluate(test_data)
790/790 [==============================] - 85s 104ms/step - loss: 2.1780 - accuracy: 0.4535
[2.1780412197113037, 0.4534653425216675]
# Try save_weights and load_weights with a second, identically built model
model.save_weights('my_model_weights.h5')
inputs = tf.keras.layers.Input(shape = (224,224,3))

x = data_augmentation(inputs)
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x) # Pool base_model's outputs into a feature vector

outputs = tf.keras.layers.Dense(len(class_names), activation='softmax')(x)

new_model = tf.keras.Model(inputs,outputs)
new_model.compile('Adam', 'categorical_crossentropy', metrics=['accuracy'])
new_model.summary()
Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_5 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
data_augmentation (Sequentia (None, None, None, 3)     0         
_________________________________________________________________
efficientnetb0 (Functional)  (None, None, None, 1280)  4049571   
_________________________________________________________________
global_average_pooling2d_2 ( (None, 1280)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 101)               129381    
=================================================================
Total params: 4,178,952
Trainable params: 910,821
Non-trainable params: 3,268,131
_________________________________________________________________
new_model.load_weights('my_model_weights.h5')
# Saving weights works... but not save and load_model
new_model.evaluate(test_data)
790/790 [==============================] - 88s 109ms/step - loss: 1.6578 - accuracy: 0.5671
[1.6578353643417358, 0.5670890808105469]
# Check if weights are the same?
m1 = model.get_weights()
m2 = new_model.get_weights()
m3 = loaded_model.get_weights()

len(m1)==len(m2)==len(m3)
True
from collections.abc import Iterable

def flatten(l):
    for el in l:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

# Materialize the generators as lists so they can be compared (and reused)
m1 = list(flatten(m1))
m2 = list(flatten(m2))
m3 = list(flatten(m3))

print(m1 == m2)
print(m1 == m3)
True
True
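For reference, the same equality check can be written more directly with numpy on the un-flattened weight lists (a sketch; weights_equal is a helper introduced here):

import numpy as np

def weights_equal(a, b):
    # Elementwise comparison of two lists of weight arrays.
    return len(a) == len(b) and all(
        np.array_equal(w1, w2) for w1, w2 in zip(a, b))

print(weights_equal(model.get_weights(), new_model.get_weights()))     # True
print(weights_equal(model.get_weights(), loaded_model.get_weights()))  # True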

This is because you are not saving the entire model with the .h5 extension; you are only using .h5 when saving the weights.
Please check this section of your code:

model.save("./101_food_classes_10_percent/big_modelh5")  # add .h5
loaded_model = tf.keras.models.load_model("./101_food_classes_10_percent/big_modelh5.h5")
loaded_model.summary()

Use this code to save the entire model in the HDF5 file format and try loading it again:

model.save("./101_food_classes_10_percent/big_modelh5.h5")

See the TensorFlow documentation for more details about saving a model in the .hdf5 format.
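Putting it together, a minimal round-trip check (same paths as above; if the extension was the only issue, the reloaded accuracy should match the original 0.5671):

# Save the whole model as HDF5 (note the .h5 extension), reload, re-evaluate.
model.save("./101_food_classes_10_percent/big_modelh5.h5")
loaded_model = tf.keras.models.load_model("./101_food_classes_10_percent/big_modelh5.h5")
loaded_model.evaluate(test_data)  # expected to match model.evaluate(test_data)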
