Loading a saved model in Keras with a custom layer and prediction results are different?

A loaded Keras model with a custom layer has different weights from the model that was saved.
I implemented a Transformer encoder in Keras using the template provided by Francois Chollet here. After training the model I save it with model.save, but when I load it again for inference I find that the weights appear to be random again, so my model has lost all inference ability.

I have looked at similar questions on Stack Overflow and GitHub and applied the following suggestions, but I still have the same problem:

- Use the @tf.keras.utils.register_keras_serializable() decorator.
- Pass **kwargs into the __init__ call.
- Implement the get_config and from_config methods.
- Use custom_object_scope when loading the model.

Below is a minimal reproducible example of the issue. How do I change it so that the model weights are saved correctly?
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import load_model
from tensorflow.keras.utils import custom_object_scope
@tf.keras.utils.register_keras_serializable()
class TransformerEncoder(layers.Layer):
    def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.dense_dim = dense_dim
        self.num_heads = num_heads
        self.attention = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim)
        self.dense_proj = keras.Sequential(
            [
                layers.Dense(dense_dim, activation="relu"),
                layers.Dense(embed_dim),
            ]
        )
        self.layernorm_1 = layers.LayerNormalization()
        self.layernorm_2 = layers.LayerNormalization()

    def call(self, inputs, mask=None):
        if mask is not None:
            mask = mask[:, tf.newaxis, :]
        attention_output = self.attention(
            inputs, inputs, attention_mask=mask)
        proj_input = self.layernorm_1(inputs + attention_output)
        proj_output = self.dense_proj(proj_input)
        return self.layernorm_2(proj_input + proj_output)

    def get_config(self):
        config = super().get_config()
        config.update({
            "embed_dim": self.embed_dim,
            "num_heads": self.num_heads,
            "dense_dim": self.dense_dim,
        })
        return config

    @classmethod
    def from_config(cls, config):
        return cls(**config)

# Create simple model:
encoder = TransformerEncoder(embed_dim=2, dense_dim=2, num_heads=1)
inputs = keras.Input(shape=(2, 2), batch_size=None, name="test_inputs")
x = encoder(inputs)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation="linear")(x)
model = keras.Model(inputs, outputs)

# Fit the model and save it:
np.random.seed(42)
X = np.random.rand(10, 2, 2)
y = np.ones(10)
model.compile(optimizer=keras.optimizers.Adam(), loss="mean_squared_error")
model.fit(X, y, epochs=2, batch_size=1)
model.save("./test_model")

# Load the saved model:
with custom_object_scope({
    'TransformerEncoder': TransformerEncoder
}):
    loaded_model = load_model("./test_model")

print(model.weights[0].numpy())
print(loaded_model.weights[0].numpy())
The weights are saved (you can load them with load_weights after loading the model). The problem is that you create new sub-layers in __init__. You need to recreate them from their configs instead, for example:
class TransformerEncoder(layers.Layer):
    def __init__(self, embed_dim, dense_dim, num_heads,
                 attention_config=None, dense_proj_config=None, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.dense_dim = dense_dim
        self.num_heads = num_heads
        self.attention = layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim) \
            if attention_config is None else layers.MultiHeadAttention.from_config(attention_config)
        self.dense_proj = keras.Sequential(
            [
                layers.Dense(dense_dim, activation="relu"),
                layers.Dense(embed_dim),
            ]
        ) if dense_proj_config is None else keras.Sequential.from_config(dense_proj_config)
        ...

    def call(self, inputs, mask=None):
        ...

    def get_config(self):
        config = super().get_config()
        config.update({
            "embed_dim": self.embed_dim,
            "num_heads": self.num_heads,
            "dense_dim": self.dense_dim,
            "attention_config": self.attention.get_config(),
            "dense_proj_config": self.dense_proj.get_config(),
        })
        return config
Output:
[[[-0.810745 -0.14727005]]
[[ 0.8542909 0.09689581]]]
[[[-0.810745 -0.14727005]]
[[ 0.8542909 0.09689581]]]
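For completeness, the load_weights workaround mentioned at the start of this answer would look roughly like this. This is a minimal sketch, assuming a TF 2.x version whose Model.load_weights accepts a SavedModel directory such as ./test_model:

with custom_object_scope({'TransformerEncoder': TransformerEncoder}):
    loaded_model = load_model("./test_model")
# Explicitly restore the trained variables from the SavedModel directory:
loaded_model.load_weights("./test_model")
print(loaded_model.weights[0].numpy())  # should now match model.weights[0]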
The key is understanding how layer initialization works. You could inspect model.get_weights(), but below I sample layer.get_weights() on a single custom layer instead, because the effect is easier to see there.

Example: a custom layer with random initial weights produces a different set of random numbers each time the layer is built.
import tensorflow as tf

class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs):
        super(MyDenseLayer, self).__init__()
        self.num_outputs = num_outputs

    def build(self, input_shape):
        """Initialize weights with random numbers."""
        min_size_init = tf.keras.initializers.RandomUniform(minval=1, maxval=5, seed=None)
        self.kernel = self.add_weight(shape=[int(input_shape[-1]), self.num_outputs],
                                      initializer=min_size_init, trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel)

start = 3
limit = 33
delta = 3

# Create data: a (10, 1) column of [3, 6, ..., 30]
sample = tf.range(start, limit, delta)
sample = tf.cast(sample, dtype=tf.float32)
sample = tf.constant(sample, shape=(10, 1))

layer = MyDenseLayer(10)
data = layer(sample)
print(layer.get_weights())  # the values printed in the rounds below
Output: with seed=None, each run re-initializes the layer, so every round yields a different set of random weights.
### 1st round ###
# [array([[-0.07862139, -0.45416605, -0.53606 , 0.18597281, 0.2919714 ,
# -0.27334914, 0.60890776, -0.3856985 , 0.58052486, -0.5634572 ]], dtype=float32)]
### 2nd round ###
# [array([[ 0.5949032 , 0.05113244, -0.51997787, 0.26252705, -0.09235346,
# -0.35243294, -0.0187515 , -0.12527376, 0.22348166, 0.37051445]], dtype=float32)]
### 3rd round ###
# [array([[-0.6654639 , -0.46027896, -0.48666477, -0.23095328, 0.30391783,
# 0.21867174, -0.5405392 , -0.45399982, -0.22143698, 0.66893476]], dtype=float32)]
Example: calling build() again resets the layer's weights to new initial values each time.
layer.build([1])
print(data)
print(layer.get_weights())
Output: the weights differ after each rebuild; the earlier values do not carry over.
### 1st round ###
# [array([[ 0.73738164, 0.14095825, -0.5416008 , -0.35084447, -0.35209572,
# -0.35504425, 0.1692887 , 0.2611189 , 0.43355125, -0.3325353 ]], dtype=float32)]
### 2nd round ###
# [array([[ 0.5949032 , 0.05113244, -0.51997787, 0.26252705, -0.09235346,
# -0.35243294, -0.0187515 , -0.12527376, 0.22348166, 0.37051445]], dtype=float32)]
### 3rd round ###
# [array([[-0.6654639 , -0.46027896, -0.48666477, -0.23095328, 0.30391783,
# 0.21867174, -0.5405392 , -0.45399982, -0.22143698, 0.66893476]], dtype=float32)]
Example: if we pin the layer's initial values, every run starts from the same initial weights.
""" initialize weights with values ones """
min_size_init = tf.keras.initializers.Ones()
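In context, the build() method of MyDenseLayer above becomes the following (only the initializer changes):

def build(self, input_shape):
    """Initialize weights with ones so every run starts identically."""
    min_size_init = tf.keras.initializers.Ones()
    self.kernel = self.add_weight(shape=[int(input_shape[-1]), self.num_outputs],
                                  initializer=min_size_init, trainable=True)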
Output: the same result is reproduced on every run.
### 1st round ###
# tf.Tensor(
# [[ 3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
# [ 6. 6. 6. 6. 6. 6. 6. 6. 6. 6.]
# [ 9. 9. 9. 9. 9. 9. 9. 9. 9. 9.]
# [12. 12. 12. 12. 12. 12. 12. 12. 12. 12.]
# [15. 15. 15. 15. 15. 15. 15. 15. 15. 15.]
# [18. 18. 18. 18. 18. 18. 18. 18. 18. 18.]
# [21. 21. 21. 21. 21. 21. 21. 21. 21. 21.]
# [24. 24. 24. 24. 24. 24. 24. 24. 24. 24.]
# [27. 27. 27. 27. 27. 27. 27. 27. 27. 27.]
# [30. 30. 30. 30. 30. 30. 30. 30. 30. 30.]], shape=(10, 10), dtype=float32)
# [array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], dtype=float32)]
### 2nd round ###
# tf.Tensor(
# [[ 3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
# [ 6. 6. 6. 6. 6. 6. 6. 6. 6. 6.]
# [ 9. 9. 9. 9. 9. 9. 9. 9. 9. 9.]
# [12. 12. 12. 12. 12. 12. 12. 12. 12. 12.]
# [15. 15. 15. 15. 15. 15. 15. 15. 15. 15.]
# [18. 18. 18. 18. 18. 18. 18. 18. 18. 18.]
# [21. 21. 21. 21. 21. 21. 21. 21. 21. 21.]
# [24. 24. 24. 24. 24. 24. 24. 24. 24. 24.]
# [27. 27. 27. 27. 27. 27. 27. 27. 27. 27.]
# [30. 30. 30. 30. 30. 30. 30. 30. 30. 30.]], shape=(10, 10), dtype=float32)
# [array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], dtype=float32)]
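As an aside (my addition, not from the original answer), you can keep the random initializer and still get reproducible runs by fixing its seed:

# Deterministic random init: a fixed seed reproduces the same draws each run.
min_size_init = tf.keras.initializers.RandomUniform(minval=1, maxval=5, seed=42)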
Example: putting it to use.
# coefficient_0 ... coefficient_9 are assumed to be defined elsewhere.
temp = tf.random.normal([10], 1, 0.2, tf.float32)
temp = np.asarray(temp) * np.asarray([coefficient_0, coefficient_1, coefficient_2, coefficient_3, coefficient_4,
                                      coefficient_5, coefficient_6, coefficient_7, coefficient_8, coefficient_9])
temp = tf.nn.softmax(temp)
action = int(np.argmax(temp))
Output: each coefficient scales one entry of the random draw against the environment's variables; argmax then selects the value that maps to the target action in the game. The added random values keep the selection from being driven purely by the action-feedback weighting.
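A self-contained version of that sketch, with hypothetical placeholder coefficients (the names and values below are illustrative only, not from the original answer):

import numpy as np
import tensorflow as tf

# Hypothetical per-action coefficients; the original snippet assumes these
# come from the game/environment and are defined elsewhere.
coefficients = np.linspace(0.5, 1.5, num=10)

temp = tf.random.normal([10], mean=1.0, stddev=0.2, dtype=tf.float32)
temp = np.asarray(temp) * coefficients  # weight each random draw per action
temp = tf.nn.softmax(temp)              # normalize into a distribution
action = int(np.argmax(temp))           # choose the highest-scoring action
print(action)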