
How to share layer weights in custom Keras model function

I would like to share weights between two sides of a siamese model.

Given two inputs, each should pass through exactly the same model function with the same weights (the siamese part); the two outputs are then concatenated to form the final output.

I've read how to share specific layers in the documentation ( https://keras.io/getting-started/functional-api-guide/#shared-layers ) and in other questions on this board, and that approach works.
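For reference, here is a minimal sketch of the documented pattern that does work: one layer instance is created once and then called on both inputs, so both branches reuse its weights (variable names are my own):

from keras.layers import Input, Dense, concatenate
from keras.models import Model

in_a = Input(shape=(16,), dtype='float32')
in_b = Input(shape=(16,), dtype='float32')

# A single Dense instance, called on both inputs -> weights are shared
shared_dense = Dense(64, activation='relu')
out = concatenate([shared_dense(in_a), shared_dense(in_b)], axis=-1)
shared_example = Model(inputs=[in_a, in_b], outputs=out)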

But when I wrap multiple layers in my own model function, Keras does not share the weights.

Here is a minimal example:

from keras.layers import Input, Dense, concatenate
from keras.models import Model

# Define inputs
input_a = Input(shape=(16,), dtype='float32')
input_b = Input(shape=(16,), dtype='float32')

# My simple model
def my_model(x):
    x = Dense(128, input_shape=(x.shape[1],), activation='relu')(x)
    x = Dense(128, activation='relu')(x)
    return x

# Instantiate model parameters to share
processed_a = my_model(input_a)
processed_b = my_model(input_b)

# Concatenate output vector
final_output = concatenate([processed_a, processed_b], axis=-1)

model = Model(inputs=[input_a, input_b], outputs=final_output)

If the weights were shared, this model should have (16*128 + 128) + (128*128 + 128) = 18,688 parameters in total. If we check this:

model.summary()

Instead, the summary shows that we have double that:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_3 (InputLayer)            (None, 16)           0                                            
__________________________________________________________________________________________________
input_4 (InputLayer)            (None, 16)           0                                            
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 128)          2176        input_3[0][0]                    
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 128)          2176        input_4[0][0]                    
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 128)          16512       dense_5[0][0]                    
__________________________________________________________________________________________________
dense_8 (Dense)                 (None, 128)          16512       dense_7[0][0]                    
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 256)          0           dense_6[0][0]                    
                                                                 dense_8[0][0]                    
==================================================================================================
Total params: 37,376
Trainable params: 37,376
Non-trainable params: 0
__________________________________________________________________________________________________

I'm not sure what I did wrong. This is a simplified example; my real code first loads a pretrained language model that encodes the text input into vectors, and then applies this siamese model. Because of the pretrained model, I'd prefer to keep the shared part in a separate function like this.

Thanks.

The issue is that each time you call my_model you are creating entirely new layers (i.e. you instantiate fresh Dense layers on every call). What you want is to instantiate each layer only once and reuse it. That would look something like this:

from keras.layers import Input, Dense, concatenate
from keras.models import Model

# Define inputs
input_a = Input(shape=(16,), dtype='float32')
input_b = Input(shape=(16,), dtype='float32')

# Instantiate model parameters to share
layer1 = Dense(128, input_shape=(input_a.shape[1],), activation='relu')
layer2 = Dense(128, activation='relu')
processed_a = layer2(layer1(input_a))
processed_b = layer2(layer1(input_b))

# Concatenate output vector
final_output = concatenate([processed_a, processed_b], axis=-1)

model = Model(inputs=[input_a, input_b], outputs=final_output)

Now model.summary() gives:

Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_5 (InputLayer)            (None, 16)           0                                            
__________________________________________________________________________________________________
input_6 (InputLayer)            (None, 16)           0                                            
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 128)          2176        input_5[0][0]                    
                                                                 input_6[0][0]                    
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 128)          16512       dense_5[0][0]                    
                                                                 dense_5[1][0]                    
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 256)          0           dense_6[0][0]                    
                                                                 dense_6[1][0]                    
==================================================================================================
Total params: 18,688
Trainable params: 18,688
Non-trainable params: 0
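
To double-check the sharing programmatically, here is a quick sketch using the layer1/layer2 variables from the snippet above:

# Sanity checks: the parameter count matches the hand computation, and the
# kernel used by both branches is literally the same weight array.
assert model.count_params() == 18688
print([l.name for l in model.layers])   # each Dense layer appears only once
print(layer1.get_weights()[0].shape)    # (16, 128) kernel, reused by both inputs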

EDIT: If you want to create the layers just once inside a function, something like the below should work (note the extra Sequential import, and that the shared submodel is kept in its own variable so it doesn't clash with the final model):

from keras.models import Sequential

# Instantiate model parameters to share
def my_model(x):
    return Sequential([Dense(128, input_shape=(int(x.shape[1]),), activation='relu'),
                       Dense(128, activation='relu')])

# create the Sequential model (and its layers) only once, then call it on both inputs
shared_model = my_model(input_a)
processed_a = shared_model(input_a)
processed_b = shared_model(input_b)
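
An equivalent functional-API variant, which is also convenient when the shared part wraps a pretrained encoder as in your real use case, is to build the shared stack as its own Model inside the function and then call that model on each input (a sketch; build_shared_model is my own name):

from keras.layers import Input, Dense
from keras.models import Model

def build_shared_model(input_dim):
    inp = Input(shape=(input_dim,))
    h = Dense(128, activation='relu')(inp)
    h = Dense(128, activation='relu')(h)
    return Model(inputs=inp, outputs=h)

shared = build_shared_model(16)   # layers (and weights) are created only once
processed_a = shared(input_a)     # ...and reused for both inputs
processed_b = shared(input_b)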
