
Copying weights from one Conv2D layer to another

Context

I have trained a model on MNIST using Keras. My goal is to print images after the first layer with the first layer being a Conv2D layer. To go about this I'm creating a new model with a single Conv2D layer in which I'll copy the weights from the trained network into the new one.

# Visualization for image after first convolution
model_temp = Sequential()
model_temp.add(Conv2D(32, (3, 3),
                         activation='relu', 
                         input_shape=(28,28,1,)))

trained_weights = model.layers[0].get_weights()[0]

model_temp.layers[0].set_weights(trained_weights)

activations = model_temp.predict(X_test)

The variable model holds the full trained network. The arguments to Conv2D are exactly the same as the ones used in the original model.

I have checked the shapes of the weights for both model and model_temp, and both are (3, 3, 1, 32). In theory, I should be able to take the weights from the original and pass them directly to the set_weights() call on the single Conv2D layer in the new model.

After this convolution, the variable activations would be a tensor that holds, for each input image, 32 output matrices (one per filter) of 26 by 26 values.
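The 26 by 26 size follows from the 'valid' padding that Conv2D uses by default: each spatial dimension shrinks by the kernel size minus one. A minimal sketch of that arithmetic, where conv2d_output_shape is a hypothetical helper written for illustration (it is not part of Keras) and channels-last input is assumed:

```python
def conv2d_output_shape(input_shape, filters, kernel_size, stride=1):
    """Per-image output shape of a 'valid' (no padding) 2D convolution.

    Hypothetical helper for illustration only; assumes channels-last
    input of the form (height, width, channels).
    """
    height, width, _channels = input_shape
    k_h, k_w = kernel_size
    out_h = (height - k_h) // stride + 1
    out_w = (width - k_w) // stride + 1
    return (out_h, out_w, filters)


shape = conv2d_output_shape((28, 28, 1), filters=32, kernel_size=(3, 3))
print(shape)  # (26, 26, 32)
```

For N test images, predict would therefore return an array of shape (N, 26, 26, 32).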


Error

So when I run this code, I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-152-4ae260f0fe89> in <module>()
      7 trained_weights = model.layers[0].get_weights()[0]
      8 print(trained_weights.shape)
----> 9 model_test = model_test.layers[0].set_weights(trained_weights)
     10 
     11 activations = model_test._predict(X_test[1, 28, 28, 1])

/usr/local/lib/python2.7/dist-packages/keras/engine/topology.pyc in set_weights(self, weights)
   1189                              str(len(params)) +
   1190                              ' weights. Provided weights: ' +
-> 1191                              str(weights)[:50] + '...')
   1192         if not params:
   1193             return

ValueError: You called `set_weights(weights)` on layer "conv2d_60" with a  weight list of length 3, but the layer was expecting 2 weights. Provided weights: [[[[ -6.22274876e-01  -2.18614027e-01   5.29607059...

On the last line, why is set_weights(weights) looking for a length of two instead of three? This error message is slightly cryptic to me: if it is not about a length of two, what does "expecting 2 weights" mean?

Also, I'm open to suggestions on an easier way to go about this.


After Further Investigation

After inspecting the source code for set_weights() (line 1168), I found that the error is raised in this section:

params = self.weights
if len(params) != len(weights):
    raise ValueError('You called `set_weights(weights)` on layer "' +
                     self.name +
                     '" with a  weight list of length ' +
                     str(len(weights)) +
                     ', but the layer was expecting ' +
                     str(len(params)) +
                     ' weights. Provided weights: ' +
                     str(weights)[:50] + '...')

This check compares the length of what I passed in (the (3, 3, 1, 32) tensor from above) with the length of the layer's weights property. So I tested these properties as follows:

# Print contents of weights property
print(model.layers[0].weights)
print(model_test.layers[0].weights)

# Length test of tensors from get_weights call
len_test  = len(model.layers[0].get_weights()[0])
len_test2 = len(model_test.layers[0].get_weights()[0])
print("\nLength get_weights():")
print("Trained Model: ", len_test, "Test Model: ", len_test2)

# Length test of weights attributes from both models
len_test3 = len(model.layers[0].weights)
len_test4 = len(model_test.layers[0].weights)
print("\nLength weights attribute:")
print("Trained Model: ", len_test3, "Test Model: ", len_test4)

Output:

[<tf.Variable 'conv2d_17/kernel:0' shape=(3, 3, 1, 32) dtype=float32_ref>, <tf.Variable 'conv2d_17/bias:0' shape=(32,) dtype=float32_ref>]
[<tf.Variable 'conv2d_97/kernel:0' shape=(3, 3, 1, 32) dtype=float32_ref>, <tf.Variable 'conv2d_97/bias:0' shape=(32,) dtype=float32_ref>]

Length get_weights():
('Trained Model: ', 3, 'Test Model: ', 3)

Length weights attribute:
('Trained Model: ', 2, 'Test Model: ', 2)

This output makes complete sense to me, as the convolutions in both models are constructed in exactly the same way. It's also now obvious why it wants a length of two: the weights property is a list of two tf.Variable elements.

Further investigating this source file, at line 213 we see that weights holds "The concatenation of the lists trainable_weights and non_trainable_weights (in this order)".
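As a rough sketch of what that docstring describes, the stand-in class below (hypothetical, not the Keras implementation) builds weights by concatenating the two lists. For a Conv2D layer with a bias, the kernel and the bias are both trainable, so the result has two entries:

```python
class LayerSketch(object):
    """Hypothetical stand-in for a Keras layer; variables are plain strings."""

    def __init__(self, trainable_weights, non_trainable_weights):
        self.trainable_weights = list(trainable_weights)
        self.non_trainable_weights = list(non_trainable_weights)

    @property
    def weights(self):
        # Trainable variables come first, then the non-trainable ones.
        return self.trainable_weights + self.non_trainable_weights


conv = LayerSketch(["kernel", "bias"], [])
print(conv.weights)       # ['kernel', 'bias']
print(len(conv.weights))  # 2
```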

So, sure, I can grab the weights attribute from the Conv2D layer of the original trained model and pass that in to satisfy this condition. But then the condition isn't checking the shape of the passed-in data at all. And if I do pass in the weights attribute from my original model, I get a "setting an array element with a sequence" error from numpy.

Thoughts

I think this is a bug in the source code. It would be awesome if someone could verify this.

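You are forgetting about the bias vector. For Conv2D, get_weights() returns, and set_weights() expects, a list with the kernel (weight matrix) as the first element and the bias vector as the second. So the error rightly says that it expects a list with 2 members. The length of 3 that it reports is just the first dimension of the (3, 3, 1, 32) kernel array, which you passed in place of the list. The following should therefore work: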

trained_weights = model.layers[0].get_weights()
model_temp.layers[0].set_weights(trained_weights)
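To see where the "length 3" in the error comes from, here is a dependency-free sketch. Nested lists stand in for the NumPy arrays (calling len() on a NumPy array likewise returns the size of its first axis), and check_weight_count is a hypothetical helper that mirrors only the length comparison quoted in the question:

```python
def zeros(shape):
    """Build a nested list of zeros with the given shape (stand-in for an array)."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(shape[1:]) for _ in range(shape[0])]


def check_weight_count(expected, weights):
    """Hypothetical helper: mirrors only the length comparison in set_weights."""
    if len(weights) != expected:
        raise ValueError("weight list of length %d, expected %d"
                         % (len(weights), expected))


kernel = zeros((3, 3, 1, 32))  # same shape as the Conv2D kernel
bias = zeros((32,))            # one bias value per filter

# get_weights()[0] is the kernel alone; its first axis has size 3.
print(len(kernel))  # 3

# get_weights() is the whole list [kernel, bias], which has length 2.
check_weight_count(2, [kernel, bias])  # passes silently

try:
    check_weight_count(2, kernel)
except ValueError as err:
    print(err)  # weight list of length 3, expected 2
```

The check only counts the entries of whatever it is given, which is why a bare kernel array is reported as a "weight list of length 3".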

Also, if you want to get the output of an intermediate layer, you don't need to transfer weights manually. Something like the following is much more convenient:

from keras import backend as K

# x is a batch of input images, e.g. X_test
get_layer_output = K.function([model.input],
                              [model.layers[0].output])
layer_output = get_layer_output([x])[0]

or

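from keras.models import Model

# layer_name is the name of the layer of interest; data is a batch of inputs
intermediate_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)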
