
What is the difference between setting a Keras model trainable vs making each layer trainable

I have a Keras Sequential model consisting of some Dense layers. I set the trainable property of the whole model to False. But I see that the individual layers still have their trainable property set to True. Do I also need to set each layer's trainable property to False individually? If so, what is the meaning of setting the trainable property to False on the whole model?

To answer this you need to take a look at the Keras source code, which might surprise you, because you would realize that a Keras model is itself a Keras layer (the Model class is derived from the Layer class).

As I said, it might be a bit surprising that a Keras model is derived from a Keras layer. But if you think about it further, you would find it reasonable, since they have a lot of functionality in common (e.g. both take some inputs, do some computation on them, produce some output, and update their internal weights/parameters). One of these common attributes is the trainable attribute. Now, when you set the trainable property of a model to False, it skips the weight-update step. In other words, it does not check the trainable attribute of its underlying layers; rather, it first checks its own trainable attribute (more precisely, in the Network class), and if that is False, the updates are skipped. Therefore, it does not follow that the underlying layers have their trainable attribute set to False as well. And there is a good reason for not doing that: a single instance of a layer can be used in multiple models. For example, consider the following two models, which share a layer:

from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=...)  # placeholder shape

shared_layer = Dense(...)  # placeholder units
sout = shared_layer(inp)

m1_out = Dense(...)(sout)
m2_out = Dense(...)(sout)

model1 = Model(inp, m1_out)
model2 = Model(inp, m2_out)

Now if we set model1.trainable = False, this freezes the whole of model1 (i.e. training model1 does not update the weights of its underlying layers, including shared_layer); however, shared_layer and model2 are still trainable (i.e. training model2 would update the weights of all of its layers, including shared_layer). On the other hand, if we set model1.layers[1].trainable = False, then shared_layer is frozen, and therefore its weights are not updated when training either model1 or model2. This way you have much more control and flexibility, and therefore you can build more complex architectures (e.g. GANs).
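The per-layer route can be checked directly: freezing only shared_layer removes its kernel and bias from the trainable weights of both models, while each model's own output layer stays trainable. A minimal runnable sketch against tf.keras (the input shape and layer widths here are arbitrary, chosen just for the demo):

```python
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

inp = Input(shape=(4,))   # arbitrary feature size
shared_layer = Dense(8)   # arbitrary width
sout = shared_layer(inp)

model1 = Model(inp, Dense(1)(sout))
model2 = Model(inp, Dense(1)(sout))

# Freeze only the shared layer: its kernel and bias drop out of the
# trainable weights of *both* models.
shared_layer.trainable = False

print(len(model1.trainable_weights))      # 2: model1's own output Dense (kernel + bias)
print(len(model2.trainable_weights))      # 2: model2's own output Dense (kernel + bias)
print(len(model1.non_trainable_weights))  # 2: shared_layer's kernel + bias
```

One caveat: in newer releases (tf.keras in TensorFlow 2.x and Keras 3), the model-level trainable setter propagates the flag down to nested layers, so the model-level behavior described above differs there; setting layer.trainable per layer remains the portable way to freeze a shared layer.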

