
Number of trainable parameters in transposed convolutional layers

I'm trying to understand in depth the MuseGAN network that is presented in David Foster's book Generative Deep Learning. (Which I highly recommend.) One thing I'm stuck on is the number of trainable parameters in the “Temporal Network”, a sub-network of the generator in the GAN.

Here's the code:

def TemporalNetwork(self):

    input_layer = Input(shape=(self.z_dim,), name='temporal_input')
    x = Reshape([1, 1, self.z_dim])(input_layer)           # (1, 1, z_dim)
    x = self.conv_t(x, f=1024, k=(2, 1), s=(1, 1), a='relu', p='valid', bn=True)                      # (2, 1, 1024)
    x = self.conv_t(x, f=self.z_dim, k=(self.n_bars - 1, 1), s=(1, 1), a='relu', p='valid', bn=True)  # (n_bars, 1, z_dim)
    output_layer = Reshape([self.n_bars, self.z_dim])(x)    # (n_bars, z_dim)

    return Model(input_layer, output_layer)

where self.conv_t is a wrapper for Conv2DTranspose. I suppose the point of it is to combine batch normalization and an activation with the transposed convolution layer, so that you can build the network with simpler calls. Here's the definition of conv_t:

def conv_t(self, x, f, k, s, a, p, bn):
    x = Conv2DTranspose(
                filters = f
                , kernel_size = k
                , padding = p
                , strides = s
                , kernel_initializer = self.weight_init
                )(x)
    if bn:
        x = BatchNormalization(momentum = 0.9)(x)
    if a == 'relu':
        x = Activation(a)(x)
    elif a == 'lrelu':
        x = LeakyReLU()(x)
    return x

The relevant defaults are self.z_dim = 32 and self.n_bars = 2.

I try to calculate the trainable parameters in transposed convolution layers the same way I do with ordinary convolution layers: input_filters * output_filters * kernel_size + output_filters. I get the following:

input_layer: 0
Reshape: 0
first conv_t: 32 * 1024 * 2 + 1024 = 66560
second conv_t: 1024 * 32 * 1 + 32 = 32800
output_layer: 0

total: 99360
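
As a sanity check, here is a minimal sketch (my own code, not the book's, assuming the tensorflow.keras API) that rebuilds just the two transposed convolutions without batch normalization; its summary should report exactly these 99360 parameters:

from tensorflow.keras.layers import Input, Reshape, Conv2DTranspose
from tensorflow.keras.models import Model

z_dim, n_bars = 32, 2

inp = Input(shape=(z_dim,))
x = Reshape([1, 1, z_dim])(inp)                       # (1, 1, 32)
# 32 * 1024 * 2 + 1024 = 66560 parameters
x = Conv2DTranspose(filters=1024, kernel_size=(2, 1), strides=(1, 1), padding='valid')(x)
# 1024 * 32 * 1 + 32 = 32800 parameters
x = Conv2DTranspose(filters=z_dim, kernel_size=(n_bars - 1, 1), strides=(1, 1), padding='valid')(x)
Model(inp, x).summary()                               # Total params: 99360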

However, the summary printed by gan.generator.summary() says there are 103584 parameters.

So, why am I missing 4224 parameters?

Edit 1: Right after asking this question, I realized that the mystery parameters might come from the batch normalization. That is probably the answer.

Yes, batch normalization has both trainable parameters (the scale gamma and the offset beta) and non-trainable parameters (the moving mean and the moving variance): two of each for every filter of the preceding layer, so 4 * filters in total.

For the first call to conv_t, batch normalization contributes 4 * 1024 = 4096 parameters.

For the second call to conv_t, batch normalization contributes 4 * 32 = 128 parameters.

Together, these constitute the 4224 parameters whose provenance had been a mystery to me.
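
To confirm where those parameters come from, here is another small sketch (again my own, assuming tensorflow.keras) that wraps a single BatchNormalization layer around the 1024 channels produced by the first conv_t:

from tensorflow.keras.layers import Input, BatchNormalization
from tensorflow.keras.models import Model

inp = Input(shape=(2, 1, 1024))              # output shape of the first conv_t
out = BatchNormalization(momentum=0.9)(inp)
Model(inp, out).summary()
# Trainable params: 2048      (gamma and beta, 2 * 1024)
# Non-trainable params: 2048  (moving mean and moving variance, 2 * 1024)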
