
What layers are affected by dropout layer in Tensorflow?

Consider transfer learning in order to use a pretrained model in keras/tensorflow. For each old layer, the trainable parameter is set to False so that its weights are not updated during training, whereas the last layer(s) have been substituted with new layers and these must be trained. In particular, two fully connected hidden layers with 512 and 1024 neurons and a relu activation function have been added. After these layers a Dropout layer is used with rate 0.2. This means that during each epoch of training 20% of the neurons are randomly discarded.

What layers does this dropout layer affect? Does it affect the whole network, including the pretrained layers for which layer.trainable=False has been set, or does it affect only the newly added layers? Or does it affect only the previous layer (i.e., the one with 1024 neurons)?

In other words, which layer(s) do the neurons that are turned off during each epoch by the dropout belong to?

import os

from tensorflow.keras import layers
from tensorflow.keras import Model
  
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.optimizers import RMSprop

local_weights_file = 'weights.h5'

pre_trained_model = InceptionV3(input_shape = (150, 150, 3), 
                                include_top = False, 
                                weights = None)

pre_trained_model.load_weights(local_weights_file)

for layer in pre_trained_model.layers:
  layer.trainable = False
  
# pre_trained_model.summary()

last_layer = pre_trained_model.get_layer('mixed7')
last_output = last_layer.output

# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add two fully connected layers with 512 and 1,024 hidden units and ReLU activation
x = layers.Dense(512, activation='relu')(x)
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)                  
# Add a final sigmoid layer for classification
x = layers.Dense(1, activation='sigmoid')(x)

model = Model(pre_trained_model.input, x)

model.compile(optimizer = RMSprop(learning_rate=0.0001), 
              loss = 'binary_crossentropy', 
              metrics = ['accuracy'])

The dropout layer will affect the output of the previous layer.

If we look at the specific part of your code:

x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)                  
# Add a final sigmoid layer for classification
x = layers.Dense(1, activation='sigmoid')(x)

In your case, 20% of the output of the layer defined by x = layers.Dense(1024, activation='relu')(x) will be dropped at random before being passed to the final Dense layer.
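Below is a minimal sketch (the shapes and variable names here are mine, not from the question) that makes this concrete: the mask is applied only to the tensor the Dropout layer is called on, i.e. the activations of the 1024-unit Dense layer, and the layer is a no-op at inference time.

import tensorflow as tf

dense = tf.keras.layers.Dense(1024, activation='relu')
dropout = tf.keras.layers.Dropout(0.2)

x = tf.random.normal((8, 512))           # stand-in for the 512-unit layer's output
h = dense(x)                             # shape (8, 1024)
h_train = dropout(h, training=True)      # ~20% of entries per sample set to 0, the rest scaled by 1/(1 - 0.2)
h_infer = dropout(h, training=False)     # identity: equal to h

# Fraction of activations newly zeroed by dropout (zeros already produced by relu are excluded)
kept = tf.cast(h != 0.0, tf.float32)
newly_zeroed = tf.cast((h != 0.0) & (h_train == 0.0), tf.float32)
print(float(tf.reduce_sum(newly_zeroed) / tf.reduce_sum(kept)))   # roughly 0.2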

Only the previous layer's neurons are "turned off", but all layers are "affected" in terms of backprop.

  • Later layers: Dropout's output is the input to the next layer, so the next layer's outputs will change, and so will the next-next layer's, and so on.
  • Previous layers: as the "effective output" of the pre-Dropout layer is changed, so will the gradients to it, and thus any subsequent gradients. In the extreme case of Dropout(rate=1), zero gradient will flow (see the sketch after this list).
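As a rough illustration of the backprop point (a toy tensor rather than the model above): the gradient flowing back through Dropout is zero exactly at the positions that were dropped, which is how earlier layers end up "affected" even though only the previous layer's activations are masked.

import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
h = tf.Variable(tf.random.normal((1, 8)))   # pretend pre-Dropout activations

with tf.GradientTape() as tape:
    out = drop(h, training=True)
    loss = tf.reduce_sum(out)

grad = tape.gradient(loss, h)
print(out.numpy())    # dropped entries are 0, kept entries are scaled by 1/(1 - 0.5)
print(grad.numpy())   # gradient is 0 where an entry was dropped, 2.0 elsewhere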

Also, note that whole neurons are only dropped if the input to Dense is 2D, (batch_size, features); Dropout applies a random uniform mask to all dimensions (equivalent to dropping whole neurons in the 2D case). To drop whole neurons in the 3D case, set Dropout(.2, noise_shape=(batch_size, 1, features)). To drop the same neurons across all samples, use noise_shape=(1, 1, features) (or (1, features) for 2D).
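A small sketch of that noise_shape behaviour (the (2, 5, 4) shape is just an illustrative (batch, timesteps, features) example):

import tensorflow as tf

x = tf.ones((2, 5, 4))   # (batch_size, timesteps, features)

per_element = tf.keras.layers.Dropout(0.2)                         # independent mask per value (default)
per_neuron  = tf.keras.layers.Dropout(0.2, noise_shape=(2, 1, 4))  # a dropped feature is zeroed across all timesteps of a sample
shared      = tf.keras.layers.Dropout(0.2, noise_shape=(1, 1, 4))  # the same features are dropped for every sample in the batch

print(per_neuron(x, training=True)[0])   # zeroed columns span all 5 timesteps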

The dropout technique is not applied to every single layer within a neural network; it is commonly used on the neurons in the last few layers of the network.

The technique works by randomly reducing the number of interconnecting neurons within a neural network. At every training step, each neuron has a chance of being left out, or rather, dropped out of the collated contribution from connected neurons.

There is some debate as to whether the dropout should be placed before or after the activation function. As a rule of thumb, place the dropout after the activation function for all activation functions other than relu.
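A short sketch of the two orderings (the layer sizes here are arbitrary). Note that for relu the two orderings actually give the same result under Keras' inverted dropout, since relu(0) = 0 and scaling kept values by a positive constant commutes with relu; the ordering only matters for activations like tanh or sigmoid.

import tensorflow as tf
from tensorflow.keras import layers

inputs = layers.Input(shape=(128,))

# Dropout before the relu activation (equivalent to placing it after, for relu)
a = layers.Dense(64)(inputs)
a = layers.Dropout(0.2)(a)
a = layers.Activation('relu')(a)

# Dropout after the activation, the rule of thumb for non-relu activations such as tanh
b = layers.Dense(64, activation='tanh')(inputs)
b = layers.Dropout(0.2)(b)

model = tf.keras.Model(inputs, [a, b])
model.summary()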

You can add dropout after every hidden layer, and generally it affects only the previous layer (in your case it will affect x = layers.Dense(1024, activation='relu')(x)). In the original paper that proposed dropout layers, by Hinton (2012), dropout (with p=0.5) was used on each of the fully connected (dense) layers before the output; it was not used on the convolutional layers. This became the most commonly used configuration.

I am adding some resource links that might help you:

https://towardsdatascience.com/understanding-and-implementing-dropout-in-tensorflow-and-keras-a8a3a02c1bfa

https://towardsdatascience.com/dropout-on-convolutional-layers-is-weird-5c6ab14f19b2

https://towardsdatascience.com/machine-learning-part-20-dropout-keras-layers-explained-8c9f6dc4c9ab
