
Training process of a CNN (Python Keras)

Consider the following CNN architecture (the code fragment was taken from this link):

# Assumed imports for the fragment (standalone Keras; the tensorflow.keras equivalents also work)
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# input_shape and num_classes are assumed to be defined elsewhere
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())  # reshapes the feature maps into one long vector
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

My questions are basically about the training process of a CNN.

  1. When you train the model, do the outputs of the Flatten layer change over the epochs?
  2. If the outputs of the Flatten layer change, does that mean backpropagation also runs before the Flatten layer (through Conv2D -> Conv2D -> MaxPooling2D -> Flatten)?
  3. What is the point of using Dropout after the MaxPooling2D layer (or any other layer before Flatten)?
  1. The Flatten layer simply takes the output of the previous layer and flattens everything into one long vector instead of keeping it as a multidimensional array. The Flatten layer itself has no weights to learn, and the way it computes its output never changes. Its actual output does change while you train, however, because the preceding layers are being trained, their outputs are changing, and thus the input to Flatten is changing (a runnable sketch of this point and the next one follows the list below).

  2. There is nothing special about the Flatten layer that would prevent backpropagation from being applied to the layers before it; if there were, those preceding layers could never be trained. Backpropagation is the process used to update the weights in the network, so to train the layers prior to Flatten it must run through them as well. If it were never applied to the early layers, they would never be updated and would never learn anything.

  3. Dropout layers are used for their regularizing effect to reduce overfitting. By randomly deactivating some neurons on any given forward pass, dropout pushes the network to learn more independent, robust features: it cannot rely on a small subset of neurons, because they may not be available. The same idea applies both before and after the Flatten layer.
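
Both points can be verified directly. The following is a minimal sketch (not from the original answer): it rebuilds the same architecture with the Keras functional API so the Flatten output can be read out through a second model, assumes tensorflow.keras, and uses randomly generated stand-in data in place of a real dataset. After one epoch, the first Conv2D kernel and the Flatten output for the same fixed input have both changed, while Flatten itself reports zero trainable parameters.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_shape = (28, 28, 1)   # assumed input size (MNIST-like images)
num_classes = 10            # assumed number of classes

# Same architecture as above, written with the functional API so that the
# Flatten layer's output can be extracted by a second model.
inputs = keras.Input(shape=input_shape)
x = layers.Conv2D(32, (3, 3), activation='relu', name='conv1')(inputs)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.25)(x)
flat = layers.Flatten(name='flatten')(x)
x = layers.Dense(128, activation='relu')(flat)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(num_classes, activation='softmax')(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')

# Point 1: Flatten has no trainable weights of its own.
print(model.get_layer('flatten').count_params())   # 0

# Helper model that exposes the Flatten layer's output.
feature_model = keras.Model(inputs, flat)

# Stand-in data (random, purely for illustration).
x_train = np.random.rand(64, 28, 28, 1).astype('float32')
y_train = keras.utils.to_categorical(np.random.randint(0, num_classes, 64), num_classes)

flat_before = feature_model.predict(x_train[:1], verbose=0)
kernel_before = model.get_layer('conv1').get_weights()[0].copy()

model.fit(x_train, y_train, epochs=1, verbose=0)

flat_after = feature_model.predict(x_train[:1], verbose=0)
kernel_after = model.get_layer('conv1').get_weights()[0]

# Point 2: backpropagation has updated the Conv2D weights, so the Flatten
# output for the same input changes even though Flatten itself learned nothing.
print(np.allclose(kernel_before, kernel_after))    # False
print(np.allclose(flat_before, flat_after))        # False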

Whether or not including dropout at specific points in your network will be useful depends on your particular use case. For example, if your network is not overfitting, dropout may not improve your results. Deciding exactly where to use dropout, and how much, is often a matter of experimentation to see what works for your data.
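
One more detail worth knowing (not stated in the answer above): Keras only applies dropout while training. At inference time the layer passes its input through unchanged, and during training the surviving activations are scaled up by 1 / (1 - rate) so their expected magnitude stays the same. A minimal sketch, assuming tensorflow.keras:

import numpy as np
from tensorflow.keras import layers

dropout = layers.Dropout(0.5)
x = np.ones((1, 8), dtype='float32')

# Inference behaviour: dropout is a pass-through, all values stay 1.0.
print(dropout(x, training=False).numpy())

# Training behaviour: roughly half the units are zeroed, the rest are
# scaled by 1 / (1 - 0.5) = 2.0.
print(dropout(x, training=True).numpy())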
