
How is the second Conv2D layer being calculated?

I got this code from the Udacity tutorial "Introduction to Deep learning with TensorFlow":

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3,3), padding='same', activation=tf.nn.relu,
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Conv2D(64, (3,3), padding='same', activation=tf.nn.relu),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10,  activation=tf.nn.softmax)
])

What I cannot understand is how the second Conv2D layer is calculated after the first MaxPooling2D layer.

Let's assume that we are processing a 28x28px image. The first Conv2D layer returns a (28, 28, 32) shape, where 32 is the number of filters applied and (3,3) is the kernel size. The results are then sent to the MaxPooling2D layer, which reduces the size from (28, 28, 32) to (14, 14, 32). Am I right here?

Now we have a shape of (14, 14, 32) and send it to the second Conv2D layer, which will apply 64 filters using a (3,3) kernel.

What does applying 64 filters with a (3,3) kernel look like on our (14, 14, 32) data? Will the second Conv2D layer create a (14, 14, 2048) output shape? Or will it create 64 separate (14, 14, 32) blocks, one for each of the 64 applied filters?

I have searched all over the internet for a visual explanation of how this works so I can understand the process better, with no luck.

Thanks!

You can always view the architecture of a neural network with the model.summary() method. The model in question has the following architecture:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 28, 28, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 3136)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               401536    
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
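The Param # column can be checked by hand, which also hints at how the filters are laid out. This is a small sketch in plain Python; the helper names conv2d_params and dense_params are made up for illustration and are not part of Keras. It assumes every filter and every dense unit carries one bias term, which is the Keras default.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter holds kernel_h * kernel_w * in_channels weights plus one bias.
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    # Each output unit holds one weight per input unit plus one bias.
    return (in_units + 1) * out_units


print(conv2d_params(3, 3, 1, 32))      # conv2d   -> 320
print(conv2d_params(3, 3, 32, 64))     # conv2d_1 -> 18496
print(dense_params(7 * 7 * 64, 128))   # dense    -> 401536
print(dense_params(128, 10))           # dense_1  -> 1290
```

The 18496 for conv2d_1 only works out if each of the 64 filters has 3 x 3 x 32 weights, so every filter reads all 32 input channels at once.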

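Since padding='same' for conv2d_1, the spatial dimensions remain 14 x 14. The number of output channels equals the number of filters in the layer. Each of the 64 filters has shape 3 x 3 x 32, so it spans all 32 input channels and sums over them to produce a single 14 x 14 channel. Hence, the output of the 2nd conv layer is 14 x 14 x 64, not 14 x 14 x 2048.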
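To make this concrete, here is a naive NumPy sketch of what a layer like conv2d_1 computes. It is not how Keras implements Conv2D internally; it assumes stride 1, an odd kernel size, zero padding for 'same', and random weights. The function name conv2d_same is made up for illustration. Like Keras, it slides the kernel without flipping it.

```python
import numpy as np


def conv2d_same(x, kernels, bias):
    """Naive 'same'-padded, stride-1 convolution followed by ReLU.

    x:       (H, W, C_in) input feature map
    kernels: (kh, kw, C_in, C_out), one (kh, kw, C_in) filter per output channel
    bias:    (C_out,)
    returns: (H, W, C_out)
    """
    h, w, _ = x.shape
    kh, kw, _, c_out = kernels.shape
    ph, pw = kh // 2, kw // 2  # assumes odd kernel sizes
    # Zero-pad height and width only, never the channel axis.
    padded = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    out = np.zeros((h, w, c_out))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + kh, j:j + kw, :]  # (kh, kw, C_in)
            # Contract over kh, kw and C_in: one number per filter.
            out[i, j, :] = np.tensordot(patch, kernels, axes=3) + bias
    return np.maximum(out, 0.0)


rng = np.random.default_rng(0)
x = rng.standard_normal((14, 14, 32))          # output of the first pooling layer
kernels = rng.standard_normal((3, 3, 32, 64))  # 64 filters, each 3 x 3 x 32
bias = np.zeros(64)

y = conv2d_same(x, kernels, bias)
print(y.shape)                      # (14, 14, 64)
print(kernels.size + bias.size)     # 18496, matching model.summary()
```

Each output pixel in channel k is the sum of 3 * 3 * 32 products between one input patch and filter k, so the 32 input channels are collapsed by the sum rather than multiplied by the 64 filters.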

EDIT: There's a wonderful resource shared by @avin in the comments below. I am adding it to the answer so that it is not lost in the comments. Thank you, @avin!

http://cs231n.github.io/convolutional-networks/ provides a visual explanation of CNN.

