
Convolutional neural network, how the second conv layer works on the first pooling layer

I'm reading material from the TensorFlow website:

https://www.tensorflow.org/tutorials/layers

Suppose we have 10 greyscale 28x28 pixel images:

  1. If we apply 32 5x5 convolutional filters with zero padding (so the spatial size is preserved) in the 1st conv layer, we get 10*32*28*28 data.
  2. If we apply 2x2 max pooling with stride 2 in the 1st pooling layer, we get 10*32*14*14 data.
  3. By now, one image has become a 14*14 size image with 32 channels.
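The shape arithmetic in these steps can be checked with the usual output-size formula, floor((n + 2p - k) / s) + 1. This is only a sketch: the helper name `conv_output_size` is made up for illustration, and a padding of 2 pixels per side is assumed for the 5x5 convolution, since that is what keeps a 28x28 input at 28x28.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (n + 2 * padding - kernel) // stride + 1

batch, height = 10, 28

# 1st conv layer: 32 filters, 5x5, stride 1, assumed 2 pixels of zero padding per side.
conv1_hw = conv_output_size(height, kernel=5, stride=1, padding=2)
# 1st pooling layer: 2x2 window, stride 2, no padding.
pool1_hw = conv_output_size(conv1_hw, kernel=2, stride=2, padding=0)

conv1_shape = (batch, 32, conv1_hw, conv1_hw)
pool1_shape = (batch, 32, pool1_hw, pool1_hw)
# For comparison: the same 5x5 convolution with no padding at all.
unpadded_hw = conv_output_size(height, kernel=5)

print(conv1_shape, pool1_shape, unpadded_hw)
```

Without any padding the first convolution would shrink each image to 24x24, so the 28x28 figure in step 1 only holds when the input is zero-padded.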

So, if we apply a second convolutional layer (let's say 64 5x5 filters, as in the link), do we apply these filters to each channel of each image and get 10*32*64*14*14 data?

Yes and no. Each filter is applied to every channel of every image, but the per-channel results are summed, so you don't get 10*32*64*14*14 output dimensions. The dimensionality of the output is going to be 10*64*14*14, because the layer specifies 64 output channels per image. In turn, the weights used for this convolution will have size 32*64*5*5 (each of the 64 filters has one 5-by-5 slice for every channel of the input).
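To make the summation concrete, here is a rough NumPy sketch (not the TensorFlow implementation). It assumes random data, a channels-first layout, zero padding of 2 pixels, stride 1, no bias, and the cross-correlation convention that deep learning libraries call convolution. It first keeps the per-channel responses separate, which gives exactly the 10*32*64*14*14 array from the question, and then sums over the 32 input channels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 32, 14, 14))  # output of the 1st pooling layer
w = rng.standard_normal((64, 32, 5, 5))    # (out_channels, in_channels, kh, kw)

# Zero-pad height and width by 2 so the 5x5 window keeps the 14x14 size.
x_padded = np.pad(x, ((0, 0), (0, 0), (2, 2), (2, 2)))

# Every 5x5 patch of every channel: shape (10, 32, 14, 14, 5, 5).
windows = sliding_window_view(x_padded, (5, 5), axis=(2, 3))

# Response of each filter slice on its own input channel, not yet combined.
per_channel = np.einsum("nchwij,ocij->ncohw", windows, w)

# The layer sums the contributions of all 32 input channels.
y = per_channel.sum(axis=1)

print(per_channel.shape)  # intermediate, never returned by the layer
print(y.shape)            # what the layer actually outputs
print(w.size)             # number of weights, excluding biases
```

The intermediate array is never materialized by a real convolution layer; it is only built here to show where the factor of 32 goes.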

No. Ignoring the batch size, if you convolve a padded 14x14x32 volume with a set of 64 5x5 filters, you'll end up with a 14x14x64 output volume.

Every single convolutional filter is convolved along the whole input depth, so each "5x5" filter is really a 5x5x32 block of weights. Thus, when your 14x14x32 input volume is convolved with one such filter, the output is a 14x14x1 feature map.

Then the second 5x5 filter of the stack of 64 filters is convolved with the same input volume. The same operation is done for each one of the 64 filters, and the resulting feature maps are stacked, forming your 14x14x64 output volume.
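The filter-by-filter procedure described above can be written out as a deliberately slow loop. This is a minimal sketch under assumptions: random values, a single image in channels-last layout, zero padding, stride 1, no bias, and a hypothetical helper `apply_one_filter`.

```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.standard_normal((14, 14, 32))     # one image after the 1st pooling
filters = rng.standard_normal((64, 5, 5, 32))  # 64 filters, each 5x5 over all 32 channels

def apply_one_filter(volume, filt):
    """Slide one 5x5x32 filter over the padded volume and return a 2D feature map."""
    k = filt.shape[0]
    pad = k // 2
    padded = np.pad(volume, ((pad, pad), (pad, pad), (0, 0)))
    h, w, _ = volume.shape
    feature_map = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + k, j:j + k, :]      # 5x5x32 patch, full depth
            feature_map[i, j] = np.sum(patch * filt)  # one number per position
    return feature_map

# One feature map per filter, then stack them along a new last axis.
feature_maps = [apply_one_filter(volume, f) for f in filters]
output = np.stack(feature_maps, axis=-1)

print(feature_maps[0].shape, output.shape)
```

Each call collapses the 32-channel depth into a single number per spatial position, which is why the depth of the output equals the number of filters rather than the number of input channels.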


 