Convolutional neural network: how the second conv layer works on the output of the first pooling layer
I'm reading material from the TensorFlow website:
https://www.tensorflow.org/tutorials/layers
Suppose we have 10 greyscale (monochrome) 28x28 pixel images.
So, if we apply a second convolutional layer (say, 64 5x5 filters as in the link), do we apply these filters to each channel of each image and get 10*32*64*14*14 data?
Yes and no. You do apply the filters to each channel and each image, but you don't get 10*32*64*14*14 output dimensions. The dimensionality of the output is going to be 10*64*14*14, because the layer specifies 64 output channels per image. In turn, the weights used for this convolution will have size 32*64*5*5 (64 5-by-5 filters for every channel of the input).
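
To make the bookkeeping concrete, here is a minimal sketch in plain Python that only does the shape arithmetic. The helper name `conv_layer_shapes` is hypothetical, and the (batch, channels, height, width) ordering is an assumption chosen to match the notation above. TensorFlow itself stores kernels as [height, width, in_channels, out_channels]. The sketch also assumes padding that keeps the 14x14 spatial size unchanged.

```python
def conv_layer_shapes(batch, in_channels, height, width, out_channels, kernel):
    """Return (output_shape, weight_shape, weight_count) for a size-preserving conv layer.

    Hypothetical helper for illustration only; it performs no convolution.
    """
    # One output channel per filter: input channels are summed over, not multiplied out.
    output_shape = (batch, out_channels, height, width)
    # Each of the out_channels filters holds one kernel x kernel slice per input channel.
    weight_shape = (in_channels, out_channels, kernel, kernel)
    weight_count = in_channels * out_channels * kernel * kernel
    return output_shape, weight_shape, weight_count


out_shape, w_shape, n_weights = conv_layer_shapes(
    batch=10, in_channels=32, height=14, width=14, out_channels=64, kernel=5
)
print(out_shape)   # (10, 64, 14, 14)
print(w_shape)     # (32, 64, 5, 5)
print(n_weights)   # 51200
```

The factor of 32 shows up in the weights, not in the output: each filter consumes all 32 input channels and emits a single channel.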
No. If you convolve (with padding, and ignoring the batch size) a 14x14x32 volume with a set of 64 5x5 filters, you'll end up with a 14x14x64 output volume.
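
The padding is what keeps the spatial size at 14x14. A small sketch of the usual output-size formula, assuming symmetric zero padding (the helper name `conv_output_size` is hypothetical):

```python
def conv_output_size(size, kernel, padding, stride=1):
    """Output length along one spatial axis: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# Padding of 2 on each side keeps a 14-pixel axis at 14 for a 5x5 filter.
print(conv_output_size(14, kernel=5, padding=2))  # 14
# Without padding, the same filter would shrink the axis to 10.
print(conv_output_size(14, kernel=5, padding=0))  # 10
```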
Every single convolutional filter is convolved along the whole input depth. Thus, your 14x14x32 input volume is convolved with one 5x5 filter (strictly 5x5x32, since it covers all 32 channels), and the output is a single 14x14x1 feature map.
Then the second 5x5 filter of the stack of 64 filters is convolved with the same input volume. The same operation is done for each one of the 64 filters, and the resulting feature maps are stacked, forming your 14x14x64 output volume.
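
The procedure described above can be sketched as a naive pure-Python loop. This is an illustration under stated assumptions, not how TensorFlow implements it: the function names are hypothetical, the data is random, zero padding is used to preserve the spatial size, and, as in most deep learning frameworks, the kernel is not flipped (so strictly speaking this is cross-correlation).

```python
import random
from operator import mul


def convolve_one_filter(volume, kernel):
    """Slide one kernel over an H x W x C volume with zero padding; return an H x W map.

    volume[y][x] is a list of C channel values; kernel[dy][dx] is a list of C weights,
    so the kernel spans the whole input depth.
    """
    height, width = len(volume), len(volume[0])
    k = len(kernel)
    pad = k // 2  # size-preserving padding for an odd kernel size
    feature_map = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < height and 0 <= xx < width:  # outside counts as zero
                        # Dot product over all C channels at this position.
                        total += sum(map(mul, volume[yy][xx], kernel[dy][dx]))
            feature_map[y][x] = total
    return feature_map


def convolve_filter_bank(volume, kernels):
    """Apply every kernel to the same volume and stack the maps along the depth axis."""
    maps = [convolve_one_filter(volume, kernel) for kernel in kernels]
    height, width = len(volume), len(volume[0])
    return [[[m[y][x] for m in maps] for x in range(width)] for y in range(height)]


random.seed(0)
H, W, C_IN, C_OUT, K = 14, 14, 32, 64, 5
volume = [[[random.random() for _ in range(C_IN)] for _ in range(W)] for _ in range(H)]
kernels = [
    [[[random.random() for _ in range(C_IN)] for _ in range(K)] for _ in range(K)]
    for _ in range(C_OUT)
]

one_map = convolve_one_filter(volume, kernels[0])
print(len(one_map), len(one_map[0]))  # 14 14 (a single 14x14x1 feature map)

output = convolve_filter_bank(volume, kernels)
print(len(output), len(output[0]), len(output[0][0]))  # 14 14 64
```

Note that the 32 input channels are collapsed by the dot product inside `convolve_one_filter`; only the number of filters determines the output depth.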