
Convolutional neural network: how the second conv layer works on the first pooling layer

Original post: 2017-10-09 06:01:49, tag: tensorflow

I'm reading material from the TensorFlow website:

https://www.tensorflow.org/tutorials/layers

Suppose we have 10 greyscale (single-channel) 28x28 pixel images.

If we apply 32 5x5 convolutional filters with zero padding (so the 28x28 size is preserved) in the 1st conv layer, we get 10*32*28*28 data.
If we apply 2x2 max pooling with stride 2 in the 1st pooling layer, we get 10*32*14*14 data.
By now, each image has become a 14*14 image with 32 channels.
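
The shape bookkeeping up to this point can be checked with a small sketch. It assumes stride-1 convolution with zero ("same") padding and a batch-channels-height-width ordering, matching the 10*32*28*28 notation above; the helper names `conv_same` and `max_pool` are made up for illustration and are not TensorFlow functions.

```python
def conv_same(shape, num_filters):
    """Shape after a stride-1 convolution with zero ("same") padding."""
    n, _channels, h, w = shape
    # Zero padding keeps height and width; the channel count becomes
    # the number of filters.
    return (n, num_filters, h, w)


def max_pool(shape, window=2, stride=2):
    """Shape after max pooling without padding."""
    n, c, h, w = shape
    # Pooling acts on each channel separately, so n and c are unchanged.
    return (n, c, (h - window) // stride + 1, (w - window) // stride + 1)


images = (10, 1, 28, 28)        # 10 greyscale 28x28 images
conv1 = conv_same(images, 32)   # (10, 32, 28, 28)
pool1 = max_pool(conv1)         # (10, 32, 14, 14)
print(conv1, pool1)
```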

So, if we apply a second convolutional layer (let's say 64 5x5 filters, as in the link), do we apply these filters to each channel of each image and get 10*32*64*14*14 data?

2 Answers

Yes and no. You do apply the filters to every channel of every image, but you don't get 10*32*64*14*14 output dimensions. The dimensionality of the output is going to be 10*64*14*14, because the layer specifies 64 output channels per image: for each output channel, the responses from the 32 input channels are summed into a single 14*14 map. In turn, the weights used for this convolution will have size 32*64*5*5 (64 5-by-5 filters for every channel of the input).
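
As a rough check of the numbers in this answer, the sketch below just does the arithmetic. It follows the answer's 32*64*5*5 ordering for the weights; actual frameworks may store the same weights with the axes in a different order, and bias terms are left out.

```python
in_channels, out_channels, k = 32, 64, 5

# One 5x5 slice per (input channel, output channel) pair.
weight_shape = (in_channels, out_channels, k, k)
num_weights = in_channels * out_channels * k * k

# Each output channel sums over all 32 input channels, so the input
# channel axis does not appear in the output.
pool1_shape = (10, in_channels, 14, 14)
conv2_shape = (pool1_shape[0], out_channels, pool1_shape[2], pool1_shape[3])

print(weight_shape, num_weights)  # (32, 64, 5, 5) 51200
print(conv2_shape)                # (10, 64, 14, 14)
```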

No. If you convolve a 14x14x32 volume (ignoring the batch size) with a set of 64 5x5 filters, using padding, you'll end up with a 14x14x64 output volume.

Every single convolutional filter is convolved along the whole input depth, so each 5x5 filter is really 5x5x32. Thus, your 14x14x32 input volume is convolved with one such filter and the output is a 14x14x1 feature map.

Then the second 5x5 filter of the stack of 64 is convolved with the same input volume. The same operation is done for each of the 64 filters, and the resulting feature maps are stacked to form the 14x14x64 output volume.
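
The procedure described in this answer can be sketched with NumPy for a single image. This is only an illustration under a few assumptions: random data stands in for real activations and weights, bias and activation are omitted, and the filter is slid without flipping (cross-correlation), which is what deep learning libraries usually call convolution. The helper `apply_filter` is a made-up name.

```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.standard_normal((14, 14, 32))     # one pooled image, H x W x C
filters = rng.standard_normal((64, 5, 5, 32))  # 64 filters, each 5x5x32

# Zero-pad height and width by 2 so a 5x5 window keeps the 14x14 size.
padded = np.pad(volume, ((2, 2), (2, 2), (0, 0)))


def apply_filter(padded, filt):
    """Slide one 5x5x32 filter over the padded volume; returns a 14x14 map."""
    out = np.empty((14, 14))
    for i in range(14):
        for j in range(14):
            # Multiply the 5x5x32 patch by the filter and sum over every
            # axis, including depth.
            out[i, j] = np.sum(padded[i:i + 5, j:j + 5, :] * filt)
    return out


# One 14x14 feature map per filter, stacked along the last axis.
feature_maps = [apply_filter(padded, f) for f in filters]
output = np.stack(feature_maps, axis=-1)
print(feature_maps[0].shape, output.shape)  # (14, 14) (14, 14, 64)
```

The explicit loops are slow but keep the summation over depth visible; a library convolution computes the same result far more efficiently.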