In CNN network, what is the mathematics behind second convolutional layer output?

I am a little confused about the mathematics of computing the output of a second convolutional layer. The output of the first convolutional layer has shape (11, 11, 64), and the second convolutional layer uses 64 filters of size 3x3 with stride 1 and 'same' padding. When I check the model summary, it shows that the kernel of the second convolutional layer has shape (3, 3, 64, 64), but the output shape of that layer is (11, 11, 64). So I am confused about how to get (11, 11, 64). I searched online, and the explanation is that the convolution produces an 11x11x1 shape because of stacking, and for 64 filters the result is (11, 11, 64). So what is the mathematics behind getting the 11x11x1 shape? I would have expected the shape to be (11, 11, 64, 64). Please help me understand, since I need to code this algorithm for hardware.
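For reference, here is a minimal sketch of the layer I am describing, assuming TensorFlow/Keras (the variable names are just for illustration); it reproduces the shapes the model summary reports:

import tensorflow as tf

# Second convolutional layer as described: 64 filters, 3x3, stride 1, 'same' padding
conv2 = tf.keras.layers.Conv2D(64, (3, 3), strides=1, padding='same')

# Dummy batch of 1 with the shape of the first layer's output
x = tf.zeros((1, 11, 11, 64))
y = conv2(x)                  # calling the layer builds its kernel

print(conv2.kernel.shape)     # (3, 3, 64, 64)
print(y.shape)                # (1, 11, 11, 64)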

You start with 64 images (more precisely, one "image" with 64 channels, but we'll stick with 64 images for simplicity), each of size 11 * 11:

I1 :: 11 * 11
I2 :: 11 * 11
...
I64 :: 11 * 11

Then we have the convolution kernel. Assume first that the kernel shape is 1 * 64 * 3 * 3; then for each input image (again, technically it should be "channel"), there is a corresponding 3 * 3 kernel:

K1 :: 3 * 3
...
K64 :: 3 * 3

Then we calculate the convolution between I1 and K1, I2 and K2, ..., I64 and K64. Now it looks like we have sixty-four 11 * 11 results, but actually we ADD them together into a single one: O1 = K1 * I1 + ... + K64 * I64, where * means convolution. That is where the 1 * 11 * 11 comes from.

Finally, since the actual kernel shape is 64 * 64 * 3 * 3, the output has the shape 64 * 11 * 11 (a small numerical sketch follows the equations below):

O1 = K1_1 * I1 + ... + K64_1 * I64
O2 = K1_2 * I1 + ... + K64_2 * I64
...
O64 = K1_64 * I1 + ... + K64_64 * I64
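
Here is a small numerical sketch of the same idea in numpy/scipy (assuming scipy is available; note that what CNN frameworks call "convolution" is actually cross-correlation, hence correlate2d):

import numpy as np
from scipy.signal import correlate2d

I = np.random.rand(64, 11, 11)    # 64 input channels, each 11 x 11
K = np.random.rand(64, 64, 3, 3)  # 64 output channels x 64 input channels x 3 x 3

# One output channel: correlate each input channel with its own 3x3 slice,
# then ADD the 64 results together -> a single 11 x 11 map
O1 = sum(correlate2d(I[c], K[0, c], mode='same') for c in range(64))
print(O1.shape)                   # (11, 11)

# Repeat for all 64 kernels and stack -> (64, 11, 11)
O = np.stack([sum(correlate2d(I[c], K[o, c], mode='same') for c in range(64))
              for o in range(64)])
print(O.shape)                    # (64, 11, 11)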

I hope this makes things somewhat clearer. Coincidentally, I am doing some coding on hardware as well, and I was learning this last month.

This might help you:


input_layer2.shape == (11, 11, 64)
kernel_layer2.shape == (3, 3, 64, 64)

input_layer2[:3, :3].shape == (3, 3, 64)
kernel_layer2[:,:,:,0].shape == (3, 3, 64)

This computes only output_layer2[0, 0]:

for i in range(64):
    # elementwise multiply the (3, 3, 64) window by the i-th (3, 3, 64) filter, then sum
    output_layer2[0, 0, i] = np.sum(input_layer2[:3, :3] * kernel_layer2[:, :, :, i])

Finally, after sliding the window over all positions:

output_layer2.shape == (11, 11, 64)
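
Since you mention porting this to hardware, here is a sketch of the whole layer written as plain nested loops with explicit zero padding (stride 1, 'same' padding, no bias); it is only meant to make the arithmetic explicit, not to be efficient:

import numpy as np

def conv2d_same(input_layer, kernel):
    # Naive stride-1 'same' convolution: input (H, W, C_in), kernel (kh, kw, C_in, C_out)
    H, W, C_in = input_layer.shape
    kh, kw, _, C_out = kernel.shape
    ph, pw = kh // 2, kw // 2                       # padding needed for 'same' output size
    padded = np.zeros((H + 2 * ph, W + 2 * pw, C_in))
    padded[ph:ph + H, pw:pw + W, :] = input_layer
    output = np.zeros((H, W, C_out))
    for y in range(H):                              # every output row
        for x in range(W):                          # every output column
            window = padded[y:y + kh, x:x + kw, :]  # (kh, kw, C_in) patch
            for o in range(C_out):                  # every output channel
                # multiply the patch by one (kh, kw, C_in) filter and sum everything
                output[y, x, o] = np.sum(window * kernel[:, :, :, o])
    return output

input_layer2 = np.random.rand(11, 11, 64)
kernel_layer2 = np.random.rand(3, 3, 64, 64)
output_layer2 = conv2d_same(input_layer2, kernel_layer2)
print(output_layer2.shape)                          # (11, 11, 64)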
