In CNN network, what is the mathematics behind second convolutional layer output?

I am a little confused about the mathematics of computing the output of a second convolutional layer. The output of the first convolutional layer has shape (11, 11, 64), and the second convolutional layer uses 64 filters of size 3x3 with stride 1 and 'same' padding. When I check the model summary, it shows that the kernel of the second convolutional layer has shape (3, 3, 64, 64), but the output shape of that layer is (11, 11, 64). So I am confused about how to get (11, 11, 64). I searched online, and the explanation is that the convolution produces an 11x11x1 shape because of stacking, and for 64 filters the result is (11, 11, 64). So what is the mathematics behind getting the 11x11x1 shape? I would have expected the shape to be (11, 11, 64, 64). Please help me understand, since I need to code this algorithm for hardware.
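For reference, here is a minimal sketch of the layer I am describing, assuming TensorFlow/Keras (the variable names are just for illustration); it reproduces the shapes the model summary reports:

import tensorflow as tf

# Second convolutional layer as described: 64 filters, 3x3, stride 1, 'same' padding
conv2 = tf.keras.layers.Conv2D(64, (3, 3), strides=1, padding='same')

# Dummy batch of 1 with the shape of the first layer's output
x = tf.zeros((1, 11, 11, 64))
y = conv2(x)                  # calling the layer builds its kernel

print(conv2.kernel.shape)     # (3, 3, 64, 64)
print(y.shape)                # (1, 11, 11, 64)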

You start with 64 images (more precisely, one "image" with 64 channels, but we'll stick with 64 images for simplicity), each of size 11 * 11:

I1 :: 11 * 11
I2 :: 11 * 11
...
I64 :: 11 * 11

Then we have the convolution kernel. Assume first that the kernel shape is 1 * 64 * 3 * 3; then for each input image (again, technically it should be "channel"), there is a corresponding 3 * 3 kernel:

K1 :: 3 * 3
...
K64 :: 3 * 3

Then we calculate the convolution between I1 and K1, I2 and K2, ..., I64 and K64. Now it looks like we have sixty-four 11 * 11 results, but actually we ADD them together into a single one: O1 = K1 * I1 + ... + K64 * I64, where * means convolution. That is where the 1 * 11 * 11 comes from.

Finally, since the actual kernel shape is 64 * 64 * 3 * 3, the output has the shape 64 * 11 * 11 (a small numerical sketch follows the equations below):

O1 = K1_1 * I1 + ... + K64_1 * I64
O2 = K1_2 * I1 + ... + K64_2 * I64
...
O64 = K1_64 * I1 + ... + K64_64 * I64
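
Here is a small numerical sketch of the same idea in numpy/scipy (assuming scipy is available; note that what CNN frameworks call "convolution" is actually cross-correlation, hence correlate2d):

import numpy as np
from scipy.signal import correlate2d

I = np.random.rand(64, 11, 11)    # 64 input channels, each 11 x 11
K = np.random.rand(64, 64, 3, 3)  # 64 output channels x 64 input channels x 3 x 3

# One output channel: correlate each input channel with its own 3x3 slice,
# then ADD the 64 results together -> a single 11 x 11 map
O1 = sum(correlate2d(I[c], K[0, c], mode='same') for c in range(64))
print(O1.shape)                   # (11, 11)

# Repeat for all 64 kernels and stack -> (64, 11, 11)
O = np.stack([sum(correlate2d(I[c], K[o, c], mode='same') for c in range(64))
              for o in range(64)])
print(O.shape)                    # (64, 11, 11)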

I hope this makes things somewhat clearer. Coincidentally, I am doing some coding on hardware as well, and I was learning this last month.

This might help you:


input_layer2.shape == (11, 11, 64)
kernel_layer2.shape == (3, 3, 64, 64)

input_layer2[:3, :3].shape == (3, 3, 64)
kernel_layer2[:,:,:,0].shape == (3, 3, 64)

This computes only output_layer2[0, 0]:

for i in range(64):
    # elementwise multiply the (3, 3, 64) window by the i-th (3, 3, 64) filter, then sum
    output_layer2[0, 0, i] = np.sum(input_layer2[:3, :3] * kernel_layer2[:, :, :, i])

Finally, after sliding the window over all positions:

output_layer2.shape == (11, 11, 64)
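
Since you mention porting this to hardware, here is a sketch of the whole layer written as plain nested loops with explicit zero padding (stride 1, 'same' padding, no bias); it is only meant to make the arithmetic explicit, not to be efficient:

import numpy as np

def conv2d_same(input_layer, kernel):
    # Naive stride-1 'same' convolution: input (H, W, C_in), kernel (kh, kw, C_in, C_out)
    H, W, C_in = input_layer.shape
    kh, kw, _, C_out = kernel.shape
    ph, pw = kh // 2, kw // 2                       # padding needed for 'same' output size
    padded = np.zeros((H + 2 * ph, W + 2 * pw, C_in))
    padded[ph:ph + H, pw:pw + W, :] = input_layer
    output = np.zeros((H, W, C_out))
    for y in range(H):                              # every output row
        for x in range(W):                          # every output column
            window = padded[y:y + kh, x:x + kw, :]  # (kh, kw, C_in) patch
            for o in range(C_out):                  # every output channel
                # multiply the patch by one (kh, kw, C_in) filter and sum everything
                output[y, x, o] = np.sum(window * kernel[:, :, :, o])
    return output

input_layer2 = np.random.rand(11, 11, 64)
kernel_layer2 = np.random.rand(3, 3, 64, 64)
output_layer2 = conv2d_same(input_layer2, kernel_layer2)
print(output_layer2.shape)                          # (11, 11, 64)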
