
Shapes of hidden 3D convolutional network layers, how to compute them?

I want to compute the shape of a given hidden layer's output in a CNN. Suppose the input shape is (27, 27, 27, 1), i.e. one image channel. The first convolutional layer has 16 kernels of size (3, 3, 3), stride 1 and padding 0, so the output shape of this layer is (25, 25, 25, 16); the 16 corresponds to the number of kernels in this layer, so here we have 16 volumes of shape (25, 25, 25).

After that, we have a second convolutional layer with 32 kernels of size (3, 3, 3), stride 1 and padding 0, so every kernel in this layer should (normally) be applied to the whole input volume of this layer, i.e. to the (25, 25, 25, 16). But here the output has shape (23, 23, 23, 32). What I understand, if I base myself on conv2d, is that every kernel is applied to the whole input volume, and the results of all kernels are stacked to form a deep volume at the output of the layer.

So I do not understand how we got this (23, 23, 23, 32), and where the fourth dimension of the first layer's output (i.e. 16) has gone?
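For reference, here is a minimal sketch (not part of the original question) of the standard output-size formula for a convolution along one spatial dimension, out = (in - kernel + 2 * padding) / stride + 1, applied to the two layers described above:

    def conv_output_size(in_size, kernel, stride=1, padding=0):
        # Standard convolution output-size formula for one spatial dimension.
        return (in_size - kernel + 2 * padding) // stride + 1

    # First layer: input (27, 27, 27, 1), 16 kernels of (3, 3, 3), stride 1, padding 0
    first = [conv_output_size(27, 3) for _ in range(3)]
    print(first)   # [25, 25, 25]  -> output shape (25, 25, 25, 16)

    # Second layer: input (25, 25, 25, 16), 32 kernels of (3, 3, 3), stride 1, padding 0
    second = [conv_output_size(25, 3) for _ in range(3)]
    print(second)  # [23, 23, 23]  -> output shape (23, 23, 23, 32)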

Considering only the second layer (ignoring bias):

  • Input shape: (25, 25, 25, 16)
  • Kernel size: (3, 3, 3)
  • Output channels: 32
  • Total number of weights: 16 * 32 * 3 * 3 * 3

Similar to the way that 2D conv layers are really performing multiple 3D convolutions, 3D conv layers are really performing multiple 4D convolutions. Specifically, the 3D conv layer in question is performing 32 convolutions, each with a kernel of size (3, 3, 3, 16). The result of one of these convolution operations on the input feature map is (23, 23, 23, 1). After all 32 of these operations are completed, the results are stacked along the channel dimension, giving a final output shape of (23, 23, 23, 32).
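As a quick sanity check (a sketch, not part of the original answer), the same shapes and weight count can be reproduced with PyTorch's nn.Conv3d. Note that PyTorch uses channels-first layout, so the (27, 27, 27, 1) input becomes a tensor of shape (1, 1, 27, 27, 27):

    import torch
    import torch.nn as nn

    # Channels-first: (batch, channels, depth, height, width)
    x = torch.randn(1, 1, 27, 27, 27)

    conv1 = nn.Conv3d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=0)
    conv2 = nn.Conv3d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=0)

    out1 = conv1(x)
    out2 = conv2(out1)

    print(out1.shape)           # torch.Size([1, 16, 25, 25, 25])
    print(out2.shape)           # torch.Size([1, 32, 23, 23, 23])

    # Each of the 32 kernels in conv2 spans all 16 input channels: (16, 3, 3, 3)
    print(conv2.weight.shape)   # torch.Size([32, 16, 3, 3, 3])
    print(conv2.weight.numel()) # 13824 == 16 * 32 * 3 * 3 * 3

This makes the answer concrete: the 16 channels of the first layer's output are not a separate output dimension of the second layer; they are consumed by each of the 32 kernels, which is why only the 32 appears in the second layer's output shape.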
