Understanding the input and output size of Conv2d

Question

I'm learning image classification with PyTorch (on the CIFAR-10 dataset) by following this link.

I'm trying to understand the input and output parameters of the Conv2d layers in this code:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

My understanding of Conv2d (please correct me if I am wrong or missing anything):

The image has 3 channels, which is why the first parameter is 3.
6 is the number of filters (chosen arbitrarily).
5 is the kernel size, i.e. (5, 5) (chosen arbitrarily).
The next layer is created the same way (the previous layer's output is this layer's input).
Then a fully connected layer is created with nn.Linear: self.fc1 = nn.Linear(16 * 5 * 5, 120)

16 * 5 * 5: here 16 is the number of output channels of the last Conv2d layer, but what is the 5 * 5?

Is it the kernel size, or something else? How do I know whether to multiply by 5 * 5, 4 * 4, or 3 * 3?

I researched this and found that, since the image size is 32 * 32 and max pool(2) is applied twice, the image size should go 32 -> 16 -> 8, so we should multiply by last_output_size * 8 * 8. But in this link it's 5 * 5.

Could anyone please explain?

Answer 1

The 5 * 5 is the size of the image itself (i.e. height x width) after the last pooling layer.

Unpadded convolutions

Unless you pad your image with zeros, a convolutional filter (with the default stride of 1) shrinks the output image by filter_size - 1 along both the height and the width:


[Figure: a 3-filter takes a 5x5 image to a (5-(3-1)) x (5-(3-1)) = 3x3 image]
[Figure: zero padding preserves image dimensions]

You can add padding in PyTorch by setting Conv2d(padding=...).
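The shrinking rule above is a special case of the output-size formula in the PyTorch Conv2d documentation. Here is a minimal sketch of that formula for one spatial dimension, assuming dilation 1; the helper name conv_output_size is made up for illustration and is not part of PyTorch:

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Unpadded 5x5 kernel on a 32-pixel side: shrinks by kernel_size - 1 = 4
unpadded = conv_output_size(32, kernel_size=5)

# padding=2 adds 2 pixels on each side, so a 5x5 kernel preserves the size
padded = conv_output_size(32, kernel_size=5, padding=2)

# The figure's case: a 3-filter on a 5x5 image
small = conv_output_size(5, kernel_size=3)

print(unpadded, padded, small)  # 28 32 3
```

For an odd kernel size and stride 1, padding = (kernel_size - 1) // 2 is what keeps the height and width unchanged.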

Chain of transformations

The input image passes through the following layers:

Layer	Shape transformation
one conv layer (5x5 kernel, without padding)	`(h, w) -> (h-4, w-4)`
a MaxPool	`-> ((h-4)//2, (w-4)//2)`
another conv layer (5x5 kernel, without padding)	`-> ((h-4)//2 - 4, (w-4)//2 - 4)`
another MaxPool	`-> (((h-4)//2 - 4)//2, ((w-4)//2 - 4)//2)`
a Flatten	`-> (((h-4)//2 - 4)//2 * ((w-4)//2 - 4)//2)` values per channel

For this network, the original image size of (32, 32) goes to (28, 28), then (14, 14), then (10, 10), then (5, 5), which flattens to 5 * 5 = 25 values per channel. With the 16 output channels of conv2, that gives the 16 * 5 * 5 = 400 input features of fc1.
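The same chain can be checked with plain arithmetic. This is only a sketch: the helper out_size and the layers list are hypothetical names that mirror the kernel sizes and strides of the question's Net, with no padding and dilation 1 assumed:

```python
def out_size(size, kernel_size, stride):
    """Output length along one dimension for an unpadded conv or pool layer."""
    return (size - kernel_size) // stride + 1


# (name, kernel_size, stride) for each shape-changing layer in Net.forward
layers = [("conv1", 5, 1), ("pool", 2, 2), ("conv2", 5, 1), ("pool", 2, 2)]

size = 32  # CIFAR-10 images are 32 x 32
trace = [size]
for name, kernel_size, stride in layers:
    size = out_size(size, kernel_size, stride)
    trace.append(size)

channels = 16  # output channels of conv2
flat_features = channels * size * size

print(trace)          # [32, 28, 14, 10, 5]
print(flat_features)  # 400
```

The 32 -> 16 -> 8 sequence from the question would only hold if the convolutions preserved the image size, for example with padding=2 on both 5x5 convolutions; fc1 would then need 16 * 8 * 8 input features.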

To visualise this, you can use the torchsummary package:

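from torchsummary import summary

# Net is the model class defined in the question.
# CIFAR-10 images have shape (channels, height, width) = (3, 32, 32).
input_shape = (3, 32, 32)
summary(Net(), input_shape)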

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [-1, 6, 28, 28]             456
         MaxPool2d-2            [-1, 6, 14, 14]               0
            Conv2d-3           [-1, 16, 10, 10]           2,416
         MaxPool2d-4             [-1, 16, 5, 5]               0
            Linear-5                  [-1, 120]          48,120
            Linear-6                   [-1, 84]          10,164
            Linear-7                   [-1, 10]             850
================================================================
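If you would rather not install an extra package, a rough alternative is to push a dummy tensor through the convolution and pooling layers and read off the shapes. The sketch below rebuilds the layers from the question on their own; the ReLU calls are left out because they do not change the shape, and the variable names are illustrative:

```python
import torch
import torch.nn as nn

# Same layer definitions as in the question's Net
conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

# One dummy CIFAR-10-sized image: (batch, channels, height, width)
x = torch.zeros(1, 3, 32, 32)

shapes = [tuple(x.shape)]
for layer in (conv1, pool, conv2, pool):
    x = layer(x)
    shapes.append(tuple(x.shape))

# Flatten everything except the batch dimension to get fc1's input size
flat_features = x.flatten(1).shape[1]

for shape in shapes:
    print(shape)
print(flat_features)  # 400
```

The last shape printed is (1, 16, 5, 5), which is where the 16 * 5 * 5 in nn.Linear(16 * 5 * 5, 120) comes from.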

Answer 1
2 · accepted · 2021-03-29 07:05:12
