Understanding input and output size for Conv2d
I'm learning image classification with PyTorch (on the CIFAR-10 dataset) by following this link.
I'm trying to understand the input and output parameters of the `Conv2d` layers in the given code:
```python
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
```
My understanding of `conv2d()` (please correct me if I am wrong or missing anything):
- 3 is the number of input channels (the CIFAR-10 images are RGB).
- 6 is the number of filters (chosen arbitrarily).
- 5 is the kernel size, i.e. (5, 5) (chosen arbitrarily).
- A fully connected layer is then created with the linear function: `self.fc1 = nn.Linear(16 * 5 * 5, 120)`.

`16 * 5 * 5`: here 16 is the output of the last `conv2d` layer, but what is the `5 * 5`?
Is it the kernel size, or something else? How do we know whether to multiply by 5*5, 4*4, 3*3, ...?
I researched this and learned that since the image size is 32*32 and max pool(2) is applied twice, the image size goes 32 -> 16 -> 8, so we should multiply by last_output_size * 8 * 8. But in this link it's 5*5.
Could anyone please explain?
The 5 * 5 is not the kernel size: it is the spatial size of the feature map itself (i.e. height x width) after the last pooling layer.
Unless you pad your image with zeros, a convolutional filter (with the default stride of 1) shrinks the output image by filter_size - 1 along both the height and the width:
Figure (two panels): a 3x3 filter takes a 5x5 image to a (5-(3-1)) x (5-(3-1)) = 3x3 image; zero padding preserves the image dimensions.
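The rule above is a special case of the output-size formula given in the PyTorch `nn.Conv2d` documentation, out = floor((in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride) + 1, which reduces to in - (kernel_size - 1) when stride and dilation are 1 and padding is 0. A minimal plain-Python sketch (the helper `conv_output_size` is a hypothetical name, not a PyTorch function) reproducing the 5x5 example with and without padding, plus the question's first layer:

```python
def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension (height or width).

    Mirrors the formula in the PyTorch Conv2d docs:
    floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
    """
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# No padding: a 3x3 filter removes 3 - 1 = 2 pixels from each dimension.
print(conv_output_size(5, kernel_size=3))             # 3
# One pixel of zero padding on each side keeps the 5x5 size.
print(conv_output_size(5, kernel_size=3, padding=1))  # 5
# The question's conv1: a 5x5 kernel on a 32x32 CIFAR-10 image.
print(conv_output_size(32, kernel_size=5))            # 28
```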
You can add padding in PyTorch by setting `Conv2d(padding=...)`.
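For stride 1 and an odd kernel size k, `padding=(k - 1) // 2` pads that many pixels on each side, which adds back exactly the k - 1 pixels the convolution would otherwise remove, so height and width are preserved. A minimal sketch, assuming PyTorch is installed (the string option `padding="same"` needs PyTorch 1.9 or later and only supports stride 1), comparing three versions of the question's 5x5 `conv1` on one dummy 32x32 image:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # one dummy CIFAR-10-sized image (N, C, H, W)

no_pad = nn.Conv2d(3, 6, 5)                # no padding: 32 -> 28
pad2 = nn.Conv2d(3, 6, 5, padding=2)       # (5 - 1) // 2 = 2 keeps 32
same = nn.Conv2d(3, 6, 5, padding="same")  # PyTorch computes the padding

print(no_pad(x).shape)  # torch.Size([1, 6, 28, 28])
print(pad2(x).shape)    # torch.Size([1, 6, 32, 32])
print(same(x).shape)    # torch.Size([1, 6, 32, 32])
```

With `padding=2` on both convolutions, only the two pooling layers would shrink the image (32 -> 16 -> 8), which is the 8 * 8 the question expected.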
The input (height h, width w) has gone through:

| Layer | Shape transformation |
|---|---|
| one conv layer (5x5, without padding) | (h, w) -> (h-4, w-4) |
| a MaxPool (2x2, stride 2) | -> ((h-4)//2, (w-4)//2) |
| another conv layer (5x5, without padding) | -> ((h-12)//2, (w-12)//2) |
| another MaxPool (2x2, stride 2) | -> ((h-12)//4, (w-12)//4) |
| a Flatten over the 16 channels | -> 16 * ((h-12)//4) * ((w-12)//4) |
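Applying the layers one at a time gives a closed form for the final height: (h - 4) // 2 - 4 equals (h - 12) // 2, and one more floor division by 2 gives (h - 12) // 4, because (x // 2) // 2 == x // 4 for integers. A plain-Python sketch with hypothetical helper names, assuming the question's layers (two unpadded 5x5 convolutions, each followed by a 2x2 max pool with stride 2, which floors odd sizes) and 16 output channels from `conv2`:

```python
def step_by_step(h):
    h = h - 4    # conv1: 5x5 kernel, no padding
    h = h // 2   # pool: 2x2, stride 2 (floors odd sizes)
    h = h - 4    # conv2: 5x5 kernel, no padding
    h = h // 2   # pool again
    return h


def closed_form(h):
    # (h - 4) // 2 - 4 == (h - 12) // 2, and (x // 2) // 2 == x // 4
    return (h - 12) // 4


# Both agree for every input size large enough to leave at least one pixel.
assert all(step_by_step(h) == closed_form(h) for h in range(16, 200))

h = w = 32  # CIFAR-10 images
print(closed_form(h), closed_form(w))        # 5 5
print(16 * closed_form(h) * closed_form(w))  # 400 = 16 * 5 * 5
```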
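We go from the original image size of (32, 32) to (28, 28) to (14, 14) to (10, 10) to (5, 5), which the flatten step (`x.view(-1, 16 * 5 * 5)`) turns into 16 * 5 * 5 = 400 values per image, the input size of `fc1`. The 32 -> 16 -> 8 estimate in the question leaves out the two 5x5 convolutions, each of which removes 4 pixels from the height and width before the next pooling step.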
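Rather than working out the 5 * 5 by hand, a common PyTorch idiom is to push one dummy tensor through the convolutional part and read off the flattened size. A minimal sketch, assuming PyTorch is installed and 3x32x32 inputs by default; `NetAuto` and its `features` method are hypothetical names, and the layers mirror the question's `Net`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NetAuto(nn.Module):
    def __init__(self, input_shape=(3, 32, 32)):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Run a dummy batch of one image through the conv/pool stack once
        # to count how many values reach the first Linear layer.
        with torch.no_grad():
            n_flat = self.features(torch.zeros(1, *input_shape)).numel()
        self.fc1 = nn.Linear(n_flat, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def features(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        return self.pool(F.relu(self.conv2(x)))

    def forward(self, x):
        x = torch.flatten(self.features(x), 1)  # keep the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


net = NetAuto()
print(net.fc1.in_features)                   # 400 for 32x32 inputs
print(net(torch.randn(4, 3, 32, 32)).shape)  # torch.Size([4, 10])
print(NetAuto((3, 64, 64)).fc1.in_features)  # 16 * 13 * 13 = 2704
```

`torch.flatten(x, 1)` is used instead of `view(-1, ...)` so that a wrong input size raises an error in `fc1` rather than silently changing the batch size. PyTorch's `nn.LazyLinear`, which infers `in_features` on the first forward call, is another option.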
To visualize this, you can use the `torchsummary` package:
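```python
from torchsummary import summary

# Net is the class defined in the question.
input_shape = (3, 32, 32)  # (channels, height, width), without the batch dimension
summary(Net(), input_shape)
```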
```
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [-1, 6, 28, 28]             456
         MaxPool2d-2            [-1, 6, 14, 14]               0
            Conv2d-3           [-1, 16, 10, 10]           2,416
         MaxPool2d-4             [-1, 16, 5, 5]               0
            Linear-5                  [-1, 120]          48,120
            Linear-6                   [-1, 84]          10,164
            Linear-7                   [-1, 10]             850
================================================================
```
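The Param # column in the summary output follows from the layer shapes: a `Conv2d` has out_channels * in_channels * kernel_h * kernel_w weights plus one bias per output channel, and a `Linear` has in_features * out_features weights plus out_features biases. A plain-Python sketch (helper names are hypothetical) reproducing the column; the 48,120 for `Linear-5` confirms that `fc1` receives 16 * 5 * 5 = 400 inputs:

```python
def conv2d_params(in_ch, out_ch, k):
    return out_ch * in_ch * k * k + out_ch  # weights + biases


def linear_params(in_f, out_f):
    return in_f * out_f + out_f  # weights + biases


counts = {
    "Conv2d-1": conv2d_params(3, 6, 5),
    "Conv2d-3": conv2d_params(6, 16, 5),
    "Linear-5": linear_params(16 * 5 * 5, 120),
    "Linear-6": linear_params(120, 84),
    "Linear-7": linear_params(84, 10),
}
for name, n in counts.items():
    print(f"{name}: {n:,}")
# Conv2d-1: 456
# Conv2d-3: 2,416
# Linear-5: 48,120
# Linear-6: 10,164
# Linear-7: 850
```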
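Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.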