I'm learning image classification with PyTorch on the CIFAR-10 dataset, following this link.
I'm trying to understand the input and output parameters of the Conv2d layers in this code:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
My understanding of conv2d (please correct me if I am wrong or missing anything):

- 3 is the number of input channels (RGB)
- 6 is the number of filters (randomly chosen)
- 5 is the kernel size, i.e. (5, 5) (randomly chosen)

For the linear layer self.fc1 = nn.Linear(16 * 5 * 5, 120): here 16 is the number of output channels of the last conv2d layer, but what is the 5 * 5? Is it the kernel size, or something else? How do we know whether to multiply by 5*5 or 4*4 or 3*3...?

I researched and found that since the image size is 32*32 and max pool(2) is applied twice, the image size should go 32 -> 16 -> 8, so we should multiply by last_output_size * 8 * 8. But in this link it's 5*5.

Could anyone please explain?
Could anyone please explain?
These are the dimensions of the image itself (i.e. height x width).

Unless you pad your image with zeros, a convolutional filter will shrink the output image by filter_size - 1 across both the height and the width. You can add padding in PyTorch by setting Conv2d(padding=...).
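As a sketch of that rule, here is the standard output-size formula for one spatial dimension (assuming stride 1 and dilation 1, which are the defaults for Conv2d):

```python
# Output size of a convolution along one dimension (standard formula,
# assuming dilation=1): out = (n + 2*padding - kernel) // stride + 1
def conv_out(n, kernel, stride=1, padding=0):
    return (n + 2 * padding - kernel) // stride + 1

# Without padding, a 5x5 filter shrinks 32 -> 28 (by kernel - 1 = 4):
print(conv_out(32, kernel=5))             # 28
# With padding=2, the spatial size is preserved:
print(conv_out(32, kernel=5, padding=2))  # 32
```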
Since it has gone through:

Layer | Shape transformation
---|---
one conv layer (5x5 kernel, no padding) | (h, w) -> (h-4, w-4)
a MaxPool(2, 2) | -> ((h-4)//2, (w-4)//2)
another conv layer (5x5 kernel, no padding) | -> ((h-4)//2 - 4, (w-4)//2 - 4)
another MaxPool(2, 2) | -> (((h-4)//2 - 4)//2, ((w-4)//2 - 4)//2)
a Flatten | -> one vector of length channels * height * width
We go from the original image size of (32, 32) to (28, 28) to (14, 14) to (10, 10) to (5, 5); flattening the 16 channels of 5x5 feature maps then gives the 16 * 5 * 5 = 400 input features of fc1.
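That chain can be traced with a few lines of plain arithmetic, a sketch assuming stride-1 convolutions without padding and 2x2 pooling with stride 2, as in the network above:

```python
# Trace the spatial size through the network to show where 16 * 5 * 5 comes from.
def conv_out(n, kernel):
    """Stride-1 convolution without padding shrinks n by kernel - 1."""
    return n - (kernel - 1)

def pool_out(n, kernel=2, stride=2):
    """Max pooling output size: (n - kernel) // stride + 1."""
    return (n - kernel) // stride + 1

size = 32                  # CIFAR-10 images are 32x32
size = conv_out(size, 5)   # conv1: 32 -> 28
size = pool_out(size)      # pool:  28 -> 14
size = conv_out(size, 5)   # conv2: 14 -> 10
size = pool_out(size)      # pool:  10 -> 5
print(size)                # 5
print(16 * size * size)    # 16 channels * 5 * 5 = 400 features into fc1
```

With a different input resolution or kernel size, the same arithmetic tells you what to put in place of 16 * 5 * 5.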
To visualise this you can use the torchsummary package:
from torchsummary import summary
input_shape = (3,32,32)
summary(Net(), input_shape)
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 6, 28, 28] 456
MaxPool2d-2 [-1, 6, 14, 14] 0
Conv2d-3 [-1, 16, 10, 10] 2,416
MaxPool2d-4 [-1, 16, 5, 5] 0
Linear-5 [-1, 120] 48,120
Linear-6 [-1, 84] 10,164
Linear-7 [-1, 10] 850
================================================================
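If torchsummary is not available, the same shapes can be inspected directly by passing a dummy batch through the layers. This is a sketch that rebuilds just the conv/pool stack of the network above (the ReLUs are omitted since they do not change shapes):

```python
import torch
import torch.nn as nn

# Rebuild the feature extractor from the question's Net and feed it one
# fake 32x32 RGB image to watch the shape at every step.
features = nn.Sequential(
    nn.Conv2d(3, 6, 5),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(6, 16, 5),
    nn.MaxPool2d(2, 2),
)

x = torch.randn(1, 3, 32, 32)
for layer in features:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
# The final tensor is (1, 16, 5, 5), which x.view(-1, 16 * 5 * 5)
# flattens into 400 features for fc1.
```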