
Calculating dimensions of fully connected layer?

I am struggling to work out how to calculate the dimensions for the fully connected layer. I am inputting images of size 448x448 with a batch size of 16. Below is the code for my convolutional layers:

import torch.nn as nn


class ConvolutionalNet(nn.Module):
    def __init__(self, num_classes=182):
        super().__init__()

        # Each block keeps the spatial size (padding=2 with a 5x5 kernel)
        # and then halves it with 2x2 max pooling.
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        self.layer3 = nn.Sequential(
            nn.Conv2d(32, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        self.layer4 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        self.layer5 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

I want to add a fully connected layer:

self.fc = nn.Linear(?, num_classes)

Would anyone be able to explain the best way to go about calculating this? Also, if I have multiple fully connected layers (e.g. self.fc2, self.fc3), would the second parameter always equal the number of classes? I am new to coding and finding it hard to wrap my head around this.

The conv layers don't change the width/height of the features, since you've set padding equal to (kernel_size - 1) / 2. Max pooling with kernel_size = stride = 2 halves the width/height (rounded down if the input size is odd).

Using 448 as the input width/height, the output width/height will be 448 // 2 // 2 // 2 // 2 // 2 = 448 / 32 = 14 (where // is the floor-division operator).
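
As a quick sanity check, the same arithmetic in plain Python (just mirroring the floor division described above):

# Five MaxPool2d(kernel_size=2, stride=2) layers each floor-divide the spatial
# size by 2; the convolutions keep it unchanged (padding = 2 for a 5x5 kernel).
size = 448
for _ in range(5):
    size //= 2
print(size)  # 14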

The number of channels is fully determined by the last conv layer, which outputs 64 channels.

Therefore you will have a tensor of shape [B, 64, 14, 14], so the Linear layer should have in_features = 64 * 14 * 14 = 12544.
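
If you'd rather not do the arithmetic by hand, a common alternative (a sketch, assuming the ConvolutionalNet definition above) is to push a dummy input through the conv blocks and read off the flattened size:

import torch

# Instantiate the model above and pass a single dummy 448x448 RGB image
# through the five conv blocks to see the output shape empirically.
model = ConvolutionalNet()
x = torch.zeros(1, 3, 448, 448)
for block in (model.layer1, model.layer2, model.layer3, model.layer4, model.layer5):
    x = block(x)
print(x.shape)                # torch.Size([1, 64, 14, 14])
print(x.flatten(1).shape[1])  # 12544 -> use this as in_features for nn.Linear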

Note you'll need to flatten the input beforehand, something like:

self.layer6 = nn.Sequential(
    nn.Flatten(),
    nn.Linear(12544, num_classes)
)
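
On the follow-up about multiple fully connected layers: only the last layer's out_features has to equal num_classes; the sizes in between are up to you. A hedged sketch (the 512 hidden width is an arbitrary example, not from the original post):

self.layer6 = nn.Sequential(
    nn.Flatten(),                  # [B, 64, 14, 14] -> [B, 12544]
    nn.Linear(12544, 512),         # hidden width 512 is an arbitrary choice
    nn.ReLU(),
    nn.Linear(512, num_classes)    # only the final layer must output num_classes
)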
