How to modify this PyTorch convolutional neural network to accept 64 x 64 images and correctly output predictions?

Question

I took this convolutional neural network (CNN) from here. It accepts 32 x 32 images and defaults to 10 classes. However, I have 64 x 64 images with 500 classes. When I pass in 64 x 64 images (batch size held constant at 32), I get the following error:

ValueError: Expected input batch_size (128) to match target batch_size (32).

The stack trace starts at the line loss = loss_fn(outputs, labels). outputs.shape is [128, 500] and labels.shape is [32].

The code is listed here for completeness.

import torch.nn as nn

class Unit(nn.Module):
    def __init__(self,in_channels,out_channels):
        super(Unit,self).__init__()
        self.conv = nn.Conv2d(in_channels=in_channels,kernel_size=3,out_channels=out_channels,stride=1,padding=1)
        self.bn = nn.BatchNorm2d(num_features=out_channels)
        self.relu = nn.ReLU()

    def forward(self,input):
        output = self.conv(input)
        output = self.bn(output)
        output = self.relu(output)
        return output

class SimpleNet(nn.Module):
    def __init__(self,num_classes=10):
        super(SimpleNet,self).__init__()

        self.unit1 = Unit(in_channels=3,out_channels=32)
        self.unit2 = Unit(in_channels=32, out_channels=32)
        self.unit3 = Unit(in_channels=32, out_channels=32)

        self.pool1 = nn.MaxPool2d(kernel_size=2)

        self.unit4 = Unit(in_channels=32, out_channels=64)
        self.unit5 = Unit(in_channels=64, out_channels=64)
        self.unit6 = Unit(in_channels=64, out_channels=64)
        self.unit7 = Unit(in_channels=64, out_channels=64)

        self.pool2 = nn.MaxPool2d(kernel_size=2)

        self.unit8 = Unit(in_channels=64, out_channels=128)
        self.unit9 = Unit(in_channels=128, out_channels=128)
        self.unit10 = Unit(in_channels=128, out_channels=128)
        self.unit11 = Unit(in_channels=128, out_channels=128)

        self.pool3 = nn.MaxPool2d(kernel_size=2)

        self.unit12 = Unit(in_channels=128, out_channels=128)
        self.unit13 = Unit(in_channels=128, out_channels=128)
        self.unit14 = Unit(in_channels=128, out_channels=128)

        self.avgpool = nn.AvgPool2d(kernel_size=4)

        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1,
                                 self.unit4, self.unit5, self.unit6, self.unit7, self.pool2,
                                 self.unit8, self.unit9, self.unit10, self.unit11, self.pool3,
                                 self.unit12, self.unit13, self.unit14, self.avgpool)

        self.fc = nn.Linear(in_features=128,out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = output.view(-1,128)
        output = self.fc(output)
        return output

Any ideas on how to modify this CNN to accept 64 x 64 images and properly return outputs?

Answer 1

The problem is an incompatible reshape (view) at the end.

You're using a sort of "flattening" at the end, which is different from "global pooling". Both are valid for CNNs, but only global pooling is compatible with any image size.

The flattened net (your case)

In your case, with a flatten, you need to keep track of all image dimensions in order to know how to reshape at the end.

So:

Enter with 64x64
Pool1 to 32x32
Pool2 to 16x16
Pool3 to 8x8
AvgPool to 2x2
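The size trace above can be checked with a dummy tensor. This is a minimal sketch, assuming (as in the question's Unit) that every 3x3 convolution uses stride=1 and padding=1, so only the pooling layers change the spatial size; the helper name trace_spatial_sizes is hypothetical.

```python
import torch
import torch.nn as nn

def trace_spatial_sizes(image_size):
    """Return (stage name, (height, width)) after each pooling stage of SimpleNet.

    Convolutions are skipped: 3x3 kernels with padding=1 and stride=1 keep
    the spatial size unchanged (assumption taken from the question's Unit).
    The channel count does not affect pooling, so a single channel is used.
    """
    stages = [
        ("pool1", nn.MaxPool2d(kernel_size=2)),
        ("pool2", nn.MaxPool2d(kernel_size=2)),
        ("pool3", nn.MaxPool2d(kernel_size=2)),
        ("avgpool", nn.AvgPool2d(kernel_size=4)),
    ]
    x = torch.zeros(1, 1, image_size, image_size)
    sizes = [("input", tuple(x.shape[2:]))]
    for name, layer in stages:
        x = layer(x)
        sizes.append((name, tuple(x.shape[2:])))
    return sizes

for name, hw in trace_spatial_sizes(64):
    print(name, hw)
```

Calling trace_spatial_sizes(32) instead ends at (1, 1), which is the case the original view(-1,128) was written for.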

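Then, at the end you've got a shape of (batch, 128, 2, 2): four times as many values per image as with a 32x32 input, which ends at (batch, 128, 1, 1). That is why view(-1,128) returns 128 rows instead of 32: the 32*128*2*2 = 16384 values are regrouped into rows of 128, giving 16384 / 128 = 128 rows.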
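The batch-size mismatch can be reproduced with a random tensor of the shape stated above. This sketch only checks shapes; the random tensor is a stand-in for the output of self.net on a batch of 32 images of 64x64.

```python
import torch

# Stand-in for self.net(input): batch of 32, 128 channels, 2x2 spatial
features = torch.randn(32, 128, 2, 2)

# The question's reshape: -1 lets PyTorch infer the row count from rows of 128
wrong = features.view(-1, 128)
print(wrong.shape)  # torch.Size([128, 128])

# Keeping the batch dimension explicit preserves one row per image
right = features.view(features.size(0), -1)
print(right.shape)  # torch.Size([32, 512])
```

Using features.size(0) as the first dimension is a common defensive choice: if the feature count is wrong, the error then appears at self.fc (a matrix shape mismatch) instead of as a batch-size mismatch in the loss.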

Then, your final reshape should be output = output.view(-1,128*2*2).

This is a different net with a different classification layer, though, because self.fc must now have in_features=512 (128*2*2).
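A sketch of the matching classifier head for the flattened net, assuming num_classes=500 as in the question; a random tensor and random labels stand in for self.net(input) and the real targets. In SimpleNet this means changing both the view in forward and self.fc.

```python
import torch
import torch.nn as nn

num_classes = 500  # from the question

# Classifier for the flattened net: 128 channels * 2 * 2 spatial = 512 features
fc = nn.Linear(in_features=128 * 2 * 2, out_features=num_classes)

# Stand-in for self.net(input) on a batch of 32 images of 64x64
features = torch.randn(32, 128, 2, 2)
logits = fc(features.view(-1, 128 * 2 * 2))  # (32, 500): one row per image

# Random labels stand in for the real targets; batch sizes now match
labels = torch.randint(0, num_classes, (32,))
loss = nn.CrossEntropyLoss()(logits, labels)
print(logits.shape, loss.item())
```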

The global pooling net

On the other hand, you could use the same model, same layers and same weights for any image size >= 32 if you replace the last pooling with a global pooling:

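def flatChannels(x):
    # (N, C, H, W) -> (N, C, H*W)
    size = x.size()
    return x.view(size[0], size[1], size[2] * size[3])

def globalAvgPool2D(x):
    # average over all spatial positions -> (N, C)
    return flatChannels(x).mean(dim=-1)

def globalMaxPool2D(x):
    # Tensor.max(dim=...) returns (values, indices); keep only the values -> (N, C)
    return flatChannels(x).max(dim=-1).values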
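As a cross-check, the same global pooling can be expressed with built-in PyTorch operations. This is a minimal sketch: global_avg_pool and global_max_pool are hypothetical names, and nn.AdaptiveAvgPool2d(1) / nn.AdaptiveMaxPool2d(1) are used only for comparison. The feature sizes 4 and 8 correspond to 32x32 and 64x64 inputs after the three 2x2 max pools.

```python
import torch
import torch.nn as nn

def global_avg_pool(x):
    # (N, C, H, W) -> (N, C): average over all spatial positions
    return x.flatten(start_dim=2).mean(dim=-1)

def global_max_pool(x):
    # Tensor.max(dim=...) returns (values, indices); keep only the values
    return x.flatten(start_dim=2).max(dim=-1).values

for size in (4, 8):  # final feature-map sizes for 32x32 and 64x64 inputs
    x = torch.randn(2, 128, size, size)
    avg = global_avg_pool(x)
    mx = global_max_pool(x)
    # Built-in equivalents: adaptive pooling to 1x1, then drop the 1x1 dims
    assert torch.allclose(avg, nn.AdaptiveAvgPool2d(1)(x).flatten(1), atol=1e-6)
    assert torch.allclose(mx, nn.AdaptiveMaxPool2d(1)(x).flatten(1))
    print(size, avg.shape, mx.shape)  # both (2, 128) regardless of size
```

Because the output is always (N, C), the following nn.Linear(128, num_classes) sees the same input width for every image size.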

The ending of the model:

class SimpleNet(nn.Module):
    def __init__(self, num_classes=10):
        # super().__init__(), unit1 ... unit14 and pool1 ... pool3 defined as before

        # The final AvgPool2d is removed from here; global pooling is applied in forward()
        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4,
                                 self.unit5, self.unit6, self.unit7, self.pool2, self.unit8,
                                 self.unit9, self.unit10, self.unit11, self.pool3,
                                 self.unit12, self.unit13, self.unit14)

        self.fc = nn.Linear(in_features=128, out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = globalAvgPool2D(output)  # or globalMaxPool2D
        output = self.fc(output)
        return output
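For a self-contained check that one set of layers handles both input sizes, here is a compact sketch of the same architecture with global average pooling. The names unit and GlobalPoolNet are hypothetical; the channel counts follow the question's SimpleNet, and x.mean(dim=(2, 3)) plays the role of globalAvgPool2D.

```python
import torch
import torch.nn as nn

def unit(in_channels, out_channels):
    # Same layers as the question's Unit: size-preserving 3x3 conv, BN, ReLU
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(),
    )

class GlobalPoolNet(nn.Module):
    """Hypothetical compact rewrite of SimpleNet ending in global average pooling."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            unit(3, 32), unit(32, 32), unit(32, 32),
            nn.MaxPool2d(kernel_size=2),
            unit(32, 64), unit(64, 64), unit(64, 64), unit(64, 64),
            nn.MaxPool2d(kernel_size=2),
            unit(64, 128), unit(128, 128), unit(128, 128), unit(128, 128),
            nn.MaxPool2d(kernel_size=2),
            unit(128, 128), unit(128, 128), unit(128, 128),
        )
        self.fc = nn.Linear(in_features=128, out_features=num_classes)

    def forward(self, x):
        x = self.net(x)         # (N, 128, H/8, W/8)
        x = x.mean(dim=(2, 3))  # global average pooling -> (N, 128)
        return self.fc(x)

model = GlobalPoolNet(num_classes=500).eval()
with torch.no_grad():
    for size in (32, 64):
        out = model(torch.randn(2, 3, size, size))
        print(size, out.shape)  # out.shape is torch.Size([2, 500]) for both sizes
```

Only the classifier's out_features depends on the task (500 here); nothing in the model depends on the input resolution.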

Answer 2

You need to use the torchvision transforms module to bring your images to 32 x 32 before training the neural network (here is the link https://pytorch.org/docs/stable/torchvision/transforms.html).

You have a few options:

transforms.Resize(32),
transforms.RandomResizedCrop(32) - most preferable, because it augments your data and, to some extent, helps prevent overfitting,
transforms.CenterCrop(32), etc.

Moreover, you can chain several transforms into one object via transforms.Compose.

Enjoy.

PS. Of course, you can also refactor your neural network architecture so that it accepts images of size 64 x 64.

Answer 1: score 1, accepted, 2018-12-21 11:37:37
Answer 2: score -1, 2018-12-20 21:55:52