
How do I modify this PyTorch convolutional neural network to accept a 64 x 64 image and properly output predictions?

I took this convolutional neural network (CNN) from here. It accepts 32 x 32 images and defaults to 10 classes. However, I have 64 x 64 images with 500 classes. When I pass in 64 x 64 images (batch size held constant at 32), I get the following error:

ValueError: Expected input batch_size (128) to match target batch_size (32).

The stack trace starts at the line loss = loss_fn(outputs, labels). The outputs.shape is [128, 500] and the labels.shape is [32].

The code is listed here for completeness.

import torch
import torch.nn as nn

class Unit(nn.Module):
    def __init__(self,in_channels,out_channels):
        super(Unit,self).__init__()
        self.conv = nn.Conv2d(in_channels=in_channels,kernel_size=3,out_channels=out_channels,stride=1,padding=1)
        self.bn = nn.BatchNorm2d(num_features=out_channels)
        self.relu = nn.ReLU()

    def forward(self,input):
        output = self.conv(input)
        output = self.bn(output)
        output = self.relu(output)
        return output

class SimpleNet(nn.Module):
    def __init__(self,num_classes=10):
        super(SimpleNet,self).__init__()

        self.unit1 = Unit(in_channels=3,out_channels=32)
        self.unit2 = Unit(in_channels=32, out_channels=32)
        self.unit3 = Unit(in_channels=32, out_channels=32)

        self.pool1 = nn.MaxPool2d(kernel_size=2)

        self.unit4 = Unit(in_channels=32, out_channels=64)
        self.unit5 = Unit(in_channels=64, out_channels=64)
        self.unit6 = Unit(in_channels=64, out_channels=64)
        self.unit7 = Unit(in_channels=64, out_channels=64)

        self.pool2 = nn.MaxPool2d(kernel_size=2)

        self.unit8 = Unit(in_channels=64, out_channels=128)
        self.unit9 = Unit(in_channels=128, out_channels=128)
        self.unit10 = Unit(in_channels=128, out_channels=128)
        self.unit11 = Unit(in_channels=128, out_channels=128)

        self.pool3 = nn.MaxPool2d(kernel_size=2)

        self.unit12 = Unit(in_channels=128, out_channels=128)
        self.unit13 = Unit(in_channels=128, out_channels=128)
        self.unit14 = Unit(in_channels=128, out_channels=128)

        self.avgpool = nn.AvgPool2d(kernel_size=4)

        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4, self.unit5, self.unit6
                                 ,self.unit7, self.pool2, self.unit8, self.unit9, self.unit10, self.unit11, self.pool3,
                                 self.unit12, self.unit13, self.unit14, self.avgpool)

        self.fc = nn.Linear(in_features=128,out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = output.view(-1,128)
        output = self.fc(output)
        return output

Any ideas on how to modify this CNN to accept 64 x 64 images and properly return outputs?

The problem is an incompatible reshape (view) at the end.

You're using a sort of "flattening" at the end, which is different from "global pooling". Both are valid for CNNs, but only global pooling is compatible with any image size.

The flattened net (your case)

In your case, with flattening, you need to keep track of all image dimensions in order to know how to reshape at the end. The 3x3 convolutions use padding=1, so they keep the spatial size; only the pooling layers shrink it.

So:

  • Enter with 64x64
  • Pool1 to 32x32
  • Pool2 to 16x16
  • Pool3 to 8x8
  • AvgPool to 2x2
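To check the list above, here is a minimal sketch that pushes a dummy image through only the size-changing parts of the network. It assumes, as in the original Unit, that every 3x3 convolution uses stride=1 and padding=1, so a single conv stands in for each group of Unit layers; the channel count 8 and the batch size of 1 are arbitrary choices for the illustration.

```python
import torch
import torch.nn as nn

def trace_spatial_sizes(side):
    """Return (stage name, (H, W)) after each size-relevant stage of the net."""
    stages = [
        ("conv 3x3, padding=1", nn.Conv2d(3, 8, kernel_size=3, padding=1)),
        ("pool1", nn.MaxPool2d(kernel_size=2)),
        ("pool2", nn.MaxPool2d(kernel_size=2)),
        ("pool3", nn.MaxPool2d(kernel_size=2)),
        ("avgpool", nn.AvgPool2d(kernel_size=4)),  # stride defaults to kernel_size
    ]
    x = torch.zeros(1, 3, side, side)
    sizes = [("input", tuple(x.shape[-2:]))]
    with torch.no_grad():
        for name, layer in stages:
            x = layer(x)
            sizes.append((name, tuple(x.shape[-2:])))
    return sizes

for side in (32, 64):
    print(side, trace_spatial_sizes(side))
```

With 32x32 inputs the last stage is 1x1, which is what view(-1, 128) silently relies on; with 64x64 inputs it is 2x2.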

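Then, at the end you've got a shape of (batch, 128, 2, 2): four times as many values per image as with a 32x32 input, which ends at (batch, 128, 1, 1). Because view(-1,128) only fixes the last dimension, the 32*128*2*2 values of a batch are regrouped into 128 rows of 128, which is why outputs has shape [128, 500] while labels still has shape [32].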
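The sketch below reproduces this with random tensors standing in for real activations (batch size 32 and 500 classes, as in the question). It shows that the reshape itself succeeds, and the mismatch only surfaces in the loss. The exact exception type and message text may vary between PyTorch versions, so the sketch catches both ValueError and RuntimeError.

```python
import torch
import torch.nn as nn

batch, num_classes = 32, 500
features = torch.randn(batch, 128, 2, 2)    # what self.net returns for 64x64 inputs
labels = torch.randint(0, num_classes, (batch,))

flat = features.view(-1, 128)               # -1 is inferred as 32*2*2 = 128
print(flat.shape)                           # torch.Size([128, 128])

fc = nn.Linear(in_features=128, out_features=num_classes)
outputs = fc(flat)
print(outputs.shape)                        # torch.Size([128, 500])

try:
    nn.CrossEntropyLoss()(outputs, labels)  # 128 predictions vs 32 labels
    error = None
except (ValueError, RuntimeError) as err:
    error = err
print(error)                                # batch-size mismatch reported by the loss
```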

So your final reshape should be output = output.view(-1,128*2*2).

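This is a different net with a different classification layer, though: self.fc must become nn.Linear(in_features=128*2*2, out_features=num_classes), i.e. in_features=512, so weights trained on 32x32 images can't be reused as they are.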
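A minimal sketch of the corrected head for the flattened variant, with random features standing in for the output of self.net on a batch of 64x64 images. It flattens with view(features.size(0), -1) rather than view(-1, 128*2*2); that is a suggested alternative, not part of the answer above. Keeping the batch dimension explicit means a wrong in_features shows up as a shape error in the Linear layer instead of samples being silently merged into the wrong rows.

```python
import torch
import torch.nn as nn

num_classes = 500
fc = nn.Linear(in_features=128 * 2 * 2, out_features=num_classes)  # 512 inputs

features = torch.randn(32, 128, 2, 2)          # self.net output for 64x64 inputs
flat = features.view(features.size(0), -1)    # (32, 512): batch dim kept explicitly
outputs = fc(flat)
print(outputs.shape)                           # torch.Size([32, 500])
```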

The global pooling net

On the other hand, you could use the same model, same layers and same weights for any image size >= 32 if you replace the last pooling with a global pooling:

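def flatChannels(x):
    # (batch, channels, H, W) -> (batch, channels, H*W)
    size = x.size()
    return x.view(size[0],size[1],size[2]*size[3])

def globalAvgPool2D(x):
    # average over all spatial positions -> (batch, channels)
    return flatChannels(x).mean(dim=-1)

def globalMaxPool2D(x):
    # max over all spatial positions -> (batch, channels);
    # .max(dim=...) returns (values, indices), so keep only the values
    return flatChannels(x).max(dim=-1).values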
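These helpers can be checked against PyTorch's built-in adaptive pooling, which does the same job: F.adaptive_avg_pool2d and F.adaptive_max_pool2d with an output size of 1 (or the nn.AdaptiveAvgPool2d(1) module inside a model) reduce every channel to a single value regardless of height and width. The helper definitions are repeated so the snippet runs on its own; note that Tensor.max(dim=...) returns a (values, indices) pair, so the max helper keeps only .values. The side lengths 4 and 8 correspond to 32x32 and 64x64 inputs after pool3; the batch size of 2 is arbitrary.

```python
import torch
import torch.nn.functional as F

def flatChannels(x):
    size = x.size()
    return x.view(size[0], size[1], size[2] * size[3])

def globalAvgPool2D(x):
    return flatChannels(x).mean(dim=-1)

def globalMaxPool2D(x):
    return flatChannels(x).max(dim=-1).values

results = []
for side in (4, 8):
    x = torch.randn(2, 128, side, side)
    avg, mx = globalAvgPool2D(x), globalMaxPool2D(x)
    # built-in equivalents: pool to 1x1, then drop the two singleton dims
    same_avg = torch.allclose(avg, F.adaptive_avg_pool2d(x, 1).flatten(1), atol=1e-5)
    same_max = torch.equal(mx, F.adaptive_max_pool2d(x, 1).flatten(1))
    results.append((side, tuple(avg.shape), same_avg, same_max))
    print(results[-1])
```

Both helpers return (batch, 128) for either spatial size, which is why a single nn.Linear(128, num_classes) works for all of them.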

The ending of the model:

        # self.avgpool is no longer part of self.net; global pooling happens in forward
        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4,
                                 self.unit5, self.unit6, self.unit7, self.pool2, self.unit8,
                                 self.unit9, self.unit10, self.unit11, self.pool3,
                                 self.unit12, self.unit13, self.unit14)

        self.fc = nn.Linear(in_features=128,out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = globalAvgPool2D(output) # or globalMaxPool2D
        output = self.fc(output)
        return output
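An end-to-end sketch of the global-pooling version, showing the same weights accepting 32x32 and 64x64 inputs. For brevity it rebuilds the question's stack with a loop: same layer order and channel counts (unit1 to unit14, pool1 to pool3), but make_stack and GlobalPoolNet are hypothetical names. Global average pooling is written as x.mean(dim=(2, 3)), which is equivalent to globalAvgPool2D above. The model is put in eval mode only so BatchNorm uses its running statistics on the tiny dummy batch.

```python
import torch
import torch.nn as nn

class Unit(nn.Module):
    """Conv 3x3 (padding=1) -> BatchNorm -> ReLU, as in the question."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

def make_stack():
    """Hypothetical helper: unit1..unit14 with pool1..pool3, no final AvgPool2d."""
    # (in, out) channels per group; a MaxPool2d(2) follows every group but the last
    groups = [[(3, 32), (32, 32), (32, 32)],
              [(32, 64), (64, 64), (64, 64), (64, 64)],
              [(64, 128), (128, 128), (128, 128), (128, 128)],
              [(128, 128), (128, 128), (128, 128)]]
    layers = []
    for i, group in enumerate(groups):
        layers += [Unit(cin, cout) for cin, cout in group]
        if i < len(groups) - 1:
            layers.append(nn.MaxPool2d(kernel_size=2))
    return nn.Sequential(*layers)

class GlobalPoolNet(nn.Module):
    def __init__(self, num_classes=500):
        super().__init__()
        self.net = make_stack()
        self.fc = nn.Linear(in_features=128, out_features=num_classes)

    def forward(self, x):
        x = self.net(x)            # (batch, 128, H/8, W/8)
        x = x.mean(dim=(2, 3))     # global average pooling -> (batch, 128)
        return self.fc(x)

model = GlobalPoolNet(num_classes=500).eval()
with torch.no_grad():
    for side in (32, 64):
        out = model(torch.randn(2, 3, side, side))
        print(side, out.shape)     # torch.Size([2, 500]) for both sizes
```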

Alternatively, you can use the transforms module to bring your images down to 32 x 32 before training the neural network (here is the link https://pytorch.org/docs/stable/torchvision/transforms.html ).

You have a few options:

  1. transforms.Resize(32),

  2. transforms.RandomResizedCrop(32): the most preferable, because it also augments your data, which helps prevent overfitting to some extent.

  3. transforms.CenterCrop(32) (which keeps only the central 32 x 32 region of each 64 x 64 image), etc.

Moreover, you can chain several transforms into one object with transforms.Compose.

Enjoy.

PS. Of course, you can also refactor your neural network architecture so that it accepts images of size 64 x 64.

