How do I modify this PyTorch convolutional neural network to accept a 64 x 64 image and properly output predictions?
I took this convolutional neural network (CNN) from here. It accepts 32 x 32 images and defaults to 10 classes. However, I have 64 x 64 images with 500 classes. When I pass in 64 x 64 images (batch size held constant at 32), I get the following error.
ValueError: Expected input batch_size (128) to match target batch_size (32).
The stack trace starts at the line loss = loss_fn(outputs, labels). The outputs.shape is [128, 500] and the labels.shape is [32].
The code is listed here for completeness.
import torch.nn as nn

class Unit(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Unit, self).__init__()
        self.conv = nn.Conv2d(in_channels=in_channels, kernel_size=3, out_channels=out_channels, stride=1, padding=1)
        self.bn = nn.BatchNorm2d(num_features=out_channels)
        self.relu = nn.ReLU()

    def forward(self, input):
        output = self.conv(input)
        output = self.bn(output)
        output = self.relu(output)
        return output

class SimpleNet(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleNet, self).__init__()
        self.unit1 = Unit(in_channels=3, out_channels=32)
        self.unit2 = Unit(in_channels=32, out_channels=32)
        self.unit3 = Unit(in_channels=32, out_channels=32)
        self.pool1 = nn.MaxPool2d(kernel_size=2)
        self.unit4 = Unit(in_channels=32, out_channels=64)
        self.unit5 = Unit(in_channels=64, out_channels=64)
        self.unit6 = Unit(in_channels=64, out_channels=64)
        self.unit7 = Unit(in_channels=64, out_channels=64)
        self.pool2 = nn.MaxPool2d(kernel_size=2)
        self.unit8 = Unit(in_channels=64, out_channels=128)
        self.unit9 = Unit(in_channels=128, out_channels=128)
        self.unit10 = Unit(in_channels=128, out_channels=128)
        self.unit11 = Unit(in_channels=128, out_channels=128)
        self.pool3 = nn.MaxPool2d(kernel_size=2)
        self.unit12 = Unit(in_channels=128, out_channels=128)
        self.unit13 = Unit(in_channels=128, out_channels=128)
        self.unit14 = Unit(in_channels=128, out_channels=128)
        self.avgpool = nn.AvgPool2d(kernel_size=4)
        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4,
                                 self.unit5, self.unit6, self.unit7, self.pool2, self.unit8,
                                 self.unit9, self.unit10, self.unit11, self.pool3,
                                 self.unit12, self.unit13, self.unit14, self.avgpool)
        self.fc = nn.Linear(in_features=128, out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = output.view(-1, 128)
        output = self.fc(output)
        return output
Any ideas on how to modify this CNN to accept 64 x 64 images and properly return outputs?
The problem is an incompatible reshape (view) at the end.

You're using a sort of "flattening" at the end, which is different from a "global pooling". Both are valid for CNNs, but only global poolings are compatible with any image size.

In your case, with a flatten, you need to keep track of all image dimensions in order to know how to reshape at the end. So, tracing a 64 x 64 input: pool1 halves it to 32 x 32, pool2 to 16 x 16, pool3 to 8 x 8, and the final AvgPool2d(kernel_size=4) brings it to 2 x 2.
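A quick way to confirm this (a sketch, assuming the SimpleNet definition from the question, with num_classes=500):

import torch

model = SimpleNet(num_classes=500)
x = torch.randn(32, 3, 64, 64)          # fake batch of 32 images, 64 x 64 RGB

features = model.net(x)
print(features.shape)                    # torch.Size([32, 128, 2, 2])

# view(-1, 128) rearranges 32*128*2*2 elements into rows of 128,
# which produces 128 rows -- hence the mismatched batch size of 128
print(features.view(-1, 128).shape)      # torch.Size([128, 128])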
Then, at the end, you've got a shape of (batch, 128, 2, 2): four times as many elements as you'd have if the image were 32 x 32. Your final reshape should therefore be output = output.view(-1, 128*2*2).

This is a different net with a different classification layer, though, because now in_features=512.
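A minimal sketch of that fix, keeping the flatten but matching the 2 x 2 feature map (the wider fc layer is the required change, not part of the original code):

# in SimpleNet.__init__, the classifier must match the flattened size:
self.fc = nn.Linear(in_features=128 * 2 * 2, out_features=num_classes)

def forward(self, input):
    output = self.net(input)                 # (batch, 128, 2, 2) for 64 x 64 inputs
    output = output.view(-1, 128 * 2 * 2)    # (batch, 512)
    output = self.fc(output)
    return output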
On the other hand, you could use the same model, same layers and same weights for any image size >= 32 if you replace the last pooling with a global pooling:
def flatChannels(x):
    size = x.size()
    return x.view(size[0], size[1], size[2] * size[3])

def globalAvgPool2D(x):
    return flatChannels(x).mean(dim=-1)

def globalMaxPool2D(x):
    # max(dim=-1) returns (values, indices); keep only the values
    return flatChannels(x).max(dim=-1)[0]
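A quick sanity check of these helpers (shapes chosen to match the feature maps above, torch imported as before):

x64 = torch.randn(32, 128, 2, 2)       # features from a 64 x 64 input
x32 = torch.randn(32, 128, 1, 1)       # features from a 32 x 32 input
print(globalAvgPool2D(x64).shape)      # torch.Size([32, 128])
print(globalAvgPool2D(x32).shape)      # torch.Size([32, 128]) -- same either way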
The ending of the model:
# removed the pool from here to put it in forward
self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4,
                         self.unit5, self.unit6, self.unit7, self.pool2, self.unit8,
                         self.unit9, self.unit10, self.unit11, self.pool3,
                         self.unit12, self.unit13, self.unit14)
self.fc = nn.Linear(in_features=128, out_features=num_classes)

def forward(self, input):
    output = self.net(input)
    output = globalAvgPool2D(output)  # or globalMaxPool2D
    output = self.fc(output)
    return output
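For what it's worth, PyTorch also ships this idea as a built-in layer: nn.AdaptiveAvgPool2d(1) pools any spatial size down to 1 x 1, so an equivalent sketch with stock layers would be:

self.avgpool = nn.AdaptiveAvgPool2d(1)        # (batch, 128, 1, 1) for any input size
self.fc = nn.Linear(in_features=128, out_features=num_classes)

def forward(self, input):
    output = self.net(input)
    output = self.avgpool(output)
    output = output.view(output.size(0), -1)  # (batch, 128)
    output = self.fc(output)
    return output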
You need to use the transforms module before training the neural network (here is the link: https://pytorch.org/docs/stable/torchvision/transforms.html).
You have a few options:

transforms.Resize(32),
transforms.RandomResizedCrop(32) - most preferable, because this way you can also augment your data and prevent overfitting to some extent.
transforms.CenterCrop(32), etc.

Moreover, you can compose several transform objects into one via transforms.Compose, as in the sketch below.
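For example, a minimal pipeline bringing your 64 x 64 images down to the 32 x 32 the original network expects (normalization omitted; adjust to your dataset):

from torchvision import transforms

transform = transforms.Compose([
    transforms.RandomResizedCrop(32),   # random crop + resize to 32 x 32, with augmentation
    transforms.ToTensor(),
])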
Enjoy.
PS. Of course, you can also refactor your neural network architecture, enabling it to accept images of size 64 x 64.