
RuntimeError: size mismatch, m1: [32 x 1], m2: [32 x 9]

I'm building a CNN and training it to classify hand sign gestures for the letters A through I (9 classes). Each image is RGB, 224x224.

I'm not sure which matrix I need to transpose, or how. I have managed to match the inputs and outputs of the layers, but I don't know how to fix the matrix multiplication error.

class LargeNet(nn.Module):
    def __init__(self):
        super(LargeNet, self).__init__()
        self.name = "large"
        self.conv1 = nn.Conv2d(3, 5, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(5, 10, 5)
        self.fc1 = nn.Linear(10 * 53 * 53, 32)
        self.fc2 = nn.Linear(32, 9)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        print('x1')
        x = self.pool(F.relu(self.conv2(x)))
        print('x2')
        x = x.view(-1, 10*53*53)
        print('x3')
        x = F.relu(self.fc1(x))
        print('x4')
        x = x.view(-1, 1)
        x = self.fc2(x)
        print('x5')
        x = x.squeeze(1) # Flatten to [batch_size]
        return x

And the training code:

#Loss and optimizer
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model2.parameters(), lr=learning_rate, momentum=0.9)

# Train the model
total_step = len(train_loader)
loss_list = []
acc_list = []
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        print(i,images.size(),labels.size())
        # Run the forward pass
        outputs = model2(images)
        labels=labels.unsqueeze(1)
        labels=labels.float()
        loss = criterion(outputs, labels)

The code prints up to x4, and then I get this error: RuntimeError: size mismatch, m1: [32 x 1], m2: [32 x 9] at C:\w\1\s\tmp_conda_3.7_055457\conda\conda-bld\pytorch_1565416617654\work\aten\src\TH/generic/THTensorMath.cpp:752

Complete traceback: https://ibb.co/ykqy5wM

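You don't need x = x.view(-1, 1) and x = x.squeeze(1) in your forward function; remove these two lines. The view call reshapes the (batch_size, 32) output of fc1 into (batch_size * 32, 1), and fc2 expects 32 input features, so the multiplication fails. Without those two lines, the output shape is (batch_size, 9).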
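To see the mismatch in isolation, here is a minimal sketch that reproduces it with a stand-in tensor instead of the real network. The batch size of 4 and the random input are assumptions for illustration only; the m1: [32 x 1] in the error message is consistent with a batch of one image, but the question does not state the batch size.

```python
import torch
import torch.nn as nn

batch_size = 4                     # hypothetical batch size, for illustration only
fc2 = nn.Linear(32, 9)             # same layer shape as in the question
x = torch.randn(batch_size, 32)    # stands in for the output of fc1

bad = x.view(-1, 1)                # collapses to [batch_size * 32, 1]
print(bad.shape)                   # torch.Size([128, 1])

raised = False
try:
    fc2(bad)                       # last dimension is 1, but fc2 expects 32
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

good = fc2(x)                      # no reshape needed before fc2
print(good.shape)                  # torch.Size([4, 9])
```

No transpose is needed anywhere: the tensor coming out of fc1 already has the layout fc2 expects.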

Also, because BCEWithLogitsLoss expects targets with the same shape as the outputs, you need to convert the labels to one-hot encoding with shape (batch_size, 9).
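As a small sketch of that conversion, the snippet below builds one-hot targets in two equivalent ways. The label values are made up, and it assumes the labels are integer class indices from 0 to 8; this replaces the labels.unsqueeze(1) and labels.float() steps in the training loop.

```python
import torch
import torch.nn.functional as F

# Hypothetical class indices (assumed mapping: 0 = A, ..., 8 = I)
labels = torch.tensor([1, 2])

# Indexing an identity matrix picks one row per label
one_hot_eye = torch.eye(9)[labels]                        # float, shape [2, 9]

# F.one_hot returns integers, so cast to float for BCEWithLogitsLoss
one_hot_fn = F.one_hot(labels, num_classes=9).float()     # float, shape [2, 9]

print(one_hot_eye.shape)                                  # torch.Size([2, 9])
print(torch.equal(one_hot_eye, one_hot_fn))               # True
```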

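import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class LargeNet(nn.Module):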
    def __init__(self):
        super(LargeNet, self).__init__()
        self.name = "large"
        self.conv1 = nn.Conv2d(3, 5, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(5, 10, 5)
        self.fc1 = nn.Linear(10 * 53 * 53, 32)
        self.fc2 = nn.Linear(32, 9)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 10*53*53)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model2 = LargeNet()

# Loss and optimizer
criterion = nn.BCEWithLogitsLoss()
# nn.BCELoss() would need a sigmoid applied to the outputs first
optimizer = optim.SGD(model2.parameters(), lr=0.1, momentum=0.9)

# Fake data for a quick shape check (batch_size is 2)
images = torch.from_numpy(np.random.randn(2, 3, 224, 224)).float()
labels = torch.tensor([1, 2]).long()  # fake class indices

outputs = model2(images)                # shape: (2, 9)
one_hot_labels = torch.eye(9)[labels]   # shape: (2, 9)
loss = criterion(outputs, one_hot_labels)
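The 10 * 53 * 53 input size of fc1 can be checked the same way, by pushing a dummy image through the convolution and pooling layers and printing the shapes. This is only a sanity-check sketch that assumes 224x224 RGB inputs as described in the question; the ReLU calls are left out because they do not change shapes.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 5, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(5, 10, 5)

x = torch.zeros(1, 3, 224, 224)    # one dummy RGB image

x = pool(conv1(x))                 # 224 -> 220 (5x5 conv) -> 110 (2x2 pool)
print(x.shape)                     # torch.Size([1, 5, 110, 110])

x = pool(conv2(x))                 # 110 -> 106 (5x5 conv) -> 53 (2x2 pool)
print(x.shape)                     # torch.Size([1, 10, 53, 53])

flat = x.view(x.size(0), -1)       # keep the batch dimension, flatten the rest
print(flat.shape)                  # torch.Size([1, 28090]), i.e. 10 * 53 * 53
```

Writing the flatten step as x.view(x.size(0), -1) pins the batch dimension, so a wrong feature count shows up as an error in fc1 rather than as a silently changed batch size.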
