Pytorch CNN 的损失没有减少

Question

I'm doing a CNN with Pytorch for a task, but it won't learn and improve the accuracy.我正在用 Pytorch 做一个 CNN 来完成一项任务，但它不会学习和提高准确性。 I made a version working with the MNIST dataset so I could post it here.我制作了一个使用 MNIST 数据集的版本，所以我可以在这里发布它。 I'm just looking for an answer as to why it's not working.我只是在寻找为什么它不起作用的答案。 The architecture is fine, I implemented it in Keras and I had over 92% accuracy after 3 epochs.架构很好，我在 Keras 中实现了它，并且在 3 个 epoch 后我的准确率超过了 92%。 Note: I reshaped the MNIST into 60x60 pictures because that's how the pictures are in my "real" problem.注意：我将 MNIST 重塑为 60x60 图片，因为这就是图片在我的“真实”问题中的样子。

import numpy as np
from PIL import Image
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.autograd import Variable
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()


def resize(pics):
    pictures = []
    for image in pics:
        image = Image.fromarray(image).resize((dim, dim))
        image = np.array(image)
        pictures.append(image)
    return np.array(pictures)


dim = 60

x_train, x_test = resize(x_train), resize(x_test) # because my real problem is in 60x60

x_train = x_train.reshape(-1, 1, dim, dim).astype('float32') / 255
x_test = x_test.reshape(-1, 1, dim, dim).astype('float32') / 255
y_train, y_test = y_train.astype('float32'), y_test.astype('float32') 

if torch.cuda.is_available():
    x_train = torch.from_numpy(x_train)[:10_000]
    x_test = torch.from_numpy(x_test)[:4_000] 
    y_train = torch.from_numpy(y_train)[:10_000] 
    y_test = torch.from_numpy(y_test)[:4_000]


class ConvNet(nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)

        self.fc1 = nn.Linear(5*5*128, 1024) 
        self.fc2 = nn.Linear(1024, 2048)
        self.fc3 = nn.Linear(2048, 1)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))

        x = x.view(x.size(0), -1) 
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.dropout(x, 0.5)
        x = torch.sigmoid(self.fc3(x))
        return x


net = ConvNet()

optimizer = optim.Adam(net.parameters(), lr=0.03)

loss_function = nn.BCELoss()


class FaceTrain:

    def __init__(self):
        self.len = x_train.shape[0]
        self.x_train = x_train
        self.y_train = y_train

    def __getitem__(self, index):
        return x_train[index], y_train[index].unsqueeze(0)

    def __len__(self):
        return self.len


class FaceTest:

    def __init__(self):
        self.len = x_test.shape[0]
        self.x_test = x_test
        self.y_test = y_test

    def __getitem__(self, index):
        return x_test[index], y_test[index].unsqueeze(0)

    def __len__(self):
        return self.len


train = FaceTrain()
test = FaceTest()

train_loader = DataLoader(dataset=train, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test, batch_size=64, shuffle=True)

epochs = 10
steps = 0
train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    for images, labels in train_loader: 
        optimizer.zero_grad()
        log_ps = net(images)
        loss = loss_function(log_ps, labels)
        loss.backward()
        optimizer.step()        
        running_loss += loss.item()        
    else:
        test_loss = 0
        accuracy = 0        

        with torch.no_grad():
            for images, labels in test_loader: 
                log_ps = net(images)
                test_loss += loss_function(log_ps, labels)                
                ps = torch.exp(log_ps)
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class.type('torch.LongTensor') == labels.type(torch.LongTensor).view(*top_class.shape)
                accuracy += torch.mean(equals.type('torch.FloatTensor'))
        train_losses.append(running_loss/len(train_loader))
        test_losses.append(test_loss/len(test_loader))
        print("[Epoch: {}/{}] ".format(e+1, epochs),
              "[Training Loss: {:.3f}] ".format(running_loss/len(train_loader)),
              "[Test Loss: {:.3f}] ".format(test_loss/len(test_loader)),
              "[Test Accuracy: {:.3f}]".format(accuracy/len(test_loader)))

Answer 1

First the major issues...首先是主要问题...

1. The main issue with this code is that you're using the wrong output shape and the wrong loss function for classification. 1.此代码的主要问题是您使用错误的 output 形状和错误的损失 function 进行分类。

nn.BCELoss computes the binary cross entropy loss. nn.BCELoss计算二元交叉熵损失。 This is applicable when you have one or more targets which are either 0 or 1 (hence the binary).这适用于您有一个或多个目标为 0 或 1（因此是二进制）的情况。 In your case the target is a single integer between 0 and 9. Since there are only a small number of potential target values, the most common approach is to use categorical cross-entropy loss ( nn.CrossEntropyLoss ).在您的情况下，目标是介于 0 和 9 之间的单个 integer。由于只有少量潜在目标值，因此最常见的方法是使用分类交叉熵损失（ nn.CrossEntropyLoss ）。 The "theoretical" definition of cross entropy loss expects the network outputs and the targets to both be 10 dimensional vectors where the target is all zeros except in one location (one-hot encoded).交叉熵损失的“理论”定义期望网络输出和目标都是 10 维向量，其中目标除了在一个位置（单热编码）之外全为零。 However for computational stability and space efficiency reasons, pytorch's nn.CrossEntropyLoss directly takes the integer as a target .但是出于计算稳定性和空间效率的原因， pytorch 的nn.CrossEntropyLoss直接将 integer 作为目标。 However , you still need to provide it with a 10 dimensional output vector from your network.但是，您仍然需要为它提供来自您的网络的 10 维 output 向量。

# pseudo code (ignoring batch dimension)
loss = nn.functional.cross_entropy_loss(<output 10d vector>, <integer target>)

To fix this issue in your code we need to have fc3 output a 10 dimensional feature, and we need the labels to be integers (not floats).要在您的代码中解决此问题，我们需要fc3 output 一个 10 维特征，并且我们需要标签是整数（而不是浮点数）。 Also, there's no need to use .sigmoid on fc3 since pytorch's cross-entropy loss function internally applies log-softmax before computing the final loss value.此外，由于 pytorch 的交叉熵损失 function 在计算最终损失值之前在内部应用 log-softmax，因此无需在 fc3 上使用.sigmoid 。

2. As pointed out by Serget Dymchenko, you need to switch the network to eval mode during inference and train mode during train. 2.正如 Serget Dymchenko 所指出的，您需要在推理期间将网络切换到eval模式，在训练期间将网络切换到train模式。 This mainly affects dropout and batch_norm layers since they behave differently during training and inference.这主要影响 dropout 和 batch_norm 层，因为它们在训练和推理期间的行为不同。

3. A learning rate of 0.03 is probably a little too high. 3. 0.03 的学习率可能有点太高了。 It works just fine with a learning rate of 0.001 and in a couple experiments I saw the training diverge at 0.03.它在 0.001 的学习率下工作得很好，在几个实验中，我看到训练在 0.03 处出现分歧。

To accommodate these fixes a number of changes needed to be made.为了适应这些修复，需要进行一些更改。 The minimal corrections to the code are shown below.对代码的最小更正如下所示。 I commented any lines which were changed with #### followed by a short description of the change.我评论了所有用####更改的行，然后是更改的简短描述。

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.autograd import Variable
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()


def resize(pics):
    pictures = []
    for image in pics:
        image = Image.fromarray(image).resize((dim, dim))
        image = np.array(image)
        pictures.append(image)
    return np.array(pictures)


dim = 60

x_train, x_test = resize(x_train), resize(x_test) # because my real problem is in 60x60

x_train = x_train.reshape(-1, 1, dim, dim).astype('float32') / 255
x_test = x_test.reshape(-1, 1, dim, dim).astype('float32') / 255
#### float32 -> int64
y_train, y_test = y_train.astype('int64'), y_test.astype('int64')

#### no reason to test for cuda before converting to numpy

#### I assume you were taking a subset for debugging? No reason to not use all the data
x_train = torch.from_numpy(x_train)
x_test = torch.from_numpy(x_test)
y_train = torch.from_numpy(y_train)
y_test = torch.from_numpy(y_test)


class ConvNet(nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)

        self.fc1 = nn.Linear(5*5*128, 1024)
        self.fc2 = nn.Linear(1024, 2048)
        #### 1 -> 10
        self.fc3 = nn.Linear(2048, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))

        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.dropout(x, 0.5)
        #### removed sigmoid
        x = self.fc3(x)
        return x


net = ConvNet()

#### 0.03 -> 1e-3
optimizer = optim.Adam(net.parameters(), lr=1e-3)

#### BCELoss -> CrossEntropyLoss
loss_function = nn.CrossEntropyLoss()


class FaceTrain:

    def __init__(self):
        self.len = x_train.shape[0]
        self.x_train = x_train
        self.y_train = y_train

    def __getitem__(self, index):
        #### .unsqueeze(0) removed
        return x_train[index], y_train[index]

    def __len__(self):
        return self.len


class FaceTest:

    def __init__(self):
        self.len = x_test.shape[0]
        self.x_test = x_test
        self.y_test = y_test

    def __getitem__(self, index):
        #### .unsqueeze(0) removed
        return x_test[index], y_test[index]

    def __len__(self):
        return self.len


train = FaceTrain()
test = FaceTest()

train_loader = DataLoader(dataset=train, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test, batch_size=64, shuffle=True)

epochs = 10
steps = 0
train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    #### put net in train mode
    net.train()
    for idx, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        log_ps = net(images)
        loss = loss_function(log_ps, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        test_loss = 0
        accuracy = 0

        #### put net in eval mode
        net.eval()
        with torch.no_grad():
            for images, labels in test_loader:
                log_ps = net(images)
                test_loss += loss_function(log_ps, labels)
                #### removed torch.exp() since exponential is monotone, taking it doesn't change the order of outputs. Similarly with torch.softmax()
                top_p, top_class = log_ps.topk(1, dim=1)
                #### convert to float/long using proper methods. what you have won't work for cuda tensors.
                equals = top_class.long() == labels.long().view(*top_class.shape)
                accuracy += torch.mean(equals.float())
        train_losses.append(running_loss/len(train_loader))
        test_losses.append(test_loss/len(test_loader))
        print("[Epoch: {}/{}] ".format(e+1, epochs),
              "[Training Loss: {:.3f}] ".format(running_loss/len(train_loader)),
              "[Test Loss: {:.3f}] ".format(test_loss/len(test_loader)),
              "[Test Accuracy: {:.3f}]".format(accuracy/len(test_loader)))

Results of training are now...训练结果现在...

[Epoch: 1/10]  [Training Loss: 0.139]  [Test Loss: 0.046]  [Test Accuracy: 0.986]
[Epoch: 2/10]  [Training Loss: 0.046]  [Test Loss: 0.042]  [Test Accuracy: 0.987]
[Epoch: 3/10]  [Training Loss: 0.031]  [Test Loss: 0.040]  [Test Accuracy: 0.988]
[Epoch: 4/10]  [Training Loss: 0.022]  [Test Loss: 0.029]  [Test Accuracy: 0.990]
[Epoch: 5/10]  [Training Loss: 0.017]  [Test Loss: 0.066]  [Test Accuracy: 0.987]
[Epoch: 6/10]  [Training Loss: 0.015]  [Test Loss: 0.056]  [Test Accuracy: 0.985]
[Epoch: 7/10]  [Training Loss: 0.018]  [Test Loss: 0.039]  [Test Accuracy: 0.991]
[Epoch: 8/10]  [Training Loss: 0.012]  [Test Loss: 0.057]  [Test Accuracy: 0.988]
[Epoch: 9/10]  [Training Loss: 0.012]  [Test Loss: 0.041]  [Test Accuracy: 0.991]
[Epoch: 10/10]  [Training Loss: 0.007]  [Test Loss: 0.048]  [Test Accuracy: 0.992]

Some other issues that will improve your performance and code.其他一些可以提高性能和代码的问题。

4. You're never moving the model to the GPU. 4.您永远不会将 model 移动到 GPU。 This means you won't be getting GPU acceleration.这意味着您不会获得 GPU 加速。

5. torchvision is designed with all the standard transforms and datasets and is built to be used with PyTorch. 5. torchvision设计有所有标准转换和数据集，可与 PyTorch 一起使用。 I recommend using it.我建议使用它。 This also removes the dependency on keras in your code.这也消除了代码中对 keras 的依赖。

6. Normalize your data by subtracting the mean and dividing by the standard deviation to improve performance of your network. 6.通过减去平均值并除以标准差来规范化您的数据，以提高网络的性能。 With torchvision you can use transforms.Normalize .使用 torchvision 您可以使用transforms.Normalize 。 This won't make a big difference in MNIST because its already too easy.这不会对 MNIST 产生太大影响，因为它已经太容易了。 But in more difficult problems it turns out to be important.但在更困难的问题中，它变得很重要。

Further improved code is show below (much faster on GPU).下面显示了进一步改进的代码（在 GPU 上要快得多）。

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms

dim = 60

class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)

        self.fc1 = nn.Linear(5 * 5 * 128, 1024)
        self.fc2 = nn.Linear(1024, 2048)
        self.fc3 = nn.Linear(2048, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))

        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.dropout(x, 0.5)
        x = self.fc3(x)
        return x


net = ConvNet()
if torch.cuda.is_available():
    net.cuda()

optimizer = optim.Adam(net.parameters(), lr=1e-3)

loss_function = nn.CrossEntropyLoss()

train_dataset = MNIST('./data', train=True, download=True,
                      transform=transforms.Compose([
                          transforms.Resize((dim, dim)),
                          transforms.ToTensor(),
                          transforms.Normalize((0.1307,), (0.3081,))
                      ]))
test_dataset = MNIST('./data', train=False, download=True,
                     transform=transforms.Compose([
                         transforms.Resize((dim, dim)),
                         transforms.ToTensor(),
                         transforms.Normalize((0.1307,), (0.3081,))
                     ]))

train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True, num_workers=8)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False, num_workers=8)

epochs = 10
steps = 0
train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    net.train()
    for images, labels in train_loader:
        if torch.cuda.is_available():
            images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        log_ps = net(images)
        loss = loss_function(log_ps, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        test_loss = 0
        accuracy = 0

        net.eval()
        with torch.no_grad():
            for images, labels in test_loader:
                if torch.cuda.is_available():
                    images, labels = images.cuda(), labels.cuda()
                log_ps = net(images)
                test_loss += loss_function(log_ps, labels)
                top_p, top_class = log_ps.topk(1, dim=1)
                equals = top_class.flatten().long() == labels
                accuracy += torch.mean(equals.float()).item()
        train_losses.append(running_loss/len(train_loader))
        test_losses.append(test_loss/len(test_loader))
        print("[Epoch: {}/{}] ".format(e+1, epochs),
              "[Training Loss: {:.3f}] ".format(running_loss/len(train_loader)),
              "[Test Loss: {:.3f}] ".format(test_loss/len(test_loader)),
              "[Test Accuracy: {:.3f}]".format(accuracy/len(test_loader)))

Updated results of training...训练结果更新...

[Epoch: 1/10]  [Training Loss: 0.125]  [Test Loss: 0.045]  [Test Accuracy: 0.987]
[Epoch: 2/10]  [Training Loss: 0.043]  [Test Loss: 0.031]  [Test Accuracy: 0.991]
[Epoch: 3/10]  [Training Loss: 0.030]  [Test Loss: 0.030]  [Test Accuracy: 0.991]
[Epoch: 4/10]  [Training Loss: 0.024]  [Test Loss: 0.046]  [Test Accuracy: 0.990]
[Epoch: 5/10]  [Training Loss: 0.020]  [Test Loss: 0.032]  [Test Accuracy: 0.992]
[Epoch: 6/10]  [Training Loss: 0.017]  [Test Loss: 0.046]  [Test Accuracy: 0.991]
[Epoch: 7/10]  [Training Loss: 0.015]  [Test Loss: 0.034]  [Test Accuracy: 0.992]
[Epoch: 8/10]  [Training Loss: 0.011]  [Test Loss: 0.048]  [Test Accuracy: 0.992]
[Epoch: 9/10]  [Training Loss: 0.012]  [Test Loss: 0.037]  [Test Accuracy: 0.991]
[Epoch: 10/10]  [Training Loss: 0.013]  [Test Loss: 0.038]  [Test Accuracy: 0.992]

Answer 2

One thing I noticed that you test the model in train mode.我注意到您在训练模式下测试 model 的一件事。 You need to call net.eval() to disable dropouts (and then net.train() again to put it back in the train mode).您需要调用net.eval()来禁用辍学（然后再次调用 net.train( net.train()将其重新置于训练模式）。

Maybe there are other issues.也许还有其他问题。 Is training loss going down?训练损失会下降吗？ Have you tried to overfit on a single example?您是否尝试过对单个示例进行过度拟合？

Pytorch CNN 的损失没有减少

问题描述

2 个解决方案

解决方案1
8 已采纳 2019-11-02 06:10:23

解决方案2
4 2019-11-02 01:31:45

Pytorch CNN 的损失没有减少

问题描述

2 个解决方案

解决方案1 8 已采纳 2019-11-02 06:10:23

解决方案2 4 2019-11-02 01:31:45

解决方案1
8 已采纳 2019-11-02 06:10:23

解决方案2
4 2019-11-02 01:31:45