
Why did the loss of the CNN model change so little throughout the whole epoch?

I am working on a deep learning project for visual speech recognition, and I have noticed a strange phenomenon. In the first several epochs, the loss decreases at a normal speed; that is, within an epoch, the loss goes down as the number of iterations increases. In later epochs, however, the loss stays almost unchanged throughout the whole epoch, but then drops at the start of the next epoch.

Sometimes I interrupt the training after an epoch finishes and restart from the saved weights. In that case the loss also drops at the start of the next epoch.

This is the training code:

    for epoch in range(283,args.epochs):

        model.train()
        running_loss, running_corrects, running_all, cer = 0., 0., 0., 0.

        for batch_idx, sample_batched in enumerate(dset_loaders['train']):
            optimizer.zero_grad()
            inputs,targets,lengths,y_lengths,idx = sample_batched
            inputs = inputs.float()     
            inputs, targets = inputs.to(device) , targets.to(device) 
            outputs = model(inputs)  
            loss = criterion(F.log_softmax(outputs,dim=-1),targets,lengths,y_lengths)
            loss.backward()
            optimizer.step()

            decoded = decoder.decode_greedy(outputs,lengths)
            cursor, gt = 0, []
            for b in range(inputs.size(0)):
                y_str = ''.join([vocabularies[ch] for ch in targets[cursor: cursor + y_lengths[b]]])
                gt.append(y_str)
                cursor += y_lengths[b]
            CER = decoder.cer_batch(decoded,gt)
            cer += CER
            cer_mean = cer/(batch_idx+1)

            running_loss += loss.data * inputs.size(0)
            running_all += len(inputs)
            if batch_idx == 0:
                since = time.time()      
            elif (batch_idx+1) % args.interval == 0 or (batch_idx == len(dset_loaders['train'])-1):
                print('Process: [{:5.0f}/{:5.0f} ({:.0f}%)]\tLoss: {:.4f}\tcer:{:.4f}\tCost time:{:5.0f}s\tEstimated time:{:5.0f}s\t'.format(
                    running_all,
                    len(dset_loaders['train'].dataset),
                    100. * batch_idx / (len(dset_loaders['train'])-1),
                    running_loss / running_all,
                    cer_mean,
                    time.time()-since,
                    (time.time()-since)*(len(dset_loaders['train'])-1) / batch_idx - (time.time()-since)))
        print('{} Epoch:\t{:2}\tLoss: {:.4f}\tcer:{:.4f}\t'.format(
            'pretrain',
            epoch,
            running_loss / len(dset_loaders['train'].dataset),
            cer_mean)+'\n')
        torch.save(model.state_dict(), save_path+'/'+args.mode+'_'+str(epoch+1)+'.pt')

I am very confused by this phenomenon. I would think that if the loss did not change during the whole epoch, it should not change in the next epoch either. Why does the loss still drop at the beginning of the next epoch after staying unchanged throughout the previous one? Can someone help me figure this out? Thanks!

I think it may be related to how you print the loss.

You have a running_loss that holds the total loss over every data point processed so far in this epoch, and a running_all that holds the total number of data points processed so far in this epoch. You print running_loss / running_all, which is the average loss per data point over the epoch so far.

As more data points are collected, even if the per-batch loss is steadily decreasing, each new loss value is averaged together with an ever larger number of previously computed losses, which makes the decrease look slower and slower. At the start of the next epoch, running_loss and running_all are reset to zero, so the printed value suddenly drops back to the level of the current per-batch loss. Explained here: https://gist.github.com/valkjsaaa/b0b26075174a87b3fd302b4b52ab035a
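
To see the effect concretely, here is a minimal sketch (not your training code, and with made-up numbers): even when the per-batch loss falls steadily, the epoch-wide average lags further and further behind it, and only jumps down again when the accumulators are reset at the start of the next epoch.

    # Minimal sketch with made-up numbers: a per-batch loss that drops
    # slowly but steadily within one epoch.
    per_batch_losses = [1.00 - 0.001 * i for i in range(500)]

    running_loss, running_all = 0.0, 0          # reset at the start of every epoch
    batch_size = 32
    for i, loss in enumerate(per_batch_losses):
        running_loss += loss * batch_size       # same accumulation as in the question
        running_all += batch_size
        if (i + 1) % 100 == 0:
            # The printed value averages over everything seen so far this epoch,
            # so it lags behind the current per-batch loss.
            print('batch {:3d}  current loss {:.3f}  printed average {:.3f}'.format(
                i + 1, loss, running_loss / running_all))

    # When the next epoch starts, running_loss and running_all go back to zero,
    # so the first printed value equals the current (lower) loss and the curve
    # appears to drop suddenly between epochs.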

I would suggest replacing running_loss / running_all with loss.data / len(inputs), which is the loss of the current batch, and seeing whether that helps.

The changed code should look like the following:

    for epoch in range(283,args.epochs):

        model.train()
        running_loss, running_corrects, running_all, cer = 0., 0., 0., 0.

        for batch_idx, sample_batched in enumerate(dset_loaders['train']):
            optimizer.zero_grad()
            inputs,targets,lengths,y_lengths,idx = sample_batched
            inputs = inputs.float()     
            inputs, targets = inputs.to(device) , targets.to(device) 
            outputs = model(inputs)  
            loss = criterion(F.log_softmax(outputs,dim=-1),targets,lengths,y_lengths)
            loss.backward()
            optimizer.step()

            decoded = decoder.decode_greedy(outputs,lengths)
            cursor, gt = 0, []
            for b in range(inputs.size(0)):
                y_str = ''.join([vocabularies[ch] for ch in targets[cursor: cursor + y_lengths[b]]])
                gt.append(y_str)
                cursor += y_lengths[b]
            CER = decoder.cer_batch(decoded,gt)
            cer += CER
            cer_mean = cer/(batch_idx+1)

            running_loss += loss.data * inputs.size(0)
            running_all += len(inputs)
            if batch_idx == 0:
                since = time.time()      
            elif (batch_idx+1) % args.interval == 0 or (batch_idx == len(dset_loaders['train'])-1):
                print('Process: [{:5.0f}/{:5.0f} ({:.0f}%)]\tLoss: {:.4f}\tcer:{:.4f}\tCost time:{:5.0f}s\tEstimated time:{:5.0f}s\t'.format(
                    running_all,
                    len(dset_loaders['train'].dataset),
                    100. * batch_idx / (len(dset_loaders['train'])-1),
                    loss.data,
                    cer_mean,
                    time.time()-since,
                    (time.time()-since)*(len(dset_loaders['train'])-1) / batch_idx - (time.time()-since)))
        print('{} Epoch:\t{:2}\tLoss: {:.4f}\tcer:{:.4f}\t'.format(
            'pretrain',
            epoch,
            running_loss / len(dset_loaders['train'].dataset),
            cer_mean)+'\n')
        torch.save(model.state_dict(), save_path+'/'+args.mode+'_'+str(epoch+1)+'.pt')

