PyTorch: Loss remains constant

I've written code in PyTorch with my own implementation of the loss function, focal_loss_fixed. But my loss value stays fixed after every epoch. It looks like the weights are not being updated. Here is my code snippet:

optimizer = optim.SGD(net.parameters(),
                          lr=lr,
                          momentum=0.9,
                          weight_decay=0.0005)


for epoch in T(range(20)):
    net.train()
    epoch_loss = 0
    for n in range(len(x_train)//batch_size):
        (imgs, true_masks) = data_gen_small(x_train, y_train, iter_num=n, batch_size=batch_size)
        temp = []
        for tt in true_masks:
            temp.append(tt.reshape(128, 128, 1))
        true_masks = np.copy(np.array(temp))
        del temp
        imgs = np.swapaxes(imgs, 1,3)
        imgs = torch.from_numpy(imgs).float().cuda()
        true_masks = torch.from_numpy(true_masks).float().cuda()
        masks_pred = net(imgs)
        masks_probs = F.sigmoid(masks_pred)
        masks_probs_flat = masks_probs.view(-1)
        true_masks_flat = true_masks.view(-1)
        print((focal_loss_fixed(tf.convert_to_tensor(true_masks_flat.data.cpu().numpy()), tf.convert_to_tensor(masks_probs_flat.data.cpu().numpy()))))
        loss = torch.from_numpy(np.array(focal_loss_fixed(tf.convert_to_tensor(true_masks_flat.data.cpu().numpy()), tf.convert_to_tensor(masks_probs_flat.data.cpu().numpy())))).float().cuda()
        loss = Variable(loss.data, requires_grad=True)
        epoch_loss *= (n/(n+1))
        epoch_loss += loss.item()*(1/(n+1))
        print('Step: {0:.2f}% --- loss: {1:.6f}'.format(n * batch_size* 100.0 / len(x_train), epoch_loss), end='\r')
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('Epoch finished ! Loss: {}'.format(epoch_loss))

And this is my `focal_loss_fixed` function:

def focal_loss_fixed(true_data, pred_data):
    gamma=2.
    alpha=.25
    eps = 1e-7
    # print(type(y_true), type(y_pred))
    pred_data = K.clip(pred_data,eps,1-eps)
    pt_1 = tf.where(tf.equal(true_data, 1), pred_data, tf.ones_like(pred_data))
    pt_0 = tf.where(tf.equal(true_data, 0), pred_data, tf.zeros_like(pred_data))
    with tf.Session() as sess:
        return sess.run(-K.sum(alpha * K.pow(1. - pt_1, gamma) * K.log(pt_1))-K.sum((1-alpha) * K.pow( pt_0, gamma) * K.log(1. - pt_0)))

After each epoch the loss value stays constant (5589.60328). What's wrong with it?

I think the problem lies in your heavy weight decay.

Essentially, you are not reducing the weights by x, but rather multiplying the weights by x, which means you are only making very small increments at each step, leading to a (seemingly) plateauing loss function.

More explanation on this can be found in the PyTorch discussion forum (e.g., here, or here). Unfortunately, the source for SGD alone also does not tell you much about its implementation. Simply setting it to a larger value should result in better updates. You can start by leaving it out completely, and then iteratively reducing it (from 1.0), until you get more decent results.
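A minimal sketch of that first step, assuming the same net and lr as in the question (weight_decay is simply omitted, so it defaults to 0):

optimizer = optim.SGD(net.parameters(),
                      lr=lr,
                      momentum=0.9)  # weight_decay left out; reintroduce it gradually later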

When computing the loss you call focal_loss_fixed(), which uses TensorFlow to compute the loss value. focal_loss_fixed() creates a graph and runs it in a session to get the value, and by that point PyTorch has no idea of the sequence of operations that led to the loss, because they were computed by the TensorFlow backend. It is likely, then, that all PyTorch sees in loss is a constant, as if you had written

loss = 3

So the gradient will be zero, and the parameters will never be updated. I suggest you rewrite your loss function using PyTorch operations so that the gradient with respect to its inputs can be computed.
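As an illustration, here is a rough pure-PyTorch port of focal_loss_fixed (a sketch with the same math as the TensorFlow version, not tested against it): every operation stays inside autograd, so loss.backward() can reach the network weights.

import torch

def focal_loss_torch(true_data, pred_data, gamma=2.0, alpha=0.25, eps=1e-7):
    # Same math as focal_loss_fixed, but written with torch ops
    # so that autograd can differentiate through it.
    pred_data = pred_data.clamp(eps, 1 - eps)
    pt_1 = torch.where(true_data == 1, pred_data, torch.ones_like(pred_data))
    pt_0 = torch.where(true_data == 0, pred_data, torch.zeros_like(pred_data))
    return (-(alpha * (1 - pt_1).pow(gamma) * pt_1.log()).sum()
            - ((1 - alpha) * pt_0.pow(gamma) * (1 - pt_0).log()).sum())

With this in place, the tf.convert_to_tensor round trip and the loss = Variable(loss.data, requires_grad=True) line (which itself detaches the loss from the graph) can both be dropped; simply use loss = focal_loss_torch(true_masks_flat, masks_probs_flat) and call loss.backward().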
