
Differentiable loss from the difference of 2 lists

I have a model:

import torch
from torch.autograd import grad
import torch.nn as nn
import torch.optim as optim

class net_x(nn.Module):
    def __init__(self):
        super(net_x, self).__init__()
        self.fc1 = nn.Linear(1, 1)
        self.out = nn.Linear(1, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = self.out(x)
        return x

nx = net_x()
# the training loop below uses an optimizer; its construction is not shown in the
# original snippet, so an Adam optimizer is assumed here
optimizer = optim.Adam(nx.parameters(), lr=0.1)

#inputs
t = torch.tensor([1.0, 2.0, 3.2], requires_grad = True) #input vector
t = torch.reshape(t, (3,1)) #reshape for batch

and 2 lists:

pred_lst = [] 
goal_lst = list(range(10))

I was trying to get the loss between these two lists as follows:

for epoch in range(10):
    optimizer.zero_grad()
    y = nx(t)
    if torch.sum(y) > 5:
        pred_lst.append(epoch) 
    else:
        pass
    loss = len(set(pred_lst).symmetric_difference(set(goal_lst)))
    loss = torch.tensor(float(loss), requires_grad = True)
    print('loss: ', loss)
    loss.backward()

But the parameters were not updating, because symmetric_difference is a non-differentiable operation. How can I modify this, or use something else, that will take these 2 lists and give me a differentiable loss that I can backpropagate?
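To see concretely why nothing updates, here is a minimal check (an illustration added here, not part of the original post, and assuming the nx, t, pred_lst, and goal_lst defined above): the loss is rebuilt as a brand-new leaf tensor, so the graph created by the forward pass is never used.

y = nx(t)                                              # forward pass builds a graph...
diff = len(set(pred_lst).symmetric_difference(set(goal_lst)))
loss = torch.tensor(float(diff), requires_grad=True)   # ...but this is a fresh, disconnected leaf tensor
loss.backward()
print(nx.fc1.weight.grad)                              # None: no gradient ever reaches the network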

Questions of non-differentiability aside, here's how you'd do it with K-hot encoding and L1:

label_vec = torch.zeros(100).float()
label_vec[goal_lst] = 1

pred_vec = torch.zeros(100).float()
pred_vec[pred_lst] = 1

# L1Loss is a module, so instantiate it before calling it on the tensors
loss = torch.nn.L1Loss()(pred_vec, label_vec)

But the indexing operations are non-differentiable with respect to the indices, I believe.

It seems that to solve this issue, one solution would be to have your NN natively output a vector ( pred_vec ) rather than a list. Furthermore, this vector should likely contain values in the range [0,1] so that the gradient contains meaningful information. Standard practice would be to output continuous predictions during training, and use some cutoff to determine the final discrete list of outputs at inference time. A rough sketch of that idea follows below.
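A rough sketch of that approach, assuming a network that directly outputs a length-100 membership vector, a binary-cross-entropy loss against the K-hot label vector, and a 0.5 cutoff at inference (the network shape VecNet, the loss choice, and the threshold are illustrative assumptions, not prescribed by the answer above):

import torch
import torch.nn as nn
import torch.optim as optim

class VecNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 32)
        self.out = nn.Linear(32, 100)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return torch.sigmoid(self.out(x))  # each entry lands in [0, 1]

net = VecNet()
optimizer = optim.Adam(net.parameters(), lr=1e-2)

goal_lst = list(range(10))
label_vec = torch.zeros(100)
label_vec[goal_lst] = 1.0               # K-hot encoding of the goal list

t = torch.tensor([[1.0], [2.0], [3.2]])

for epoch in range(10):
    optimizer.zero_grad()
    pred_vec = net(t).mean(dim=0)       # continuous predictions during training, shape (100,)
    loss = nn.functional.binary_cross_entropy(pred_vec, label_vec)
    loss.backward()                     # gradients now flow back into the network weights
    optimizer.step()

# At inference, apply a cutoff to recover a discrete list of indices.
with torch.no_grad():
    pred_lst = (net(t).mean(dim=0) > 0.5).nonzero().flatten().tolist()

Because the loss is computed directly from the network's output tensor, backpropagation updates the weights; the discrete list only appears after training, where differentiability is no longer needed.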
