
Taking a derivative through torch.ge, or how to explicitly define a derivative in PyTorch

I am trying to set up a network in which one layer maps from real numbers to {0, 1} (i.e. makes the output binary).

What I tried

While I was able to find that torch.ge provides such functionality, PyTorch breaks whenever I want to train any parameter occurring before that layer in the network.

I have also been trying to find whether there is any way in PyTorch/autograd to override the derivative of a module by hand. More specifically, in this case I would just like to pass the derivative through torch.ge without changing it.
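For concreteness, here is a rough sketch of the kind of override I have in mind, using torch.autograd.Function with a custom backward that simply passes the gradient through unchanged (the class name StraightThroughGE and the identity backward are my own guess at how such an override might look, not something I have verified):

import torch


class StraightThroughGE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Same thresholding as torch.ge, returned as float so it can feed a loss
        return (x >= 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Pretend the forward was the identity: pass the gradient through unchanged
        return grad_output

# inside a module's forward this would be called as:
# StraightThroughGE.apply(self.fc(x))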

Minimal Example

Here is a minimal example I produced, which uses a typical neural network training structure in PyTorch.

import torch
import torch.nn as nn
import torch.optim as optim


class LinearGE(nn.Module):
    def __init__(self, features_in, features_out):
        super().__init__()
        self.fc = nn.Linear(features_in, features_out)

    def forward(self, x):
        return torch.ge(self.fc(x), 0)


x = torch.randn(size=(10, 30))
y = torch.randint(2, size=(10, 10))

# Define Model
m1 = LinearGE(30, 10)

opt = optim.SGD(m1.parameters(), lr=0.01)

crit = nn.MSELoss()

# Train Model
for x_batch, y_batch in zip(x, y):
    # zero the parameter gradients
    opt.zero_grad()

    # forward + backward + optimize
    pred = m1(x_batch)
    loss = crit(pred.float(), y_batch.float())
    loss.backward()
    opt.step()

What I encountered

When I run the above code, the following error occurs:

File "__minimal.py", line 33, in <module>
    loss.backward()
...
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

This error makes sense, since the torch.ge function is not differentiable. However, since MaxPool2D is also not differentiable, I believe there are ways of mitigating non-differentiability in PyTorch.

It would be great if someone could point me to any source that can help me either implement my own backprop for a custom module, or avoid this error message in some other way.

Thanks!

Two things I noticed

  1. If your input x is 10x30 (10 examples, 30 features) and the number of output nodes is 10, then the parameter matrix is 30x10. The expected output matrix is 10x10 (10 examples, 10 output nodes).

  2. ge means greater than or equal to. As the code indicates, it computes x >= 0 element-wise. We can use relu instead.

class LinearGE(nn.Module):
    def __init__(self, features_in, features_out):
        super().__init__()
        self.fc = nn.Linear(features_in, features_out)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.fc(x))

or torch.max:

torch.max(self.fc(x), torch.zeros_like(self.fc(x)))  # element-wise max with 0
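As a quick sanity check, here is a sketch that reuses the tensors and training loop from the question with the ReLU-based LinearGE defined above; the backward pass should now run without the error:

m2 = LinearGE(30, 10)   # ReLU-based version from above
opt = optim.SGD(m2.parameters(), lr=0.01)
crit = nn.MSELoss()

for x_batch, y_batch in zip(x, y):
    opt.zero_grad()
    pred = m2(x_batch)                  # float output with a grad_fn
    loss = crit(pred, y_batch.float())
    loss.backward()                     # no RuntimeError now
    opt.step()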
