Output 0 of DequantizeAndLinearBackward is a view and is being modified in-place. This view was created inside a custom Function

Question

I am trying to fine-tune GPT-J, but I get the error below. I think it is related to the activation function being modified in-place, but I don't know how to change the code to fix it.

Is there a parameter inside the activation function that needs to be disabled? If so, which one?

Thank you in advance for your help!

 output = DequantizeAndLinear.apply(input, self.weight, self.absmax, self.code, self.bias)
     14         if self.adapter:
---> 15             output += self.adapter(input)
     16         return output
     17 

RuntimeError: Output 0 of DequantizeAndLinearBackward is a view and is being modified in-place. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.

    def forward(self, input):
        output = DequantizeAndLinear.apply(input, self.weight, self.absmax, self.code, self.bias)
        if self.adapter:
            output += self.adapter(input)
        return output
 
    @classmethod
    def from_linear(cls, linear: nn.Linear) -> "FrozenBNBLinear":
        weights_int8, state = quantize_blockise_lowmemory(linear.weight)
        return cls(weights_int8, *state, linear.bias)
 
    def __repr__(self):
        return f"{self.__class__.__name__}({self.in_features}, {self.out_features})"
 
 
class DequantizeAndLinear(torch.autograd.Function): 
    @staticmethod
    @custom_fwd
    def forward(ctx, input: torch.Tensor, weights_quantized: torch.ByteTensor,
                absmax: torch.FloatTensor, code: torch.FloatTensor, bias: torch.FloatTensor):
        weights_deq = dequantize_blockwise(weights_quantized, absmax=absmax, code=code)
        ctx.save_for_backward(input, weights_quantized, absmax, code)
        ctx._has_bias = bias is not None
        return F.linear(input, weights_deq, bias)
 
    @staticmethod
    @custom_bwd
    def backward(ctx, grad_output: torch.Tensor):
        assert not ctx.needs_input_grad[1] and not ctx.needs_input_grad[2] and not ctx.needs_input_grad[3]
        input, weights_quantized, absmax, code = ctx.saved_tensors
        # grad_output: [*batch, out_features]
        weights_deq = dequantize_blockwise(weights_quantized, absmax=absmax, code=code)
        grad_input = grad_output @ weights_deq
        grad_bias = grad_output.flatten(0, -2).sum(dim=0) if ctx._has_bias else None
        return grad_input, None, None, None, grad_bias

Answer 1

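You just have to add .clone() to the output of your custom Function (what the question calls the activation function). Here, that means returning F.linear(input, weights_deq, bias).clone() from DequantizeAndLinear.forward, so that the later in-place update output += self.adapter(input) acts on a fresh tensor rather than on a view.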
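The clone fix can be sketched on a reduced example. This is only an illustration under stated assumptions: LinearWithClone is a hypothetical stand-in for DequantizeAndLinear with the quantization step left out, and the tensor shapes and the nn.Linear adapter are placeholders rather than the real GPT-J layers.

```python
import torch
import torch.nn.functional as F


class LinearWithClone(torch.autograd.Function):
    """Hypothetical, simplified stand-in for DequantizeAndLinear (no quantization)."""

    @staticmethod
    def forward(ctx, input, weight, bias):
        ctx.save_for_backward(weight)
        ctx._has_bias = bias is not None
        # clone() returns a fresh tensor instead of a view,
        # so the caller can modify the result in place.
        return F.linear(input, weight, bias).clone()

    @staticmethod
    def backward(ctx, grad_output):
        (weight,) = ctx.saved_tensors
        grad_input = grad_output @ weight
        grad_bias = grad_output.flatten(0, -2).sum(dim=0) if ctx._has_bias else None
        # One gradient per forward argument; the frozen weight gets None.
        return grad_input, None, grad_bias


torch.manual_seed(0)
x = torch.randn(2, 3, 4, requires_grad=True)  # placeholder [batch, seq, in_features]
weight = torch.randn(5, 4)                    # frozen, like the quantized weights
bias = torch.zeros(5, requires_grad=True)
adapter = torch.nn.Linear(4, 5)               # placeholder for self.adapter

output = LinearWithClone.apply(x, weight, bias)
output += adapter(x)                          # in-place add on the cloned output
output.sum().backward()
```

An alternative that leaves the custom Function untouched would be to write output = output + self.adapter(input) in the layer's forward, since the out-of-place addition allocates a new tensor instead of modifying the Function's output.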
