How does PyTorch implement forward for a quantized linear layer?

I have a quantized model in PyTorch, and I want to extract the parameters of a quantized linear layer and implement its forward pass manually. I searched the source code but only found this function:

def forward(self, x: torch.Tensor) -> torch.Tensor:
    return torch.ops.quantized.linear(
        x, self._packed_params._packed_params, self.scale, self.zero_point)

But nowhere can I find how torch.ops.quantized.linear is defined.

Can someone give me a hint as to how the forward of a quantized linear layer is defined?

As for where torch.ops.quantized.linear is defined: I was looking for the same thing but was never able to find it. I believe it's probably somewhere in ATen (the C++ namespace). I did, however, find some useful PyTorch-based implementations in the NVIDIA TensorRT repo below. It's quite possible these are the ones actually called by PyTorch via some DLLs, though I could not confirm this. If you're trying to add quantization to a custom layer, these implementations walk you through it.
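For the manual forward itself, the arithmetic can be sketched without knowing where the op lives. What follows is a minimal sketch in plain Python, assuming per-tensor affine quantization (one scale and one zero point each for the input, the weight, and the output). The names quantized_linear and dequantize are my own, not PyTorch's. In PyTorch, the integer weights and their parameters can be read with layer.weight().int_repr(), layer.weight().q_scale() and layer.weight().q_zero_point(), and the bias with layer.bias(). Per-channel weights expose q_per_channel_scales() instead and would need one scale per output row. The real kernel quantizes the bias and requantizes in integer arithmetic, so its results may differ from this float version by one quantization step.

```python
def quantized_linear(x_q, x_scale, x_zp, w_q, w_scale, w_zp, bias,
                     y_scale, y_zp, qmin=0, qmax=255):
    """Reference forward: integer inputs/weights in, integer outputs out.

    x_q:  list of rows of quantized inputs (ints)
    w_q:  list of rows of quantized weights, one row per output feature
    bias: list of float biases, one per output feature
    """
    out = []
    for row in x_q:
        out_row = []
        for j, w_row in enumerate(w_q):
            # Integer accumulation after removing both zero points.
            acc = sum((xv - x_zp) * (wv - w_zp) for xv, wv in zip(row, w_row))
            # Back to real values, then add the float bias.
            y_real = x_scale * w_scale * acc + bias[j]
            # Requantize to the output scale and zero point, then clamp.
            y_int = round(y_real / y_scale) + y_zp
            out_row.append(max(qmin, min(qmax, y_int)))
        out.append(out_row)
    return out


def dequantize(q, scale, zp):
    """Convert quantized integers back to real values."""
    return [[(v - zp) * scale for v in row] for row in q]


# Example: real input [1.0, -4.0], real weights [[0.5, -0.25], [1.0, 0.0]].
x_q = [[130, 120]]             # (130 - 128) * 0.5 = 1.0, (120 - 128) * 0.5 = -4.0
w_q = [[2, -1], [4, 0]]        # int8 weights with scale 0.25, zero point 0
y_q = quantized_linear(x_q, 0.5, 128, w_q, 0.25, 0, [0.5, -1.0], 0.5, 10)
y = dequantize(y_q, 0.5, 10)   # real-valued view of the quantized output
```

Here self.scale and self.zero_point from the forward in the question play the role of y_scale and y_zp: they describe the output tensor, not the weights.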

You can find the docs here and the GitHub page here.

For the linear layer specifically, see the QuantLinear layer here.

Under the hood, this calls TensorQuantFunction.apply() for post-training quantization or FakeTensorQuantFunction.apply() for quantization-aware training.
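The difference between the two modes can be illustrated with a generic sketch. It is not the library's code, and the function names are hypothetical. I am assuming symmetric quantization driven by a maximum absolute value amax, with the integer range [-127, 127] for 8 bits. Real quantization returns integers together with the scale, while fake quantization rounds and immediately converts back to float, so that training still sees floating-point tensors carrying the rounding error.

```python
def quantize_symmetric(values, amax, num_bits=8):
    """Return (integers, scale) for symmetric quantization; amax maps to max_bound."""
    max_bound = 2 ** (num_bits - 1) - 1      # 127 for 8 bits
    scale = max_bound / amax                 # real value * scale -> integer grid
    q = [max(-max_bound, min(max_bound, round(v * scale))) for v in values]
    return q, scale


def fake_quantize_symmetric(values, amax, num_bits=8):
    """Quantize, then dequantize: floats snapped to the integer grid."""
    q, scale = quantize_symmetric(values, amax, num_bits)
    return [v / scale for v in q]


values = [1.2, -0.3, 100.0]
q, scale = quantize_symmetric(values, amax=63.5)     # scale is exactly 2.0
fq = fake_quantize_symmetric(values, amax=63.5)      # 100.0 saturates at amax
```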
