How does PyTorch implement the forward pass of a quantized linear layer?

Question

I have a quantized model in PyTorch, and now I want to extract the parameters of its quantized linear layer and implement the forward pass manually. I searched the source code but only found this function:

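def forward(self, x: torch.Tensor) -> torch.Tensor:
    # Calls the registered quantized::linear operator with the prepacked
    # weight/bias and this layer's output scale and zero point.
    return torch.ops.quantized.linear(
        x, self._packed_params._packed_params, self.scale, self.zero_point)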

But I can't find anywhere how torch.ops.quantized.linear is defined.

Can someone give me a hint about how the forward pass of a quantized linear layer is defined?
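For reference, here is a minimal pure-Python sketch of the arithmetic a quantized linear op is generally expected to perform, assuming per-tensor affine quantization with uint8 activations, int8 weights, and a float bias. The function `quantized_linear_ref` is hypothetical and is not PyTorch's kernel; real backends (e.g. FBGEMM) may differ in rounding and saturation details, so bit-exact agreement is not guaranteed. On a quantized `Linear` module, the inputs can usually be read from `layer.weight()` (a quantized tensor with `int_repr()`, `q_scale()` and `q_zero_point()`), `layer.bias()`, `layer.scale` and `layer.zero_point`; per-channel weights would need per-row scales instead.

```python
def quantized_linear_ref(x_q, x_scale, x_zp, w_q, w_scale, w_zp, bias,
                         y_scale, y_zp, qmin=0, qmax=255):
    """Hypothetical reference forward for a per-tensor quantized linear layer.

    x_q:  batch x in_features integer rows (quantized input)
    w_q:  out_features x in_features integer rows (quantized weight)
    bias: list of out_features floats, or None
    Returns batch x out_features integers in [qmin, qmax].
    """
    out = []
    for row in x_q:
        out_row = []
        for j, w_row in enumerate(w_q):
            # Integer accumulation of zero-point-corrected products
            # (real kernels accumulate in int32).
            acc = sum((xv - x_zp) * (wv - w_zp) for xv, wv in zip(row, w_row))
            # The accumulator's real value has scale x_scale * w_scale.
            y = acc * x_scale * w_scale
            if bias is not None:
                y += bias[j]
            # Requantize to the output scale/zero point and saturate.
            q = round(y / y_scale) + y_zp
            out_row.append(min(max(q, qmin), qmax))
        out.append(out_row)
    return out


if __name__ == "__main__":
    # Real-valued x = [0.2, 1.2], W = [[0.5, -1.0], [0.25, 0.25]], b = [0.1, -0.2]
    # Float result [-1.0, 0.15], requantized with scale 0.05, zero point 128.
    print(quantized_linear_ref([[130, 140]], 0.1, 128, [[10, -20], [5, 5]], 0.05, 0,
                               [0.1, -0.2], 0.05, 128))  # [[108, 131]]
```

Dequantizing the result, `(q - 128) * 0.05`, gives back `[-1.0, 0.15]`, which matches the float computation.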

Answer 1

As for where torch.ops.quantized.linear is defined: I was looking for the same thing but was never able to find it. I believe it's probably somewhere in ATen (PyTorch's C++ tensor library). I did, however, find some useful PyTorch-based implementations in the NVIDIA TensorRT repo below. It's quite possible these are the ones PyTorch actually calls via some DLLs. If you're trying to add quantization to a custom layer, these implementations walk you through it.

You can find the docs here and the GitHub page here.

For the linear layer specifically, see the QuantLinear layer here.

Under the hood, this calls TensorQuantFunction.apply() for post-training quantization or FakeTensorQuantFunction.apply() for quantization-aware training.
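TensorQuantFunction and FakeTensorQuantFunction come from NVIDIA's toolkit; the sketch below only illustrates the general idea of symmetric, max-calibrated fake quantization (scale, round, clamp, then dequantize), assuming a per-tensor `amax` and signed `num_bits`-bit integers. `fake_quant` is a hypothetical helper, not the toolkit's source, and the rounding mode and narrow-range handling are assumptions.

```python
def fake_quant(values, amax, num_bits=8, narrow_range=False):
    """Quantize-dequantize floats onto a signed num_bits grid (hypothetical helper)."""
    if amax <= 0:
        raise ValueError("amax must be positive")
    max_bound = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    # narrow_range drops the most negative level so the grid is symmetric.
    min_bound = -max_bound if narrow_range else -max_bound - 1
    scale = max_bound / amax  # maps [-amax, amax] onto [-max_bound, max_bound]
    result = []
    for v in values:
        q = min(max(round(v * scale), min_bound), max_bound)  # quantize and saturate
        result.append(q / scale)  # dequantize back to float ("fake" quantization)
    return result


if __name__ == "__main__":
    print(fake_quant([0.0, 1.0, -1.0, 2.0], amax=1.0))  # [0.0, 1.0, -1.0, 1.0]
```

Values beyond `amax` saturate, and inside the range the error is at most half a step, `amax / (2 * max_bound)`. Returning the clamped integers `q` together with `scale`, instead of `q / scale`, would give the non-fake variant of the same sketch.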

solution1
0 2022-08-25 19:27:29