TensorFlow dynamic range quantization

The TensorFlow documentation for dynamic range quantization states that:

At inference, weights are converted from 8-bits of precision to floating point and computed using floating-point kernels. This conversion is done once and cached to reduce latency.
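For context, a dynamic-range quantized model is produced by enabling the default optimizations on the TFLite converter without supplying a representative dataset. Below is a minimal sketch; `saved_model_dir` is a placeholder for wherever the SavedModel lives:

```python
import tensorflow as tf

# Placeholder path; assumes a SavedModel already exists there.
saved_model_dir = "my_saved_model"

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
# Optimize.DEFAULT without a representative dataset gives dynamic range
# quantization: weights are stored as int8, activations remain float32.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_dynamic_range.tflite", "wb") as f:
    f.write(tflite_model)
```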

and also that in dynamic range quantization, the activations are always stored in float32; however, they are converted to 8-bit integers during processing and back to floating point after the processing is done.
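To make that second quote concrete, here is a rough numpy sketch of quantizing an activation tensor to 8 bits and back. The symmetric per-tensor scheme below is a simplification for illustration, not TFLite's exact kernel logic:

```python
import numpy as np

def dynamic_quantize(x):
    """Symmetric per-tensor quantization of a float32 tensor to int8."""
    max_abs = float(np.abs(x).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map the int8 values back to float32."""
    return q.astype(np.float32) * scale

x = np.random.randn(4).astype(np.float32)
q, scale = dynamic_quantize(x)
x_hat = dequantize(q, scale)  # approximately equal to x, up to quantization error
```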

I am confused: if the weights are converted to float32 at inference time, then how is quantization done?

Quote from https://www.tensorflow.org/lite/performance/post_training_quant

In addition, TFLite supports on the fly quantization and dequantization of activations to allow for:

- Using quantized kernels for faster implementation when available.
- Mixing of floating-point kernels with quantized kernels for different parts of the graph.

If the kernel has an optimized path that supports quantization, the float activations are quantized on the fly so they can be used with the quantized weights.

Otherwise, the activations are kept in float and the weights are converted to float for inference.
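Putting the two paths together, here is a purely conceptual numpy sketch of a dense layer whose weights are stored as int8 with a per-tensor scale; it illustrates the idea above and is not TFLite's actual kernel code:

```python
import numpy as np

def dense_dynamic_range(x_f32, w_int8, w_scale, has_quantized_kernel=True):
    """Conceptual dense layer with int8 weights and a per-tensor weight scale."""
    if has_quantized_kernel:
        # Path 1: quantize the float activations on the fly ...
        max_abs = float(np.abs(x_f32).max())
        x_scale = max_abs / 127.0 if max_abs > 0 else 1.0
        x_int8 = np.clip(np.round(x_f32 / x_scale), -127, 127).astype(np.int8)
        # ... run the matmul in integer arithmetic (int32 accumulator) ...
        acc_int32 = x_int8.astype(np.int32) @ w_int8.astype(np.int32)
        # ... then rescale the integer result back to float32.
        return acc_int32.astype(np.float32) * (x_scale * w_scale)
    else:
        # Path 2: no quantized kernel available, so dequantize the weights
        # (TFLite does this once and caches the result) and use a float matmul.
        w_f32 = w_int8.astype(np.float32) * w_scale
        return x_f32 @ w_f32
```

In the first path the expensive matmul runs in integer arithmetic, so the model still benefits from quantization even though the op's inputs and outputs are float32; the fallback path only saves model size.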
