Why is a TFLite model derived from a quantization aware trained model different from one derived from a normal model with the same weights?
I am training a Keras model that I want to deploy with TFLite in a quantized, 8-bit environment (microcontroller). To improve quantization performance, I perform quantization aware training. I then create the quantized TFLite model using my validation set as a representative dataset. Performance is evaluated using the validation set and illustrated in this image:
Error rate for various batches of 20 runs in different conditions
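The conversion step described above can be sketched as follows. This is a minimal, hedged example: the tiny stand-in model and the random representative data are placeholders for the real trained model and validation set.

```python
# Sketch of full-integer post-training quantization with a representative
# dataset. The model and data here are placeholders, not the real setup.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2),
])

def representative_dataset():
    # In practice, yield samples from the validation set;
    # random data here is only a stand-in.
    for _ in range(100):
        yield [np.random.rand(1, 4).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()  # serialized flatbuffer (bytes)
```

The representative dataset is what the converter runs through the float model to estimate activation ranges, which is relevant to the question below.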
If, instead of simply generating the TFLite model (cyan in the figure) from the QA-trained model (red in the figure), I copy the weights from the QA-trained model to the original one and then generate the TFLite model to work around an issue (purple in the figure), this gives slightly different predictions. Why is that?
I understand that the TFLite models would be slightly different from the QA-trained model, since the transformation uses a post-training quantization based on the validation set. But shouldn't the quantization be the same if the structure, weights and biases of the network are the same?
Sub-question: why is the TFLite model on average slightly worse than the normal Keras model? Since I am quantizing and evaluating on the validation set, if anything I would expect it to perform artificially better.
It sounds like you are combining post-training quantization and quantization aware training. If I understand you correctly, you are training a quantized model, then copying only the float weights to the original float model, and then running post-training quantization.
This procedure is a bit strange: the issue is that the quantized version of the model also quantizes activations, so just copying the weights does not result in the exact same network. The activation quantization parameters used by the quantized TF model may end up different from those calculated from the representative dataset, and will lead to different answers.
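This effect can be illustrated with plain NumPy. The sketch below uses the standard affine (scale/zero-point) int8 quantization scheme; the two activation ranges are hypothetical, standing in for a range learned during QAT versus one estimated from a representative dataset.

```python
# Why different activation ranges give different outputs, even with
# identical weights: affine quantization maps [rmin, rmax] to int8.
import numpy as np

def quant_params(rmin, rmax, qmin=-128, qmax=127):
    # Extend the range to include zero so that 0.0 is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point

def quantize(x, scale, zp):
    return np.clip(np.round(x / scale) + zp, -128, 127).astype(np.int8)

def dequantize(q, scale, zp):
    return (q.astype(np.float32) - zp) * scale

x = np.array([0.1, 0.5, 0.9], dtype=np.float32)

# Hypothetical ranges: one learned during QAT, one estimated from a
# representative dataset that happened to see slightly larger activations.
s1, z1 = quant_params(0.0, 1.0)
s2, z2 = quant_params(0.0, 1.2)

y1 = dequantize(quantize(x, s1, z1), s1, z1)
y2 = dequantize(quantize(x, s2, z2), s2, z2)
# y1 and y2 differ slightly: same values in, different quantization grids.
```

Because every intermediate activation passes through such a round-trip, small differences in the estimated ranges accumulate into slightly different predictions.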
I think you could expect the QAT model to work better than the resulting TFLite model because of the trained quantization parameters for activations.
I suggest resolving your earlier question, which would lead to a better solution and higher accuracy.