How to get quantized weights from TensorFlow's quantization aware training with experimental quantization

I'm using TensorFlow's quantization aware training API and wish to deploy a model with an arbitrary bit-width. Since only 8-bit quantization is supported for tflite deployment, I will deploy with a custom inference algorithm, but I still need to access the weights of the model at the correct bit-width.

Currently, after quantization aware training, my model is still in floating point, and as far as I've seen the only way to access the quantized weights is to convert the model to tflite format. However, that is impossible when using experimental functions.

Here is my quantize config class:

    import tensorflow_model_optimization as tfmot

    class Quantizer(tfmot.quantization.keras.QuantizeConfig):
        # Configure how to quantize weights.
        def get_weights_and_quantizers(self, layer):
            return [(layer.kernel, tfmot.quantization.keras.quantizers.LastValueQuantizer(
                num_bits=8, symmetric=True, narrow_range=False, per_axis=False))]

        # Configure how to quantize activations.
        def get_activations_and_quantizers(self, layer):
            return [(layer.activation, tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
                num_bits=8, symmetric=False, narrow_range=False, per_axis=False))]

        def set_quantize_weights(self, layer, quantize_weights):
            # Add this line for each item returned in `get_weights_and_quantizers`,
            # in the same order.
            layer.kernel = quantize_weights[0]

        def set_quantize_activations(self, layer, quantize_activations):
            # Add this line for each item returned in `get_activations_and_quantizers`,
            # in the same order.
            layer.activation = quantize_activations[0]

        # Configure how to quantize outputs (may be equivalent to activations).
        def get_output_quantizers(self, layer):
            return []

        def get_config(self):
            return {}

    # `quantizer`, `bits`, `symmetric`, `narrow_range` and `per_axis` are defined
    # elsewhere in my code, e.g. quantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
    # and bits = 4.
    class ModifiedQuantizer(Quantizer):
        # Configure weights to quantize with 4 bits instead of 8.
        def get_weights_and_quantizers(self, layer):
            return [(layer.kernel, quantizer(num_bits=bits, symmetric=symmetric,
                                             narrow_range=narrow_range, per_axis=per_axis))]

        # Quantize activations with the same reduced bit-width.
        def get_activations_and_quantizers(self, layer):
            return [(layer.activation, tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
                num_bits=bits, symmetric=False, narrow_range=False, per_axis=False))]

And here is how I quantize the model:

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.models import clone_model
    from tensorflow_model_optimization.quantization.keras import (
        quantize_annotate_layer, quantize_apply, quantize_scope)

    # `models` (which provides the custom _relu6 activation) and `cifar10`
    # are my own helper modules; `model` is the float Keras model built elsewhere.
    # Quantizer and ModifiedQuantizer are defined as above.
    supported_layers = [
        tf.keras.layers.Conv2D,
        tf.keras.layers.DepthwiseConv2D
    ]

    def quantize_all_layers(layer):
        # Annotate the supported layer types with the reduced-bit-width config.
        for supported_layer in supported_layers:
            if isinstance(layer, supported_layer):
                return quantize_annotate_layer(layer, quantize_config=ModifiedQuantizer())
        return layer

    annotated_model = clone_model(
        model,
        clone_function=quantize_all_layers
    )

    with quantize_scope(
            {'Quantizer': Quantizer,
             'ModifiedQuantizer': ModifiedQuantizer,
             '_relu6': models._relu6}):
        q_aware_model = quantize_apply(annotated_model)

    optimizer = keras.optimizers.Adam(learning_rate=0.001)
    q_aware_model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=optimizer,
        metrics=['sparse_categorical_accuracy'])

    train_images, train_labels, val_images, val_labels, _, _ = cifar10.load()

    q_aware_model.fit(train_images, train_labels, batch_size=64, epochs=1, verbose=1,
                      validation_data=(val_images, val_labels))
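
For reference, after quantize_apply the quantization ranges learned during training live as extra variables on the wrapper layers, next to the (still floating point) kernels. A quick way to list them, sketched under the assumption that the range variables carry 'min'/'max' in their names; the exact names (e.g. kernel_min / kernel_max) vary across tfmot versions:

    for layer in q_aware_model.layers:
        for var in layer.weights:
            # Range variables such as kernel_min/kernel_max, unlike kernels and biases.
            if 'min' in var.name or 'max' in var.name:
                print(var.name, var.numpy())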

As previously said, when using e.g. bits=4 in the ModifiedQuantizer, the model is still saved in floating point, and I don't know how to access the quantized weights.

Thanks!

I suspect you could get the quantized weights by invoking LastValueQuantizer.__call__ on a given layer's weight tensor. How to invoke that method is the question.

The current signature is:

    LastValueQuantizer.__call__(inputs, training, weights, **kwargs)

I assume that inputs is the layer's weights and weights is the value returned by LastValueQuantizer.build. If you could get a reference to the weights returned by build, I would hope it would be straightforward to quantize the layer's weights directly using LastValueQuantizer.__call__.
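
For instance, a minimal sketch of that idea (untested; the layer index, the .layer.kernel access path, and the quantizer settings mirroring ModifiedQuantizer are all assumptions):

    from tensorflow_model_optimization.quantization.keras.quantizers import LastValueQuantizer

    # Match the 4-bit settings used in ModifiedQuantizer (assumption).
    q = LastValueQuantizer(num_bits=4, symmetric=True, narrow_range=False, per_axis=False)

    # Hypothetical: pick one quantize-wrapped layer from the QAT model; the
    # wrapper exposes the original layer (and its float kernel) as `.layer`.
    wrapper = q_aware_model.layers[1]
    kernel = wrapper.layer.kernel

    # `build` creates the min/max range variables the quantizer reads and writes.
    quantizer_vars = q.build(kernel.shape, 'sketch', wrapper)

    # With training=True, LastValueQuantizer first sets min/max from the tensor
    # itself (the "last value"), then returns the fake-quantized kernel.
    quantized_kernel = q(kernel, training=True, weights=quantizer_vars)

Note that the result is still a float tensor, just snapped onto the 4-bit grid; see the note after the transcript below for turning it into integer codes.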

    [nav] In [1]: from tensorflow_model_optimization.quantization.keras.quantizers import LastValueQuantizer
    INFO:tensorflow:Enabling eager execution
    INFO:tensorflow:Enabling v2 tensorshape
    INFO:tensorflow:Enabling resource variables
    INFO:tensorflow:Enabling tensor equality
    INFO:tensorflow:Enabling control flow v2

    [nav] In [2]: q = LastValueQuantizer(num_bits=3, per_axis=True, symmetric=True, narrow_range=True)

    [ins] In [3]: ??q.__call__
    Signature: q.__call__(inputs, training, weights, **kwargs)
    Source:
      def __call__(self, inputs, training, weights, **kwargs):
        """Quantize tensor.

        Args:
          inputs: Input tensor to be quantized.
          training: Whether the graph is currently training.
          weights: Dictionary of weights the quantizer can use to quantize the
            tensor. This contains the weights created in the `build` function.
          **kwargs: Additional variables which may be passed to the quantizer.

        Returns:
          Quantized tensor.
        """
        return quant_ops.LastValueQuantize(
            inputs,
            weights['min_var'],
            weights['max_var'],
            is_training=training,
            num_bits=self.num_bits,
            per_channel=self.per_axis,
            symmetric=self.symmetric,
            narrow_range=self.narrow_range
        )
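
One caveat: LastValueQuantize performs fake quantization, so what comes back is a float tensor whose values lie on the num_bits grid, not integer codes. For a custom integer inference engine you would still divide by the scale. A sketch, assuming the common symmetric convention scale = max / (2^(num_bits - 1) - 1) and reusing quantized_kernel and quantizer_vars from the earlier sketch; the exact range convention should be checked against quant_ops.LastValueQuantize:

    import tensorflow as tf

    num_bits = 4
    # Hypothetical symmetric scale; verify the narrow/full range convention
    # against quant_ops.LastValueQuantize before relying on it.
    scale = quantizer_vars['max_var'] / (2 ** (num_bits - 1) - 1)
    int_codes = tf.cast(tf.round(quantized_kernel / scale), tf.int8)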
