
Operation types in the full-integer quantization method in TensorFlow Lite

I want to apply post-training quantization (full integer) with the TensorFlow Model Optimization package to a pre-trained model (LeNet-5): https://www.tensorflow.org/model_optimization/guide/quantization/post_training

    import tensorflow as tf
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Conv2D, AveragePooling2D, Flatten, Dense

    # LeNet-5 for 28x28 grayscale MNIST images
    model = Sequential(name='LeNet5')
    model.add(tf.keras.layers.InputLayer(input_shape=(28, 28)))
    model.add(tf.keras.layers.Reshape(target_shape=(28, 28, 1)))
    model.add(Conv2D(6, kernel_size=(5, 5), strides=(1, 1),
                     activation='tanh', padding='same'))
    model.add(AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
    model.add(Conv2D(16, kernel_size=(5, 5), strides=(1, 1),
                     activation='tanh', padding='valid'))
    model.add(AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
    model.add(Flatten())
    model.add(Dense(120, activation='tanh'))
    model.add(Dense(84, activation='tanh'))
    model.add(Dense(10, activation='softmax'))
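
For completeness, a minimal training sketch so the converter quantizes trained weights (the optimizer, loss, and epoch count here are illustrative, not from the original post):

    # Train LeNet-5 on MNIST before conversion.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))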

and I have applied full-integer post-training quantization using this code:

    mnist_train, _ = tf.keras.datasets.mnist.load_data()
    images = tf.cast(mnist_train[0], tf.float32) / 255.0
    mnist_ds = tf.data.Dataset.from_tensor_slices(images).batch(1)

    def representative_data_gen():
        for input_value in mnist_ds.take(100):
            yield [input_value]

    # Create the converter from the Keras model defined above.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)

    converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
    converter.representative_dataset = representative_data_gen

    converter.allow_custom_ops = True
    converter.target_spec.supported_types = [tf.int8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    full_integer_quantization_model = converter.convert()
    open("tflite_model.tflite", "wb").write(full_integer_quantization_model)

It works fine in terms of accuracy, but when I print the data type of each operation (conv, activation, bias), I see that some operations are in int32 instead of int8.
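
One way to inspect the per-tensor types (a sketch; the post does not show the exact inspection code) is via the interpreter's tensor details:

    # List the dtype of every tensor in the converted graph; weights and
    # activations appear as int8, while bias tensors appear as int32.
    interpreter = tf.lite.Interpreter(model_content=full_integer_quantization_model)
    interpreter.allocate_tensors()

    for detail in interpreter.get_tensor_details():
        print(detail['name'], detail['dtype'])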

I don't understand why this happens.

How does TFLite decide to run some ops in int32 and others in int8?

Is it possible to control this in TFLite (is there an option for it) and perform all operations in int8?

Have you taken a look at the quantization spec, https://www.tensorflow.org/lite/performance/quantization_spec ?

Bias values have 32-bit width. Per the spec, biases are quantized to int32 rather than int8 because they are added to the int32 accumulator of the conv / fully-connected multiply-accumulate; their scale is fixed to input_scale * weight_scale and their zero-point to 0.
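
This can be verified on the converted model (a sketch; `full_integer_quantization_model` is the buffer produced by the question's code):

    import numpy as np

    interpreter = tf.lite.Interpreter(model_content=full_integer_quantization_model)
    interpreter.allocate_tensors()

    # Print the quantization parameters of the int32 (bias) tensors;
    # each zero-point is 0 and each scale is input_scale * weight_scale.
    for detail in interpreter.get_tensor_details():
        params = detail['quantization_parameters']
        if detail['dtype'] == np.int32 and params['scales'].size > 0:
            print(detail['name'], params['scales'], params['zero_points'])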
