I want to apply full-integer post-training quantization, using the TensorFlow Model Optimization package, to a pre-trained model (LeNet5). I am following this guide: https://www.tensorflow.org/model_optimization/guide/quantization/post_training
The model is defined as follows:
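import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import AveragePooling2D, Conv2D, Dense, Flatten

model = Sequential()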
model._name = 'LeNet5'
model.add(tf.keras.layers.InputLayer(input_shape=(28, 28)))
model.add(tf.keras.layers.Reshape(target_shape=(28, 28, 1)))
model.add(Conv2D(6, kernel_size=(5, 5), strides=(1, 1), activation='tanh', padding='same'))
model.add(AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
model.add(Conv2D(16, kernel_size=(5, 5), strides=(1, 1), activation='tanh', padding='valid'))
model.add(AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
model.add(Flatten())
model.add(Dense(120, activation='tanh'))
model.add(Dense(84, activation='tanh'))
model.add(Dense(10, activation='softmax'))
Using the following code, I applied full-integer post-training quantization:
mnist_train, _ = tf.keras.datasets.mnist.load_data()
images = tf.cast(mnist_train[0], tf.float32) / 255.0
mnist_ds = tf.data.Dataset.from_tensor_slices((images)).batch(1)
def representative_data_gen():
    for input_value in mnist_ds.take(100):
        yield [input_value]
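# Assumption: the converter is built from the Keras model defined above.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]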
converter.representative_dataset = representative_data_gen
converter.allow_custom_ops = True
converter.target_spec.supported_types = [tf.int8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
full_integer_quantization_model = converter.convert()
open("tflite_model.tflite", "wb").write(full_integer_quantization_model)
It works fine in terms of accuracy, but when I print the data type of each layer (each operation, such as conv, activation, bias), I see that some of the operations are in int32 instead of int8.
I don't know why.
How does TFLite decide to do some ops in int32 and some in int8?
Is it possible to control this feature (is it an option) in TFLite and perform all operations as int8?
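For reference, one way to list the data types mentioned above is to load the converted file with `tf.lite.Interpreter` and group the entries of `get_tensor_details()` by dtype. This is only a minimal sketch: the helper names `group_tensors_by_dtype` and `report_tflite_dtypes` are made up for illustration, and the file name is the one written by the conversion code in the question.

```python
from collections import defaultdict


def group_tensors_by_dtype(tensor_details):
    """Group tensor names by dtype name.

    `tensor_details` is a list of dicts with at least "name" and "dtype",
    as returned by tf.lite.Interpreter.get_tensor_details().
    """
    groups = defaultdict(list)
    for detail in tensor_details:
        dtype = detail["dtype"]
        # NumPy scalar types expose __name__ (e.g. "int8"); fall back to str().
        dtype_name = getattr(dtype, "__name__", str(dtype))
        groups[dtype_name].append(detail["name"])
    return dict(groups)


def report_tflite_dtypes(model_path):
    """Print every tensor of a .tflite file, grouped by dtype."""
    # Imported here so the grouping helper works without TensorFlow installed.
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    groups = group_tensors_by_dtype(interpreter.get_tensor_details())
    for dtype_name, names in sorted(groups.items()):
        print(f"{dtype_name}: {len(names)} tensor(s)")
        for name in names:
            print(f"    {name}")
```

Calling `report_tflite_dtypes("tflite_model.tflite")` on the converted model should show which tensors end up as int32 and which as int8.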
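Have you taken a look at the TFLite quantization specification? https://www.tensorflow.org/lite/performance/quantization_spec
According to that spec, bias values are 32 bits wide. For ops such as CONV_2D and FULLY_CONNECTED, the bias tensor is stored as int32 with a zero point of 0 and a scale equal to the input scale multiplied by the weight scale, so the int32 tensors you see are expected even in a full-integer model.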
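To illustrate why the bias is kept at 32 bits, here is a small pure-Python sketch of the integer arithmetic. The scales and values are invented for illustration, and the activation zero point is assumed to be 0 to keep the arithmetic short (real int8 activations can have a non-zero zero point). The `quantize` helper is hypothetical, not a TFLite API.

```python
def quantize(value, scale, bits):
    """Symmetric quantization (zero point 0) to a signed integer of `bits` width."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return max(qmin, min(qmax, round(value / scale)))


# Illustrative scales, not taken from any real model.
input_scale = 0.02
weight_scale = 0.005
# The bias uses the product of the two scales, so it can be added
# directly to the accumulator without rescaling.
bias_scale = input_scale * weight_scale

# int8 activations and int8 weights of one output neuron.
x_q = [100, -50, 127]
w_q = [127, 127, -128]

# Each int8 * int8 product needs up to 16 bits, and the sum of products
# grows further, so the accumulator is wider than int8.
acc = sum(x * w for x, w in zip(x_q, w_q))

bias = 0.5
bias_q32 = quantize(bias, bias_scale, bits=32)  # fits comfortably
bias_q8 = quantize(bias, bias_scale, bits=8)    # saturates at the int8 limit

print("accumulator:", acc)
print("bias as int32:", bias_q32, "-> dequantized:", bias_q32 * bias_scale)
print("bias as int8: ", bias_q8, "-> dequantized:", bias_q8 * bias_scale)
```

With these numbers the accumulator falls well outside the int8 range, and the int8 bias saturates far below the intended value, while the int32 bias represents it accurately. After the bias is added, the result is rescaled back to int8 for the op's output.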