Goal: Use this notebook to perform quantisation on the albert-base-v2 model.
Kernel: conda_pytorch_p36
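The quantisation code itself is not reproduced in this post; for context, here is a minimal sketch of the PyTorch dynamic-quantisation step that produces "Size (MB)" outputs like those below (the print_size_of_model helper and the temp.p filename are illustrative, following the standard torch.quantization.quantize_dynamic recipe):

import os
import torch
from transformers import AlbertModel

model = AlbertModel.from_pretrained("albert-base-v2")

# Dynamically quantize the Linear layers to int8 weights.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

def print_size_of_model(model):
    # Serialize the state dict and report its size on disk.
    torch.save(model.state_dict(), "temp.p")
    print("Size (MB):", os.path.getsize("temp.p") / 1e6)
    os.remove("temp.p")

print_size_of_model(model)            # full-precision size
print_size_of_model(quantized_model)  # quantized size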
Outputs in Sections 1.2 & 2.2 show: 417.6 MB, 173.0 MB, and ONNX 104.8 MB.
However, when running ALBert I get the sizes below, and I think this is the reason for the poorer model performance of both quantization methods of ALBert compared to vanilla ALBert:
PyTorch:
Size (MB): 44.58906650543213
Size (MB): 22.373255729675293
ONNX:
ONNX full precision model size (MB): 341.64233207702637
ONNX quantized model size (MB): 85.53886985778809
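(The ONNX figures above are typically obtained by quantising the exported model and checking file sizes; that step isn't shown in this post, so the following is a sketch assuming onnxruntime's quantize_dynamic API and illustrative file names:)

import os
from onnxruntime.quantization import quantize_dynamic, QuantType

# "albert-base-v2.onnx" is the file produced by torch.onnx.export (illustrative name).
quantize_dynamic("albert-base-v2.onnx", "albert-base-v2.quant.onnx",
                 weight_type=QuantType.QInt8)

print("ONNX full precision model size (MB):",
      os.path.getsize("albert-base-v2.onnx") / 1e6)
print("ONNX quantized model size (MB):",
      os.path.getsize("albert-base-v2.quant.onnx") / 1e6)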
Why might exporting ALBert from PyTorch to ONNX increase model size, but not for BERT?
Please let me know if there's anything else I can add to the post.
The ALBert model shares weights among its layers. torch.onnx.export writes those shared weights out as separate tensors, one copy per layer, which causes the exported model to grow in size.
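The duplication can be observed directly by exporting the model and summing the bytes held by initializers whose contents repeat an earlier one. This is a sketch rather than code from the post; the export arguments and file name are assumptions:

import torch
import onnx
from onnx import numpy_helper
from transformers import AlbertModel

# torchscript=True makes the model return plain tuples, which traces cleanly.
model = AlbertModel.from_pretrained("albert-base-v2", torchscript=True)
model.eval()

dummy_input = torch.ones(1, 128, dtype=torch.long)
torch.onnx.export(model, (dummy_input,), "albert-base-v2.onnx",
                  input_names=["input_ids"], opset_version=11)

# Sum the bytes stored by initializers that duplicate an earlier one.
m = onnx.load("albert-base-v2.onnx")
seen, duplicated_bytes = set(), 0
for init in m.graph.initializer:
    raw = numpy_helper.to_array(init).tobytes()
    if raw in seen:
        duplicated_bytes += len(raw)
    else:
        seen.add(raw)
print("MB held by duplicated initializers:", duplicated_bytes / 1e6)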
A number of GitHub issues about this phenomenon have been marked as solved.
The most common solution is to remove the shared weights, that is, to remove the tensor arrays that contain exactly the same values.
Section "Removing shared weights" in onnx_remove_shared_weights.ipynb .
import onnx
from onnx import numpy_helper
from onnxruntime.transformers.onnx_model import OnnxModel

def has_same_value(tensor1, tensor2):
    # Stand-in for the notebook's helper: two initializers are duplicates
    # when their dtype, shape and raw contents all match.
    return (tensor1.data_type == tensor2.data_type and tensor1.dims == tensor2.dims
            and numpy_helper.to_array(tensor1).tobytes() == numpy_helper.to_array(tensor2).tobytes())

model = onnx.load(path)  # path to the exported ONNX model
onnx_model = OnnxModel(model)

# Mark duplicated initializers: same[j] = i means initializer j repeats initializer i.
count = len(model.graph.initializer)
same = [-1] * count
for i in range(count - 1):
    if same[i] >= 0:
        continue
    for j in range(i + 1, count):
        if has_same_value(model.graph.initializer[i], model.graph.initializer[j]):
            same[j] = i

# Redirect every node that consumes a duplicate to its first occurrence.
for i in range(count):
    if same[i] >= 0:
        onnx_model.replace_input_of_all_nodes(
            model.graph.initializer[i].name, model.graph.initializer[same[i]].name)

onnx_model.update_graph()
onnx_model.save_model_to_file(output_path)
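Once the references are redirected, the duplicated initializers no longer feed any node; update_graph() should then prune them before save_model_to_file writes the result, so the file at output_path ought to shrink back toward the PyTorch model size.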