
Converting PyTorch to ONNX model changes file size for ALBert but not BERT

Goal: Use this Notebook to perform quantization on the albert-base-v2 model.

Kernel: conda_pytorch_p36.


Outputs in Sections 1.2 & 2.2 show that:

  • converting vanilla BERT from PyTorch to ONNX keeps the file size the same, 417.6 MB.
  • the quantized models are smaller than vanilla BERT: 173.0 MB (PyTorch) and 104.8 MB (ONNX).

However, when running ALBert:

  • the PyTorch and ONNX model sizes are different.
  • the quantized ONNX model (85.5 MB) is larger than the vanilla PyTorch model (44.6 MB).

I suspect this is why both quantized versions of ALBert perform worse than vanilla ALBert.

PyTorch:

Size (MB): 44.58906650543213
Size (MB): 22.373255729675293

ONNX:

ONNX full precision model size (MB): 341.64233207702637
ONNX quantized model size (MB): 85.53886985778809
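
For reference, a minimal sketch of the kind of workflow that produces numbers like these, assuming the standard Hugging Face and ONNX Runtime dynamic-quantization APIs; the file names, dummy inputs, and opset below are illustrative rather than taken from the Notebook:

import os
import torch
from transformers import AlbertModel
from onnxruntime.quantization import quantize_dynamic, QuantType

def print_size_mb(path):
    print("Size (MB):", os.path.getsize(path) / 1e6)

# Load ALBert; torchscript=True makes the model return tuples, which torch.onnx.export can trace.
model = AlbertModel.from_pretrained("albert-base-v2", torchscript=True)
model.eval()

# PyTorch: save the vanilla weights, then dynamically quantize Linear layers to int8 and save again.
torch.save(model.state_dict(), "albert.pt")
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
torch.save(quantized.state_dict(), "albert_quantized.pt")
print_size_mb("albert.pt")
print_size_mb("albert_quantized.pt")

# ONNX: export the full-precision graph, then quantize it with ONNX Runtime.
dummy = torch.ones(1, 128, dtype=torch.long)
torch.onnx.export(
    model,
    (dummy, dummy),  # input_ids, attention_mask
    "albert.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state", "pooler_output"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "attention_mask": {0: "batch", 1: "seq"}},
    opset_version=11,
)
quantize_dynamic("albert.onnx", "albert_quantized.onnx", weight_type=QuantType.QInt8)
print_size_mb("albert.onnx")
print_size_mb("albert_quantized.onnx")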

Why might exporting ALBert from PyTorch to ONNX increase model size, but not for BERT?

Please let me know if there's anything else I can add to the post.

Explanation

The ALBert model shares weights across its layers. torch.onnx.export writes those shared weights out as separate tensors, one copy per layer, which makes the exported model much larger.

A number of GitHub issues regarding this phenomenon have been marked as solved.

The most common solution is to remove the shared weights, that is, to deduplicate the initializer tensors that contain exactly the same values.
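
One quick way to confirm the duplication is to hash every initializer in the exported ALBert graph and count how many are byte-for-byte identical; a minimal sketch, where "albert.onnx" is a placeholder for the exported file:

import hashlib
from collections import Counter

import onnx
from onnx import numpy_helper

model = onnx.load("albert.onnx")

# Hash each initializer's raw contents and tally duplicates.
counts = Counter()
for init in model.graph.initializer:
    data = numpy_helper.to_array(init).tobytes()
    counts[hashlib.sha1(data).hexdigest()] += 1

total = len(model.graph.initializer)
unique = len(counts)
print(f"{total} initializers, {unique} unique -> {total - unique} redundant copies")

For BERT the two counts should be essentially the same; for the exported ALBert graph there will be many redundant copies of the same shared tensors.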


Solutions

Section "Removing shared weights" in onnx_remove_shared_weights.ipynb .

Pseudo-code:

import onnx
from onnxruntime.transformers.onnx_model import OnnxModel

# path / output_path: locations of the exported ONNX model and the deduplicated output file.
model = onnx.load(path)
onnx_model = OnnxModel(model)

# For each initializer, record the index of an earlier initializer holding identical values.
count = len(model.graph.initializer)
same = [-1] * count
for i in range(count - 1):
    if same[i] >= 0:
        continue
    for j in range(i + 1, count):
        if OnnxModel.has_same_value(model.graph.initializer[i], model.graph.initializer[j]):
            same[j] = i

# Re-point every node that consumes a duplicate initializer to the kept copy.
for i in range(count):
    if same[i] >= 0:
        onnx_model.replace_input_of_all_nodes(
            model.graph.initializer[i].name, model.graph.initializer[same[i]].name
        )

# Prune the now-unused initializers and save the smaller model.
onnx_model.update_graph()
onnx_model.save_model_to_file(output_path)
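
Note that newer onnxruntime releases appear to ship this same deduplication as a method on OnnxModel itself; if your installed version has remove_duplicated_initializer, the two loops above reduce to a single call:

import onnx
from onnxruntime.transformers.onnx_model import OnnxModel

onnx_model = OnnxModel(onnx.load(path))
onnx_model.remove_duplicated_initializer()  # available in recent onnxruntime versions
onnx_model.update_graph()                   # prune any initializers left unused
onnx_model.save_model_to_file(output_path)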

Source of both solutions
