How do I use a fine-tuned BERT model for sentence encoding?

Question

I fine-tuned the BERT base model on my own dataset, following the script here:

https://github.com/cedrickchee/pytorch-pretrained-BERT/tree/master/examples/lm_finetuning

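I saved the model as a .pt file, and now I want to use it for a sentence similarity task. Unfortunately, it is not clear to me how to load the fine-tuned model. I tried the following: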

model = BertModel.from_pretrained('trained_model.pt')
model.eval()

This does not work. It fails with:

ReadError: not a gzip file

Apparently, a .pt file cannot be loaded with the from_pretrained method. Can anyone help me out here? Thanks a lot! :)

Edit: I saved the model in an S3 bucket as follows:

import io
import torch

# Serialize the whole model object into an in-memory buffer
buffer = io.BytesIO()
torch.save(model, buffer)
# Upload the buffer contents to the S3 bucket
output_model_file = output_folder + "trained_model.pt"
s3_.put_object(Bucket="power-plant-embeddings", Key=output_model_file, Body=buffer.getvalue())

Answer 1

To load a model with BertModel.from_pretrained(), you need to have saved it with save_pretrained(), which writes the weights and the config file into a directory.

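Any other way of storing the model requires the corresponding way of loading it. I am not familiar with S3, but I assume you can use get_object to retrieve the model and then save it with the Hugging Face API. From then on, you should be able to use from_pretrained() as usual.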
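
The round trip described above might look like the following minimal sketch. It assumes the pickled object is a Hugging Face transformers model (so that save_pretrained() is available), and that the installed PyTorch accepts the weights_only argument of torch.load. The helper name pickled_model_to_pretrained_dir is hypothetical, and the S3 retrieval is shown only as a comment that assumes a boto3 client.

```python
import io

import torch
from transformers import BertConfig, BertModel


def pickled_model_to_pretrained_dir(model_bytes: bytes, out_dir: str) -> None:
    """Unpickle a model stored with torch.save(model, ...) and re-save it in
    the directory layout that from_pretrained() expects (hypothetical helper)."""
    # torch.save(model, ...) pickles the whole object, so torch.load is the
    # matching loader. weights_only=False allows unpickling a full model on
    # recent PyTorch versions; only use it on files you trust.
    model = torch.load(io.BytesIO(model_bytes), map_location="cpu", weights_only=False)
    # Writes the config and the weights into out_dir.
    model.save_pretrained(out_dir)


# Assumed S3 retrieval with boto3, using the bucket and key from the question:
#   import boto3
#   s3 = boto3.client("s3")
#   obj = s3.get_object(Bucket="power-plant-embeddings", Key=output_model_file)
#   model_bytes = obj["Body"].read()
#   pickled_model_to_pretrained_dir(model_bytes, "trained_model_dir")
#   model = BertModel.from_pretrained("trained_model_dir")
#   model.eval()
```

Note that from_pretrained() is then given the directory, not the .pt file. If the object was pickled with a different library version than the one installed at load time, unpickling may fail; in that case, saving with save_pretrained() directly after fine-tuning avoids the problem.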

Solution 1
1 Accepted 2021-03-19 12:53:09
