How to use a fine-tuned BERT model for sentence encoding?
I fine-tuned the BERT base model on my own dataset following the script here:
https://github.com/cedrickchee/pytorch-pretrained-BERT/tree/master/examples/lm_finetuning
I saved the model as a .pt file, and I now want to use it for a sentence similarity task. Unfortunately, it is not clear to me how to load the fine-tuned model. I tried the following:
model = BertModel.from_pretrained('trained_model.pt')
model.eval()
This doesn't work. It says:
ReadError: not a gzip file
So apparently, loading a .pt file with the from_pretrained method is not possible. Can anyone help me out here? Thanks a lot! :)
Edit: I saved the model in an S3 bucket as follows:
import io
import torch

# Convert model to buffer
buffer = io.BytesIO()
torch.save(model, buffer)

# Save in s3 bucket
output_model_file = output_folder + "trained_model.pt"
s3_.put_object(Bucket="power-plant-embeddings", Key=output_model_file, Body=buffer.getvalue())
To load a model with BertModel.from_pretrained() you need to have saved it using save_pretrained() (link).
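To make the mismatch concrete: torch.save pickles the whole Python object, so its direct counterpart is torch.load, while from_pretrained() expects a directory written by save_pretrained() (config.json plus a weights file). A minimal sketch with a small stand-in module (nn.Linear here is my own substitute for the fine-tuned BertModel; the weights_only flag is available in recent PyTorch versions):

```python
import io

import torch
import torch.nn as nn

# Stand-in for the fine-tuned BERT model.
model = nn.Linear(4, 2)

# Same pattern as in the question: pickle the whole module into a buffer.
buffer = io.BytesIO()
torch.save(model, buffer)

# The matching loader is torch.load, not from_pretrained.
buffer.seek(0)
restored = torch.load(buffer, weights_only=False)
```

The restored object is the original module with identical weights; from_pretrained() never enters the picture, which is why pointing it at a .pt file fails.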
Any other storage method requires the corresponding loading method. I am not familiar with S3, but I assume you can use get_object (link) to retrieve the model, and then save it using the Hugging Face API. From then on, you should be able to use from_pretrained() normally.
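The suggested round trip could be sketched as follows. The function names and the weights_only flag are my own assumptions; the bucket and key come from the question, and boto3 credentials are assumed to be configured:

```python
import io

import torch


def load_model_from_s3(s3_client, bucket, key):
    """Fetch the torch.save'd object from S3 and unpickle it with torch.load."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return torch.load(io.BytesIO(body), weights_only=False)


def convert_to_pretrained_dir(model, out_dir):
    """Re-save with the Hugging Face API so from_pretrained(out_dir) works."""
    model.save_pretrained(out_dir)


# Usage (assumed names):
# import boto3
# s3 = boto3.client("s3")
# model = load_model_from_s3(s3, "power-plant-embeddings", "trained_model.pt")
# convert_to_pretrained_dir(model, "trained_model_dir")
# model = BertModel.from_pretrained("trained_model_dir")
```

After the one-time conversion, the directory can be loaded with from_pretrained() like any other local checkpoint.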