
How to get access to the tokenizer after loading a saved custom BERT model using Keras and TF2?

I am working on an intent classification problem and need your help.

I fine-tuned one of the BERT models for text classification. I trained and evaluated it on a small dataset for detecting five intents, following the code from Intent Recognition with BERT using Keras and TensorFlow 2, and it works fine. I have saved the model so that I can use it later without retraining it.

# Save the entire model as a SavedModel.
!mkdir -p saved_model
model.save('saved_model/intentclassifiermodel')

Then I zipped it and downloaded it to use separately:

!zip -r saved_model.zip saved_model/

Now I am trying to use this model to predict the intent. For that, I created another Google Colab notebook and loaded the model:

from google.colab import drive
drive.mount('/content/gdrive')

!pip install tensorflow==2.2

!pip install bert-for-tf2 >> /dev/null

import bert

from tensorflow import keras
model = keras.models.load_model('/content/gdrive/MyDrive/NLPMODELS/saved_model/intentclassifiermodel')

model.summary()
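(As a quick sanity check of my own, not from the tutorial, printing the loaded model's input signature shows the sequence length it expects, which corresponds to the max_seq_len used during training:)

# The second dimension of the input shape is the max_seq_len used during fine-tuning
print(model.inputs)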


The model loads successfully, and now I want to predict. For that I am using the following code snippet (it is the same code as in the base tutorial):

import numpy as np

sentences = [
  "are you a bot?",
  "how to create a bot"
]

# NOTE: `tokenizer`, `data.max_seq_len` and `classes` were defined in the
# training notebook and are not available here -- this is what fails below.
pred_tokens = map(tokenizer.tokenize, sentences)
pred_tokens = map(lambda tok: ["[CLS]"] + tok + ["[SEP]"], pred_tokens)
pred_token_ids = list(map(tokenizer.convert_tokens_to_ids, pred_tokens))

pred_token_ids = map(lambda tids: tids + [0] * (data.max_seq_len - len(tids)), pred_token_ids)
pred_token_ids = np.array(list(pred_token_ids))

predictions = model.predict(pred_token_ids).argmax(axis=-1)

for text, label in zip(sentences, predictions):
  print("text:", text, "\nintent:", classes[label])
  print()

However, this code fails because I am not sure how to access the tokenizer here.

Here is the error (screenshot of the traceback).

Can you please help me with how to get the tokenizer?

Thanks and regards, Rohit Dhamija

Thank you @AloneTogether for the pointer on SO.

So, besides saving the model assets folder, I also saved the tokenizer.
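Roughly, the tokenizer can be rebuilt from the vocab.txt of the BERT checkpoint used for fine-tuning (a minimal sketch with bert-for-tf2; the vocab path here is just a placeholder I chose):

import bert

# Placeholder path: vocab.txt copied from the BERT checkpoint used for training
VOCAB_PATH = '/content/gdrive/MyDrive/NLPMODELS/saved_model/vocab.txt'

# Rebuild the same WordPiece tokenizer that was used during fine-tuning;
# do_lower_case must match the checkpoint (True for uncased BERT)
tokenizer = bert.bert_tokenization.FullTokenizer(vocab_file=VOCAB_PATH, do_lower_case=True)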

In order to make the code work, I required two additional things: data.max_seq_len and the class values.

For now, I extracted them while saving the model and used them in my program.
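For example, a minimal sketch of how those two values can be carried across notebooks (the JSON file name and keys are my own, not from the tutorial):

import json

# In the training notebook, after fitting (classes is assumed to be a plain list of intent names):
with open('saved_model/intent_config.json', 'w') as f:
    json.dump({"max_seq_len": data.max_seq_len, "classes": classes}, f)

# In the prediction notebook, before building pred_token_ids:
with open('/content/gdrive/MyDrive/NLPMODELS/saved_model/intent_config.json') as f:
    config = json.load(f)
max_seq_len = config["max_seq_len"]
classes = config["classes"]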

Thanks!
