
How to get access to the tokenizer after loading a saved custom BERT model using Keras and TF2?

I am working on an intent classification problem and need your help.

I fine-tuned one of the BERT models for text classification. I trained and evaluated it on a small dataset for detecting five intents, following the code from Intent Recognition with BERT using Keras and TensorFlow 2, and it works fine. I have saved the model so that I can use it later without retraining it.

# Save the entire model as a SavedModel.
!mkdir -p saved_model
model.save('saved_model/intentclassifiermodel')

Then I zipped it and downloaded it to use separately:

!zip -r saved_model.zip saved_model/

Now I am trying to use this model to predict the intent. For that, I created another Google Colab notebook and loaded the model:

from google.colab import drive
drive.mount('/content/gdrive')

!pip install tensorflow==2.2

!pip install bert-for-tf2 >> /dev/null

import bert

from tensorflow import keras
model = keras.models.load_model('/content/gdrive/MyDrive/NLPMODELS/saved_model/intentclassifiermodel')

model.summary()
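(As a quick sanity check of my own, not from the tutorial, printing the loaded model's input signature shows the sequence length it expects, which corresponds to the max_seq_len used during training:)

# The second dimension of the input shape is the max_seq_len used during fine-tuning
print(model.inputs)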


The model loads successfully, and now I want to predict. For that I am using the following code snippet (it is the same code as in the base tutorial):

import numpy as np

sentences = [
  "are you a bot?",
  "how to create a bot"
]

# NOTE: `tokenizer`, `data.max_seq_len` and `classes` were defined in the
# training notebook and are not available here -- this is what fails below.
pred_tokens = map(tokenizer.tokenize, sentences)
pred_tokens = map(lambda tok: ["[CLS]"] + tok + ["[SEP]"], pred_tokens)
pred_token_ids = list(map(tokenizer.convert_tokens_to_ids, pred_tokens))

pred_token_ids = map(lambda tids: tids + [0] * (data.max_seq_len - len(tids)), pred_token_ids)
pred_token_ids = np.array(list(pred_token_ids))

predictions = model.predict(pred_token_ids).argmax(axis=-1)

for text, label in zip(sentences, predictions):
  print("text:", text, "\nintent:", classes[label])
  print()

However, this code fails because I am not sure how to access the tokenizer here.

Here is the error (screenshot of the traceback).

Can you please help me with how to get the tokenizer?

Thanks and regards, Rohit Dhamija

Thank you @AloneTogether for the pointer on SO.

So, besides saving the model assets folder, I also saved the tokenizer.
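Roughly, the tokenizer can be rebuilt from the vocab.txt of the BERT checkpoint used for fine-tuning (a minimal sketch with bert-for-tf2; the vocab path here is just a placeholder I chose):

import bert

# Placeholder path: vocab.txt copied from the BERT checkpoint used for training
VOCAB_PATH = '/content/gdrive/MyDrive/NLPMODELS/saved_model/vocab.txt'

# Rebuild the same WordPiece tokenizer that was used during fine-tuning;
# do_lower_case must match the checkpoint (True for uncased BERT)
tokenizer = bert.bert_tokenization.FullTokenizer(vocab_file=VOCAB_PATH, do_lower_case=True)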

In order to make the code work, I required two additional things: data.max_seq_len and the class values.

For now, I extracted them while saving the model and used them in my program.
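For example, a minimal sketch of how those two values can be carried across notebooks (the JSON file name and keys are my own, not from the tutorial):

import json

# In the training notebook, after fitting (classes is assumed to be a plain list of intent names):
with open('saved_model/intent_config.json', 'w') as f:
    json.dump({"max_seq_len": data.max_seq_len, "classes": classes}, f)

# In the prediction notebook, before building pred_token_ids:
with open('/content/gdrive/MyDrive/NLPMODELS/saved_model/intent_config.json') as f:
    config = json.load(f)
max_seq_len = config["max_seq_len"]
classes = config["classes"]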

Thanks!
