How to use a saved text classification model to make predictions on a new text dataset

Question

I trained a text classifier by following this guide: https://developers.google.com/machine-learning/guides/text-classification/step-4

and saved the model with

model.save('~./output/model.h5')

Given this, how can I use the saved model to classify text in another, new dataset?

Thanks

Answer 1

import tensorflow as tf

# Recreate the exact same model, including its weights and the optimizer
new_model = tf.keras.models.load_model('~./output/model.h5')

# Show the model architecture
new_model.summary()

# Apply the same data preparation steps that were used when training the model.
# Let's say the preprocessed data is stored in test_data and its labels in test_labels.

# Check the model's accuracy on the unseen/new dataset
loss, acc = new_model.evaluate(test_data, test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100*acc))
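evaluate only reports aggregate metrics. To get a predicted class for each new text, the new texts have to go through the same fitted preprocessing objects that were used in training before predict is called. The following is a minimal sketch under assumptions: it supposes the training pipeline kept a fitted vectorizer and feature selector (any objects with a transform method), and the names classify_texts, probabilities_to_labels, vectorizer and selector are hypothetical, not part of the original answer.

```python
def probabilities_to_labels(probs, threshold=0.5):
    """Turn model.predict() output into integer class ids.

    Handles a single sigmoid output per row (binary case) and a
    softmax row with one probability per class (multi-class case).
    """
    labels = []
    for row in probs:
        row = list(row)
        if len(row) == 1:
            # Binary classifier: one probability for the positive class.
            labels.append(int(row[0] > threshold))
        else:
            # Multi-class classifier: pick the most probable class.
            labels.append(max(range(len(row)), key=row.__getitem__))
    return labels


def classify_texts(model, vectorizer, selector, texts):
    """Preprocess raw texts exactly as in training, then predict labels.

    `vectorizer` and `selector` are assumed to be the objects that were
    fitted on the training data; they must not be refitted here.
    """
    x = vectorizer.transform(texts)
    x = selector.transform(x)
    if hasattr(x, "toarray"):
        # Some Keras versions do not accept SciPy sparse matrices.
        x = x.toarray()
    return probabilities_to_labels(model.predict(x))


# Hypothetical usage, assuming the fitted objects were pickled during training
# under the (made-up) file names vectorizer.pkl and selector.pkl:
#   import pickle
#   import tensorflow as tf
#   model = tf.keras.models.load_model('~./output/model.h5')
#   vectorizer = pickle.load(open('vectorizer.pkl', 'rb'))
#   selector = pickle.load(open('selector.pkl', 'rb'))
#   print(classify_texts(model, vectorizer, selector, ['some new text']))
```

The returned integers are class indices; mapping them back to label names depends on how the labels were encoded during training.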

Answer 2

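You can use TensorFlow's text tokenization utility class (Tokenizer) to handle unknown words in the test data.

num_words is the vocabulary size (it keeps the most frequent words).
Set oov_token = 'Some string' to stand in for every token/word outside the vocabulary (in effect, new words in the test data are handled as the oov_token string).
Fit the tokenizer on the training data, then generate token sequences for both the training and the test data.
tf.keras.preprocessing.text.Tokenizer( num_words=None, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n', lower=True, split=' ', char_level=False, oov_token=None, document_count=0, **kwargs )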
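To make the effect of an out-of-vocabulary token concrete, here is a simplified pure-Python sketch of the idea. It is not the Keras implementation: build_word_index and texts_to_ids are hypothetical helpers that only mimic the behaviour described above (id 1 reserved for the OOV token, more frequent words getting lower ids). With Keras, the corresponding steps would be the Tokenizer's fit_on_texts and texts_to_sequences methods.

```python
from collections import Counter


def build_word_index(train_texts, oov_token="<OOV>"):
    """Map each training word to an integer id; id 1 is reserved for OOV."""
    counts = Counter(
        word for text in train_texts for word in text.lower().split()
    )
    word_index = {oov_token: 1}
    # More frequent words get smaller ids, starting at 2.
    for word, _ in counts.most_common():
        word_index[word] = len(word_index) + 1
    return word_index


def texts_to_ids(texts, word_index, num_words=None, oov_token="<OOV>"):
    """Encode texts; unseen or out-of-range words map to the OOV id."""
    oov_id = word_index[oov_token]
    encoded = []
    for text in texts:
        ids = []
        for word in text.lower().split():
            idx = word_index.get(word, oov_id)
            # Only ids below num_words are kept when a limit is set.
            if num_words is not None and idx >= num_words:
                idx = oov_id
            ids.append(idx)
        encoded.append(ids)
    return encoded


# The vocabulary is built from the training texts only.
train_texts = ["the movie was great", "the movie was bad"]
new_texts = ["the film was great"]

word_index = build_word_index(train_texts)
# "film" never appeared in training, so it is encoded as the OOV id 1.
encoded = texts_to_ids(new_texts, word_index)
```

The key point is that the vocabulary is fitted once, on the training data, and then reused unchanged for any new dataset; refitting on the new texts would assign different ids and make the saved model's inputs meaningless.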

Answer 1
0 2020-10-11 09:57:24

Answer 2
0 2020-10-22 12:30:08
