How to use a trained text classification model

Question

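I implemented an SVM model that classifies a given text into one of two classes. The model was trained and tested on the data.csv dataset. Now I want to use this model on live data. For that I used the pickle Python library. First I saved the model.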

joblib.dump(clf, "model.pkl")

Then I loaded that model.

classifer = joblib.load("model.pkl")

Then I used the input below as the text to classify.

new_observation = "this news should be in one category"
classifer.predict([new_observation])

But running this raised an error.

ValueError: could not convert string to float: 'this news should be in one category'

I followed the link below to learn how to save and load a trained model: https://scikit-learn.org/stable/modules/model_persistence.html

Edit

This is the code I used to build the SVM model.

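import string

import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = pd.read_csv('data1.csv', encoding='cp1252')

def pre_process(text):
    # Remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Drop English stop words
    text = [word for word in text.split() if word.lower() not in stopwords.words('english')]
    # Stem each remaining word and join the stems back into one string
    stemmer = SnowballStemmer("english")
    words = ""
    for i in text:
        words += (stemmer.stem(i)) + " "
    return words

textFeatures = data['textForCategorized'].copy()
textFeatures = textFeatures.apply(pre_process)

# The original passed "english" positionally; stop_words is assumed to be the intent
vectorizer = TfidfVectorizer(stop_words='english')
features = vectorizer.fit_transform(textFeatures)

features_train, features_test, labels_train, labels_test = train_test_split(features, data['class'], test_size=0.3, random_state=111)

svc = SVC(kernel='sigmoid', gamma=1.0)
clf = svc.fit(features_train, labels_train)
prediction = svc.predict(features_test)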

After building the model, this is how I tried to feed input to it.

joblib.dump(clf, "model.pkl")
classifer = joblib.load("model.pkl")
new_observation = "This news should be in one category"
classifer.predict(new_observation)

Edit

joblib.dump(clf, "model.pkl")
classifer = joblib.load("model.pkl")
textFeature = "Dengue soaring in ......"
textFeature = pre_process(textFeature)
classifer.predict(textFeature.encode())

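This is the code I used to load the model and pass text to it. After that, I added code to get the predicted value, but I got an error.

ValueError: could not convert string to float: b'dengu soar '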

Answer 1

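You should preprocess new_observation before passing it to the model. In your case you only preprocessed the textFeatures used for training; you have to repeat the same preprocessing steps for new_observation as well.

Apply the pre_process() function to new_observation
Use vectorizer (the same one that was fitted on the training data) to transform the output of pre_process(new_observation)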
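
The two steps above can be sketched as follows. This is a minimal illustration, not the asker's exact setup: the training texts and labels are made up, pre_process() is a simplified stand-in (no NLTK stop words or stemming, so it runs without extra downloads), and the vectorizer.pkl file name is an assumption. The point it shows is that the fitted vectorizer is saved next to the classifier and reused with transform(), not fit_transform(), at prediction time.

```python
import os
import string
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC


def pre_process(text):
    # Simplified stand-in for the question's pre_process(): strips punctuation
    # and lowercases. Stop-word removal and stemming are omitted here.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.lower().split())


# Toy training data, invented for illustration only.
texts = [
    "Dengue cases are soaring in the city",
    "Hospital reports new dengue outbreak",
    "Local team wins the football final",
    "Cricket match ends in a draw",
]
labels = ["health", "health", "sport", "sport"]

# Training time: fit the vectorizer once and keep the fitted object.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform([pre_process(t) for t in texts])
clf = SVC(kernel="sigmoid", gamma=1.0).fit(features, labels)

# Persist BOTH objects; the classifier alone cannot turn text into numbers.
model_dir = tempfile.mkdtemp()
model_path = os.path.join(model_dir, "model.pkl")
vectorizer_path = os.path.join(model_dir, "vectorizer.pkl")  # assumed file name
joblib.dump(clf, model_path)
joblib.dump(vectorizer, vectorizer_path)

# Prediction time: load both, preprocess, transform, then predict.
classifier = joblib.load(model_path)
loaded_vectorizer = joblib.load(vectorizer_path)

new_observation = "Dengue soaring in ......"
new_features = loaded_vectorizer.transform([pre_process(new_observation)])
prediction = classifier.predict(new_features)
print(prediction)
```

Note that transform() takes a list of documents, so a single string is wrapped in a list. scikit-learn's Pipeline can also bundle the vectorizer and the classifier into one object, so that only one file has to be saved and loaded.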

Answer 2

I ran into the same problem and solved it by resizing the single-string data to match the shape of the training data.

Full code:

joblib.dump(clf, "model.pkl")
classifer = joblib.load("model.pkl")
textFeature = "Dengue soaring in ......"
vocabulary = pre_process(textFeature)
vocabulary_df = pd.Series(vocabulary)

# Feature extraction using Tfidf Vectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(stop_words='english')

test_ = vectorizer.fit_transform(vocabulary_df.values)

test_.resize(1, features_train.shape[1])
classifer.predict(test_)
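
One thing worth checking with this approach is whether the resized columns still refer to the same words the model was trained on. The sketch below, on made-up texts, compares a freshly fitted vectorizer plus resize() against reusing the training vectorizer. The variable names are hypothetical, and it only inspects column indices rather than running a classifier.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up training texts; the vocabulary is fitted on these.
train_texts = ["dengue cases soaring", "football final tonight"]
train_vectorizer = TfidfVectorizer()
train_vectorizer.fit(train_texts)
n_train_columns = len(train_vectorizer.vocabulary_)

new_text = ["football tonight"]

# Approach from the answer above: fit a fresh vectorizer, then resize.
fresh_vectorizer = TfidfVectorizer()
fresh = fresh_vectorizer.fit_transform(new_text)
fresh.resize(1, n_train_columns)  # shapes now match, column meaning may not

# Alternative: reuse the vectorizer fitted on the training data.
reused = train_vectorizer.transform(new_text)


def words_by_column(vec):
    # List the vocabulary ordered by column index.
    return sorted(vec.vocabulary_, key=vec.vocabulary_.get)


train_words = words_by_column(train_vectorizer)
fresh_words = words_by_column(fresh_vectorizer)

print("column 0 in training vocabulary:", train_words[0])
print("column 0 in fresh vocabulary:   ", fresh_words[0])
print("non-zero columns after resize:  ", fresh.nonzero()[1].tolist())
print("non-zero columns when reused:   ", reused.nonzero()[1].tolist())
```

In this toy case the resized matrix has the right shape, but its non-zero columns sit at positions that mean different words in the training vocabulary, so it may be safer to reuse the fitted vectorizer as described in Answer 1.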
