
How to use a trained text classification model

I implemented an SVM model that classifies a given text into one of two categories. The model was trained and tested on the data.csv data set. Now I want to use this model with live data. To do that, I used joblib, which is built on Python's pickle. First I saved the model.

joblib.dump(clf, "model.pkl")

Then I loaded that model.

classifer = joblib.load("model.pkl")

Then I used the input below as the text to be classified.

new_observation = "this news should be in one category"
classifer.predict([new_observation])

But running this raises an error.

ValueError: could not convert string to float: 'this news should be in one category'

I referred to the link below to learn how to save and load a trained model: https://scikit-learn.org/stable/modules/model_persistence.html

EDIT

Here is the code I used to create the SVM model.

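import string

import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = pd.read_csv('data1.csv', encoding='cp1252')

def pre_process(text):
    # Strip punctuation, drop English stopwords, then stem each remaining word.
    text = text.translate(str.maketrans('', '', string.punctuation))
    text = [word for word in text.split() if word.lower() not in stopwords.words('english')]
    stemmer = SnowballStemmer("english")
    words = ""
    for i in text:
        words += stemmer.stem(i) + " "
    return words

textFeatures = data['textForCategorized'].copy()
textFeatures = textFeatures.apply(pre_process)

# Pass stop_words by keyword; a positional "english" is not interpreted as the stop-word list.
vectorizer = TfidfVectorizer(stop_words='english')
features = vectorizer.fit_transform(textFeatures)

features_train, features_test, labels_train, labels_test = train_test_split(features, data['class'], test_size=0.3, random_state=111)

svc = SVC(kernel='sigmoid', gamma=1.0)
clf = svc.fit(features_train, labels_train)
prediction = svc.predict(features_test)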

After building the model, here is how I try to give input to it.

joblib.dump(clf, "model.pkl")

classifer = joblib.load("model.pkl")

new_observation = "This news should be in one category"

classifer.predict(new_observation)

EDIT

joblib.dump(clf, "model.pkl") 
classifer = joblib.load("model.pkl")
textFeature = "Dengue soaring in ......" 
textFeature =pre_process(textFeature) 
classifer.predict(textFeature.encode())

The code above is what I used to load the model and pass text to it. I then added code to get the predicted value, but I got an error.

ValueError: could not convert string to float: b'dengu soar '

You should pre-process new_observation before feeding it to the model. In your case, you only pre-processed textFeatures for training; you must repeat the same pre-processing steps for new_observation too.

  1. Apply the pre_process() function to new_observation.
  2. Use the fitted vectorizer to transform (not re-fit) the output of pre_process(new_observation).
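
The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the training texts and labels are made-up toy data standing in for data1.csv, and pre_process() here is a simplified stand-in (punctuation stripping and lowercasing only) so the snippet runs without NLTK. The key point is calling transform() on the vectorizer that was fitted during training.

```python
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC


def pre_process(text):
    # Simplified stand-in for the question's pre_process(): strips punctuation
    # and lowercases. The original also removes stopwords and stems with NLTK.
    return text.translate(str.maketrans("", "", string.punctuation)).lower()


# Toy training data (hypothetical; the question reads data1.csv instead).
train_texts = [
    "Dengue cases are soaring in the city",
    "Hospital reports new dengue outbreak",
    "Local team wins the football final",
    "Striker scores twice in cup match",
]
train_labels = ["health", "health", "sport", "sport"]

# Fit the vectorizer once, on the pre-processed training texts.
vectorizer = TfidfVectorizer(stop_words="english")
features_train = vectorizer.fit_transform([pre_process(t) for t in train_texts])
clf = SVC(kernel="sigmoid", gamma=1.0).fit(features_train, train_labels)

# Step 1: pre-process the new text exactly as the training texts were.
new_observation = "Dengue soaring in ......"
cleaned = pre_process(new_observation)

# Step 2: transform() (not fit_transform()) reuses the training vocabulary,
# so the new row has the same columns the classifier was trained on.
new_features = vectorizer.transform([cleaned])
prediction = clf.predict(new_features)
```

Note that predict() receives a 2-D feature matrix with one row per text, not the raw string, which is why the original call raised "could not convert string to float".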

I had the same issue and resolved it by resizing the features of the single string to match the shape of the training data.

Complete code:

joblib.dump(clf, "model.pkl") 
classifer = joblib.load("model.pkl")
textFeature = "Dengue soaring in ......" 
vocabulary=pre_process(textFeature) 
vocabulary_df =pd.Series(vocabulary)

#### Feature extraction using Tfidf Vectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(stop_words='english')

test_ = vectorizer.fit_transform(vocabulary_df.values)

test_.resize(1, features_train.shape[1])
classifer.predict(test_)
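
One caveat with fitting a fresh TfidfVectorizer on the single input: its columns come from that text's own vocabulary, so after resize() they need not line up with the columns the classifier was trained on. A minimal sketch of an alternative, assuming a scikit-learn Pipeline fits your setup, is to save the fitted vectorizer together with the classifier so both are restored by one joblib.load(). The toy texts, labels, and the file name text_model.pkl are made up for illustration, and pre_process() is left out for brevity (it would be applied to the texts before fit() and predict()).

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy training data (hypothetical; stands in for the real data set).
texts = [
    "Dengue cases are soaring in the city",
    "Hospital reports new dengue outbreak",
    "Local team wins the football final",
    "Striker scores twice in cup match",
]
labels = ["health", "health", "sport", "sport"]

# The pipeline keeps the fitted vectorizer and the classifier together.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    SVC(kernel="sigmoid", gamma=1.0),
)
model.fit(texts, labels)

# Persist the whole pipeline, then load it back as a live service would.
model_path = os.path.join(tempfile.mkdtemp(), "text_model.pkl")
joblib.dump(model, model_path)
loaded = joblib.load(model_path)

# The loaded pipeline accepts raw strings: it vectorizes with the training
# vocabulary and then classifies, so no resize() is needed.
prediction = loaded.predict(["Dengue soaring in ......"])
```

With this layout the prediction code never touches the vectorizer directly, so the training and live feature spaces cannot drift apart.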
