

TextBlob Naive Bayes classification

I am using the TextBlob library for Naive Bayes classification. I have a training set, and I want to check a word against it: if the word appears in the training set, the classifier should classify it accordingly; if it does not, the classifier should not suggest any classification.

Example: "kartik" is not in the training set, yet it is classified as '1', and the same happens for any other word that is not in the training set.

Is there any way to make the classifier not return '1' for a word that is not in the training set?

from textblob import TextBlob
from textblob.classifiers import NaiveBayesClassifier


train = [
 ('System is working fine', '1'),
 ('Issue Resolved ', '1'),
 ('Working Fine ', '1'),
 ('running smoothly', '1'),
 ("server is working fine ", '1'),
 ('software installed properly', '1'),
 ('Ticket resolved ', '1'),
 ("Laptop is not working ", '-1'),
 ('laptop issue', '-1'),
 ('upgrade laptop', '-1'),
 ('software not working','-1'),
 ('fix the issue','-1'),
 ('WIFI is not working','-1'),
 ('server is down','-1'),
 ('system is not working','-1')
]

c1 = NaiveBayesClassifier(train)
print(c1.classify("kartik"))  # the question reports '1', although "kartik" never appears in train
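Why does an unseen word get a label at all? As far as I know, TextBlob's `NaiveBayesClassifier` uses `basic_extractor` by default, which turns any input into `contains(word)` features, one boolean per word in the training vocabulary. An input made only of unknown words therefore still produces a full feature set in which every value is False, and Naive Bayes has to pick whichever class best explains "none of the known words is present". The pure-Python sketch below imitates that feature set on a few rows of the question's data; its tokenizer (split on whitespace, strip punctuation) is a simplifying assumption and does not reproduce TextBlob's exact tokenization.

```python
import string


def tokenize(text):
    """Simplified tokenizer (assumption): whitespace split, punctuation stripped."""
    tokens = (word.strip(string.punctuation) for word in text.split())
    return {token for token in tokens if token}


def build_vocabulary(train_set):
    """Collect every token that appears in the training texts."""
    vocabulary = set()
    for text, _label in train_set:
        vocabulary |= tokenize(text)
    return vocabulary


def contains_features(text, vocabulary):
    """Imitate contains(word) features: one boolean per vocabulary word."""
    tokens = tokenize(text)
    return {f"contains({word})": word in tokens for word in sorted(vocabulary)}


# A few rows of the question's training set
train = [
    ('System is working fine', '1'),
    ('Issue Resolved ', '1'),
    ("Laptop is not working ", '-1'),
    ('server is down', '-1'),
]

vocabulary = build_vocabulary(train)
features = contains_features("kartik", vocabulary)
print(len(features), any(features.values()))  # 10 False
```

Because every unseen input maps to the same all-False vector, the classifier returns the same label and the same probabilities for all of them, which is consistent with the question's observation that every unknown word is classified as '1'.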

You can get the classification probabilities and then set a threshold, ignoring class labels whose probability falls below it.

prob_dist = c1.prob_classify("Lorem Ipsum dolor sit amet")  # probability distribution over the labels
c1.classify("Lorem Ipsum dolor sit amet")  # returns the most likely label
print(round(prob_dist.prob("1"), 2))
print(round(prob_dist.prob("-1"), 2))

0.61
0.39

I observed that every word that does not appear in the training set gets a probability of 0.61 for class 1. You can use this as a starting point for choosing the threshold.
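A minimal sketch of that idea, assuming the probabilities have already been copied out of TextBlob into a plain dict, for example `{label: dist.prob(label) for label in dist.samples()}` with `dist = c1.prob_classify(text)` (`samples()` and `prob()` come from NLTK's probability-distribution interface). The helper name `classify_or_none` and the `margin` parameter are hypothetical. The baseline is the distribution returned for input containing no known words; any prediction whose top probability does not clearly exceed that baseline is treated as "no classification".

```python
def classify_or_none(probs, baseline, margin=0.05):
    """Return the most probable label, or None if its probability is not
    clearly above what the classifier gives text with no known words.

    probs    -- dict mapping label -> probability for the text being classified
    baseline -- dict mapping label -> probability for out-of-vocabulary text
    margin   -- hypothetical tolerance; tune it on held-out data
    """
    label = max(probs, key=probs.get)
    if probs[label] <= baseline.get(label, 0.0) + margin:
        return None
    return label


# Baseline reported in the answer for any out-of-vocabulary input
baseline = {"1": 0.61, "-1": 0.39}

print(classify_or_none({"1": 0.61, "-1": 0.39}, baseline))  # None
print(classify_or_none({"1": 0.90, "-1": 0.10}, baseline))  # 1
print(classify_or_none({"1": 0.20, "-1": 0.80}, baseline))  # -1
```

With TextBlob, the baseline dict can be computed once by calling `c1.prob_classify` on any string that shares no words with the training set, such as "kartik", instead of hard-coding 0.61.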

However, test all the cases that are currently classified correctly: setting a threshold may cause some correct classifications to be rejected.
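One way to do that testing, sketched under the assumption that you have a small held-out set of labeled sentences together with their predicted probability dicts: sweep several thresholds and, for each one, report how many examples still receive a label (coverage) and how accurate those labels are. The function name and the sample numbers below are hypothetical, for illustration only.

```python
def sweep_thresholds(examples, thresholds):
    """For each threshold, compute (coverage, accuracy) over labeled examples.

    examples   -- list of (probs, true_label); probs maps label -> probability
    thresholds -- iterable of minimum top-class probabilities to accept
    """
    results = {}
    for threshold in thresholds:
        accepted = 0
        correct = 0
        for probs, true_label in examples:
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                accepted += 1
                if label == true_label:
                    correct += 1
        coverage = accepted / len(examples)
        accuracy = correct / accepted if accepted else None  # None: nothing accepted
        results[threshold] = (coverage, accuracy)
    return results


# Hypothetical held-out predictions
examples = [
    ({"1": 0.95, "-1": 0.05}, "1"),
    ({"1": 0.30, "-1": 0.70}, "-1"),
    ({"1": 0.65, "-1": 0.35}, "-1"),
    ({"1": 0.20, "-1": 0.80}, "-1"),
]

for threshold, (coverage, accuracy) in sweep_thresholds(examples, [0.5, 0.7, 0.9]).items():
    print(threshold, coverage, accuracy)
```

In this made-up data, raising the threshold from 0.7 to 0.9 removes no errors but drops two correct predictions, which is exactly the trade-off to look for before settling on a value.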

In any case, increase the size of your training data; you'll see better results, which can help you choose a threshold.
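A different option, not part of the answer above but closer to what the question literally asks for, is to refuse to classify when the input shares no words with the training data. This sketch builds its own vocabulary with a simplified lowercase tokenizer (an assumption; TextBlob tokenizes differently) and takes the classifier as a plain callable so it can wrap `c1.classify`. The names `words`, `known_words`, `classify_if_known` and `fake_classify` are hypothetical.

```python
import re


def words(text):
    """Simplified tokenizer (assumption): lowercase alphanumeric runs."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def known_words(train_set):
    """Vocabulary of all words seen in the training texts."""
    vocabulary = set()
    for text, _label in train_set:
        vocabulary |= words(text)
    return vocabulary


def classify_if_known(text, vocabulary, classify):
    """Call classify(text) only if text contains at least one known word."""
    if words(text).isdisjoint(vocabulary):
        return None
    return classify(text)


train = [
    ('System is working fine', '1'),
    ('WIFI is not working', '-1'),
]
vocabulary = known_words(train)


def fake_classify(text):
    """Stand-in for c1.classify so the sketch runs without TextBlob installed."""
    return '-1' if 'not' in words(text) else '1'


print(classify_if_known("kartik", vocabulary, fake_classify))            # None
print(classify_if_known("wifi not working", vocabulary, fake_classify))  # -1
```

With TextBlob you would pass `c1.classify` instead of `fake_classify`. Common words such as "is" count as known here, so "kartik is here" would still be classified; excluding stop words from the vocabulary may help.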

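Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.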

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM