
How to get the precision score of every class in a Multi class Classification Problem?

I am doing sentiment analysis classification with Scikit-learn. There are 3 labels: positive, neutral and negative. The shape of my training data is (14640, 15), with the following class counts:

negative    9178
neutral     3099
positive    2363

I have pre-processed the data and applied the bag-of-words vectorization technique to the tweet text only (there are many other attributes too); the resulting matrix has shape (14640, 1000). As Y, the label, is in text form, I applied LabelEncoder to it. This is how I split my dataset:

X_train, X_test, Y_train, Y_test = train_test_split(bow, Y, test_size=0.3, random_state=42)
print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)

out: (10248, 1000) (10248,)
     (4392, 1000) (4392,)

And this is my classifier:

svc = svm.SVC(kernel='linear', C=1, probability=True).fit(X_train, Y_train) 
prediction = svc.predict_proba(X_test) 
prediction_int = prediction[:,1] >= 0.3 
prediction_int = prediction_int.astype(np.int) 
print('Precision score: ', precision_score(Y_test, prediction_int, average=None))
print('Accuracy Score: ', accuracy_score(Y_test, prediction_int))

out:Precision score:  [0.73980398 0.48169243 0.        ]
Accuracy Score:  0.6675774134790529
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/classification.py:1437: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)

Now I am not sure why the third value in the precision score is 0. I applied average=None to get a separate precision score for every class. Also, I am not sure whether the prediction is right, because I wrote it for binary classification. Can you please help me debug it and make it better? Thanks in advance.

As the warning explains:

UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples.

it seems that one of your 3 classes is missing from your predictions prediction_int (i.e. you never predict it); you can easily check if this is the case with

set(Y_test) - set(prediction_int)

which should be the empty set, set(), if this is not the case.
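To make the check and the warning concrete, here is a minimal pure-Python sketch. The toy labels and the helper per_class_precision are hypothetical, introduced only for illustration (they are not part of scikit-learn), and the label encoding 0/1/2 is an assumption. It computes precision per class as TP / (TP + FP) and falls back to 0.0 when a class is never predicted, which is the situation the warning describes. Note that predictions built by thresholding a single probability column can only take the values 0 and 1, so label 2 cannot appear in them.

```python
from collections import Counter

# Hypothetical toy labels (assumed encoding: 0 = negative, 1 = neutral, 2 = positive).
y_true = [0, 0, 0, 1, 1, 2, 2, 0]
# Thresholding one probability column yields only 0 or 1, never 2.
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]

# Classes present in the true labels but never predicted.
missing = set(y_true) - set(y_pred)
print("never predicted:", missing)


def per_class_precision(y_true, y_pred, labels):
    """Precision per label: TP / (TP + FP); 0.0 if the label is never predicted."""
    predicted = Counter(y_pred)  # TP + FP for each label
    true_pos = Counter(p for t, p in zip(y_true, y_pred) if t == p)  # TP
    return [
        true_pos[label] / predicted[label] if predicted[label] else 0.0
        for label in labels
    ]


precisions = per_class_precision(y_true, y_pred, labels=[0, 1, 2])
print(precisions)
```

Here the set difference is {2}, and the third precision value is 0.0 simply because the denominator TP + FP is zero for that class.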

If this is indeed the case, and the above operation gives {1} or {2}, the most probable reason is that your dataset is imbalanced (you have many more negative samples), and you do not ask for a stratified split; modify your train_test_split to

X_train, X_test, Y_train, Y_test = train_test_split(bow, Y, test_size=0.3, stratify=Y, random_state=42)

and try again.
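As a rough illustration of what stratify does, the sketch below splits a small synthetic label vector and counts the classes on each side. The data (60/30/10 samples per class, with placeholder features) is made up for the example and is not the dataset from the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 60 / 30 / 10 samples for classes 0 / 1 / 2.
y = np.array([0] * 60 + [1] * 30 + [2] * 10)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features, one column

# stratify=y asks for the same class proportions in both parts of the split.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

train_counts = np.bincount(y_tr, minlength=3)
test_counts = np.bincount(y_te, minlength=3)
print("train:", train_counts)
print("test: ", test_counts)
```

With stratification, every class keeps its 60/30/10 share in both the training and the test part, so a rare class cannot end up under-represented in either by chance.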

UPDATE (after comments):

As it turns out, you have a class imbalance problem (and not a coding issue) which prevents your classifier from successfully predicting your 3rd class (positive). Class imbalance is a huge sub-topic in itself, and several remedies have been proposed. Although going into more detail is arguably beyond the scope of a single SO thread, the first thing you should try (on top of the suggestions above) is to use the class_weight='balanced' argument in the definition of your classifier, i.e.:

svc = svm.SVC(kernel='linear', C=1, probability=True, class_weight='balanced').fit(X_train, Y_train) 
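A minimal end-to-end sketch of this suggestion follows, under stated assumptions: the data is synthetic (generated with make_classification, with class weights chosen to loosely resemble the imbalance in the question), and the labels are predicted with predict rather than by thresholding one probability column, so all three classes can appear in the output. The zero_division=0 argument of precision_score (available in scikit-learn 0.22 and later) only controls what is reported when a class is never predicted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 3-class data with an imbalance loosely resembling the question's.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, n_classes=3,
    weights=[0.63, 0.21, 0.16], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# class_weight='balanced' reweights classes inversely to their frequency.
clf = SVC(kernel="linear", C=1, class_weight="balanced").fit(X_tr, y_tr)

# predict() returns one of the three labels for every sample.
pred = clf.predict(X_te)
print("predicted labels:", np.unique(pred))

# average=None returns one precision value per label, in the order of `labels`.
precisions = precision_score(
    y_te, pred, labels=[0, 1, 2], average=None, zero_division=0
)
print("per-class precision:", precisions)
```

Passing labels explicitly keeps the output aligned with the class order even if some class is absent from the predictions.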

For more options, have a look at the dedicated imbalanced-learn Python library (part of the scikit-learn-contrib projects).
