How to calculate the overall accuracy of a custom-trained spaCy NER model with a confusion matrix?
I'm trying to evaluate my custom-trained spaCy NER model. How can I find the model's overall accuracy using a confusion matrix?
I tried evaluating the model with the spaCy Scorer, which gives precision, recall and token accuracy, following the reference below:
Evaluation in a Spacy NER model
I expect the output as a confusion matrix instead of individual precision, recall and token accuracy values.
Here is a good read on creating confusion matrices for spaCy NER models. It is based on the BILOU format used by spaCy. It works well for small portions of text, but when bigger documents are evaluated the confusion matrix is hard to read because most tokens are labeled O.
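To make the tagging scheme concrete, here is a minimal sketch of how entity spans map to per-token BILOU tags (spaCy writes the scheme as BILUO). The helper `spans_to_biluo`, the sample sentence and the span indices are all made up for illustration. spaCy v3 ships its own converter for character offsets, `spacy.training.offsets_to_biluo_tags`, so treat this only as an explanation of the scheme.

```python
def spans_to_biluo(n_tokens, spans):
    """Turn (start, end, label) token spans (end exclusive) into BILUO tags.

    Hypothetical helper for illustration; it is not part of spaCy.
    """
    tags = ["O"] * n_tokens  # every token starts outside any entity
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"U-{label}"  # single-token entity: Unit
        else:
            tags[start] = f"B-{label}"  # first token: Begin
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"  # middle tokens: Inside
            tags[end - 1] = f"L-{label}"  # final token: Last
    return tags


tokens = ["Yesterday", "Ada", "Lovelace", "visited", "London"]
gold_tags = spans_to_biluo(len(tokens), [(1, 3, "PER"), (4, 5, "LOC")])
print(gold_tags)  # ['O', 'B-PER', 'L-PER', 'O', 'U-LOC']
```

Running the same conversion on the gold annotations and on the model's predictions gives two tag lists of equal length that can be compared token by token.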
What you can do is create two lists, one with the predicted tag for each word and one with the true tag for each word, and compare them using the sklearn.metrics.confusion_matrix() function.
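from sklearn.metrics import confusion_matrix

# One tag per token; the tags must be strings
y_true = ["O", "O", "O", "B-PER", "I-PER"]
y_pred = ["O", "O", "O", "B-PER", "O"]
# Rows are true tags, columns are predicted tags, in the order given by labels
print(confusion_matrix(y_true, y_pred, labels=["O", "B-PER", "I-PER"]))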
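Since the question asks for an overall accuracy, one way to get it from the matrix is to divide the diagonal (correct predictions) by the total number of tokens. The sketch below reuses the toy lists from above. The second figure, computed only over tokens whose true tag is not O, is my own addition to show how much the O tokens inflate the first number; it is a token-level score, not spaCy's entity-level metric.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["O", "B-PER", "I-PER"]
y_true = ["O", "O", "O", "B-PER", "I-PER"]
y_pred = ["O", "O", "O", "B-PER", "O"]

# Rows are true tags, columns are predicted tags
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Overall token accuracy: correctly tagged tokens (diagonal) over all tokens
overall_accuracy = np.trace(cm) / cm.sum()

# Same ratio restricted to rows whose true tag is an entity tag (not "O")
entity_rows = [i for i, label in enumerate(labels) if label != "O"]
entity_correct = sum(cm[i, i] for i in entity_rows)
entity_total = cm[entity_rows, :].sum()
entity_accuracy = entity_correct / entity_total

print(cm)
print(f"overall accuracy: {overall_accuracy:.2f}")  # 0.80
print(f"accuracy on gold entity tokens: {entity_accuracy:.2f}")  # 0.50
```

On a long document the gap between the two numbers is usually much larger, because the O row dominates the matrix.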
You can also use the plot_confusion_matrix() function from the same library to get a visual output; however, this requires scikit-learn 0.23.1 or above and is only usable with scikit-learn classifiers. Note that plot_confusion_matrix() was deprecated in scikit-learn 1.0 and removed in 1.2 in favor of ConfusionMatrixDisplay.
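If the predictions come from spaCy rather than a scikit-learn classifier, the ConfusionMatrixDisplay class can plot a matrix that has already been computed. This is a minimal sketch with the toy tag lists used earlier. The Agg backend and the output path are assumptions chosen so the script runs without a display; remove the backend line and call plt.show() when working interactively.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

labels = ["O", "B-PER", "I-PER"]
y_true = ["O", "O", "O", "B-PER", "I-PER"]
y_pred = ["O", "O", "O", "B-PER", "O"]

cm = confusion_matrix(y_true, y_pred, labels=labels)

# Plot the precomputed matrix; no scikit-learn estimator is needed
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
disp.plot(cmap="Blues")
disp.ax_.set_title("Token-level NER confusion matrix")

# Illustrative output location in the system temp directory
out_path = os.path.join(tempfile.gettempdir(), "ner_confusion_matrix.png")
disp.figure_.savefig(out_path)
print(f"saved plot to {out_path}")
```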
As written in this Stack Overflow question, this is a way to use confusion_matrix() from scikit-learn without its plotting helper:
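import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# y_test (true labels) and pred (predicted labels) come from your own evaluation
labels = ['business', 'health']
cm = confusion_matrix(y_test, pred, labels=labels)  # pass labels by keyword
print(cm)

fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(cm)
plt.title('Confusion matrix of the classifier')
fig.colorbar(cax)
# Set tick positions explicitly so each label lines up with its row and column
ax.set_xticks(range(len(labels)))
ax.set_yticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()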