Evaluating a model in spaCy NER

Question

I am trying to evaluate a trained NER model created with the spaCy library. For this kind of problem you would normally use the F1 score (the harmonic mean of precision and recall). I could not find an accuracy function for a trained NER model in the documentation.

I am not sure if this is correct, but I am trying to do it the following way (example), using f1_score from sklearn:

from sklearn.metrics import f1_score
import spacy
from spacy.gold import GoldParse


nlp = spacy.load("en") #load NER model
test_text = "my name is John" # text to test accuracy
doc_to_test = nlp(test_text) # transform the text to spacy doc format

# we create a golden doc where we know the tagged entity for the text to be tested
doc_gold_text= nlp.make_doc(test_text)
entity_offsets_of_gold_text = [(11, 15,"PERSON")]
gold = GoldParse(doc_gold_text, entities=entity_offsets_of_gold_text)

# bring the data in a format acceptable for sklearn f1 function
y_true = ["PERSON" if "PERSON" in x else 'O' for x in gold.ner]
y_predicted = [x.ent_type_ if x.ent_type_ !='' else 'O' for x in doc_to_test]
f1_score(y_true, y_predicted, average='macro')
> 1.0

Any thoughts or insights are welcome.

Answer 1

You can find different metrics, including F-score, recall, and precision, in spaCy/scorer.py.

This example shows how you can use it:

import spacy
from spacy.gold import GoldParse
from spacy.scorer import Scorer

def evaluate(ner_model, examples):
    scorer = Scorer()
    for input_, annot in examples:
        doc_gold_text = ner_model.make_doc(input_)
        gold = GoldParse(doc_gold_text, entities=annot)
        pred_value = ner_model(input_)
        scorer.score(pred_value, gold)
    return scorer.scores

# example run

examples = [
    ('Who is Shaka Khan?',
     [(7, 17, 'PERSON')]),
    ('I like London and Berlin.',
     [(7, 13, 'LOC'), (18, 24, 'LOC')])
]

ner_model = spacy.load(ner_model_path) # for spaCy's pretrained use 'en_core_web_sm'
results = evaluate(ner_model, examples)

scorer.scores returns multiple scores. When running the example, the result looks like the output below. (Note that the low scores occur because the examples label London and Berlin as 'LOC' while the model labels them as 'GPE'. You can see this by looking at ents_per_type.)

{'uas': 0.0, 'las': 0.0, 'las_per_type': {'attr': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'root': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'compound': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'nsubj': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'dobj': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'cc': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'conj': {'p': 0.0, 'r': 0.0, 'f': 0.0}}, 'ents_p': 33.33333333333333, 'ents_r': 33.33333333333333, 'ents_f': 33.33333333333333, 'ents_per_type': {'PERSON': {'p': 100.0, 'r': 100.0, 'f': 100.0}, 'LOC': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'GPE': {'p': 0.0, 'r': 0.0, 'f': 0.0}}, 'tags_acc': 0.0, 'token_acc': 100.0, 'textcat_score': 0.0, 'textcats_per_cat': {}}
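To see where the 33.33 for ents_p, ents_r and ents_f comes from, here is a minimal sketch of exact-match entity scoring in plain Python. It is not spaCy's own implementation; the helper name entity_prf and the (doc index, start, end, label) tuple layout are made up for illustration, and the predicted spans are assumed from the explanation above (the model finds all three spans but labels the two cities 'GPE'). F is computed as 2PR / (P + R).

```python
def entity_prf(gold_spans, pred_spans):
    """Exact-match precision, recall and F-score over entity spans.

    A predicted span counts as correct only if its document, offsets
    and label all match a gold span.
    """
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)  # true positives: spans present in both sets
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


# Hypothetical layout: (doc index, start, end, label)
gold = [(0, 7, 17, 'PERSON'), (1, 7, 13, 'LOC'), (1, 18, 24, 'LOC')]
# Assumed model output: same offsets, but 'GPE' instead of 'LOC'
pred = [(0, 7, 17, 'PERSON'), (1, 7, 13, 'GPE'), (1, 18, 24, 'GPE')]

p, r, f = entity_prf(gold, pred)
print(round(100 * p, 2), round(100 * r, 2), round(100 * f, 2))
```

Only one of the three predicted spans matches a gold span exactly, so precision and recall are both 1/3, and so is their harmonic mean.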

The example is taken from a spaCy example on GitHub (the link no longer works). It was last tested with spaCy 2.2.4.

Answer 2

Since I faced the same problem, I am posting the code for the example shown in the accepted answer, adapted for spaCy v3:

import spacy
from spacy.scorer import Scorer
from spacy.training.example import Example

examples = [
    ('Who is Shaka Khan?',
     [(7, 17, 'PERSON')]),
    ('I like London and Berlin.',
     [(7, 13, 'LOC'), (18, 24, 'LOC')])
]

def evaluate(ner_model, examples):
    scorer = Scorer()
    example = []
    for input_, annot in examples:
        pred = ner_model(input_)  # run the pipeline to get the predicted doc
        print(pred, annot)
        # the gold annotations must be passed under the 'entities' key
        temp = Example.from_dict(pred, {'entities': annot})
        example.append(temp)
    scores = scorer.score(example)
    return scores

ner_model = spacy.load('en_core_web_sm') # spaCy's pretrained pipeline; use the path to your own model instead
results = evaluate(ner_model, examples)
print(results)

Breaking changes occurred because classes such as GoldParse were dropped in spaCy v3.

I believe the part of the answer about metrics is still valid.

Answer 3

Note that spaCy v3 has an evaluate command that you can easily use from the command line, instead of writing custom code to handle things.
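The evaluate command reads its evaluation data from spaCy's binary .spacy format, so annotations in the (text, entities) form used in the other answers need converting first. The following is a rough sketch of that conversion, assuming English text and the example annotations from this thread; the helper name build_docbin and the paths ./dev.spacy and ./my_model are placeholders, not part of spaCy.

```python
import spacy
from spacy.tokens import DocBin

examples = [
    ('Who is Shaka Khan?', [(7, 17, 'PERSON')]),
    ('I like London and Berlin.', [(7, 13, 'LOC'), (18, 24, 'LOC')]),
]


def build_docbin(examples, lang='en'):
    """Convert (text, [(start, end, label), ...]) pairs into a DocBin."""
    nlp = spacy.blank(lang)  # tokenizer only, no trained components needed
    db = DocBin()
    for text, ents in examples:
        doc = nlp.make_doc(text)
        spans = [doc.char_span(start, end, label=label) for start, end, label in ents]
        # char_span returns None when offsets do not align with token boundaries
        doc.ents = [span for span in spans if span is not None]
        db.add(doc)
    return db


db = build_docbin(examples)
# db.to_disk('./dev.spacy')  # placeholder path
# Then, on the command line (placeholder paths):
#   python -m spacy evaluate ./my_model ./dev.spacy --output metrics.json
```

Spans that do not line up with token boundaries are silently skipped here; in practice you would probably want to log them, since they usually point to wrong offsets in the annotations.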

Answer 4

This is how I used to calculate accuracy for my custom spaCy NER model:

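# nlp_ner is the loaded custom NER pipeline;
# examples is a list of (text, [(start, end, label), ...]) pairs
def flat_accuracy(text, annotations):
    actual_ents = [ents[2] for ents in annotations]  # gold labels
    prediction = nlp_ner(text)
    # compare labels with labels (ent.text would give the entity text instead)
    pred_ents = [ent.label_ for ent in prediction.ents]
    return 1 if actual_ents == pred_ents else 0


predict_points = sum(flat_accuracy(test_text[0], test_text[1]) for test_text in examples)
output = (predict_points/len(examples)) * 100
# output --> 82%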

Answer 5

I searched for many solutions on the internet but failed to find one that worked. Now that I have figured out the root of the problem, I am sharing my code, which is similar to the original question. I hope someone still finds it useful. It works with spaCy v3.3.

from spacy.scorer import Scorer
from spacy.training import Example

def evaluate(ner_model, samples):
    scorer = Scorer(ner_model)
    example = []
    for sample in samples:
        pred = ner_model(sample['text'])
        print(pred, sample['entities'])
        temp_ex = Example.from_dict(pred, {'entities': sample['entities']})
        example.append(temp_ex)
    scores = scorer.score(example)
    
    return scores

Note: samples should be a list of valid spaCy v3 formatted entries (dicts with 'text' and 'entities' keys), where each entry looks like this:

{'text': '#Causes - Quinsy - CA0K.1\nPeri Tonsillar Abscess is usually a complication of an untreated or partially treated acute tonsillitis. The infection, in these cases, spreads to the peritonsillar area (peritonsillitis). This region comprises loose connective tissue and is hence susceptible to formation of abscess.', 'entities': [(10, 16, 'Disease_E'), (26, 48, 'Disease_E'), (112, 129, 'Complication_E'), (177, 213, 'Anatomy_E'), (237, 260, 'Anatomy_E'), (302, 309, 'Disease_E')]}
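Since the scores depend on the character offsets being right, it can help to print the text each offset actually covers before evaluating. Here is a small sketch in plain Python; the helper name span_texts is made up, and the sample is a shortened copy of the one above with only its first two entities.

```python
def span_texts(sample):
    """Return (label, covered text) for every entity offset in a sample."""
    text = sample['text']
    return [(label, text[start:end]) for start, end, label in sample['entities']]


# Shortened copy of the sample above (first two entities only)
sample = {
    'text': '#Causes - Quinsy - CA0K.1\nPeri Tonsillar Abscess is usually a complication',
    'entities': [(10, 16, 'Disease_E'), (26, 48, 'Disease_E')],
}

for label, covered in span_texts(sample):
    print(label, repr(covered))
```

If a printed span starts or ends in the middle of a word, the offsets are off and the entity will likely be dropped or scored as a miss.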

5 answers

Answer 1: 37, accepted, 2017-06-30 07:59:48
Answer 2: 3, 2021-07-12 10:47:01
Answer 3: 1, 2021-07-12 10:54:00
Answer 4: 0, 2022-07-10 16:56:56
Answer 5: 0, 2023-01-20 00:49:54
