
Why is the spaCy Scorer returning None for the entity scores but the model is extracting entities?

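I am really confused about why Scorer.score returns None for ents_p, ents_r, and ents_f in the example below. I am seeing something similar with my own custom model and would like to understand why it returns None.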

Scorer code example - returns None for ents_p, ents_r, ents_f

import spacy
from spacy.scorer import Scorer
from spacy.tokens import Doc
from spacy.training.example import Example

examples = [
    ('Who is Talha Tayyab?',
     {(7, 19, 'PERSON')}),
    ('I like London and Berlin.',
     {(7, 13, 'LOC'), (18, 24, 'LOC')}),
    ('Agra is famous for Tajmahal, The CEO of Facebook will visit India shortly to meet Murari Mahaseth and to visit Tajmahal.',
     {(0, 4, 'LOC'), (40, 48, 'ORG'), (60, 65, 'GPE'), (82, 97, 'PERSON'), (111, 119, 'GPE')})
]

def my_evaluate(ner_model, examples):
    scorer = Scorer()
    example = []
    for input_, annotations in examples:
        pred = ner_model(input_)
        print(pred,annotations)
        temp = Example.from_dict(pred, dict.fromkeys(annotations))
        example.append(temp)
    scores = scorer.score(example)
    return scores

ner_model = spacy.load('en_core_web_sm') # for spaCy's pretrained use 'en_core_web_sm'
results = my_evaluate(ner_model, examples)
print(results)

Scorer results

{'token_acc': 1.0, 'token_p': 1.0, 'token_r': 1.0, 'token_f': 1.0, 'sents_p': None, 'sents_r': None, 'sents_f': None, 'tag_acc': None, 'pos_acc': None, 'morph_acc': None, 'morph_micro_p': None, 'morph_micro_r': None, 'morph_micro_f': None, 'morph_per_feat': None, 'dep_uas': None, 'dep_las': None, 'dep_las_per_type': None, 'ents_p': None, 'ents_r': None, 'ents_f': None, 'ents_per_type': None, 'cats_score': 0.0, 'cats_score_desc': 'macro F', 'cats_micro_p': 0.0, 'cats_micro_r': 0.0, 'cats_micro_f': 0.0, 'cats_macro_p': 0.0, 'cats_macro_r': 0.0, 'cats_macro_f': 0.0, 'cats_macro_auc': 0.0, 'cats_f_per_type': {}, 'cats_auc_per_type': {}}

It is clearly picking out entities from the text:

doc = ner_model('Agra is famous for Tajmahal, The CEO of Facebook will visit India shortly to meet Murari Mahaseth and to visit Tajmahal.')
for ent in doc.ents:
    print(ent.text, ent.label_)

Output

Agra PERSON
Tajmahal ORG
Facebook ORG
India GPE
Murari Mahaseth PERSON
Tajmahal ORG

This line is the problem. The annotations are not added to the reference doc because they are not in the expected format:

Example.from_dict(pred, dict.fromkeys(annotations))

The expected format is:

Example.from_dict(pred, {"entities": [(start, end, label), (start, end, label), ...]})
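
To make the format problem concrete, here is a minimal sketch in plain Python (no spaCy needed) of what dict.fromkeys produces from the question's set of tuples, and one way to reshape it. The helper name to_entity_dict is hypothetical and only used for illustration:

```python
# Gold annotations as written in the question: a set of (start, end, label) tuples.
annotations = {(7, 13, "LOC"), (18, 24, "LOC")}

# dict.fromkeys turns every tuple into a key whose value is None,
# so the resulting dict has no "entities" key at all.
broken = dict.fromkeys(annotations)
print("entities" in broken)  # False

def to_entity_dict(spans):
    """Hypothetical helper: wrap the spans in the {"entities": [...]} shape."""
    # sorted() gives a stable list ordered by start offset.
    return {"entities": sorted(spans)}

fixed = to_entity_dict(annotations)
print(fixed)  # {'entities': [(7, 13, 'LOC'), (18, 24, 'LOC')]}
```

The dict in fixed is the shape that can be passed as the second argument to Example.from_dict.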

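If you create the examples with Example.predicted not yet annotated, you can also use the built-in Language.evaluate, which also creates the scorer based on your pipeline, so you don't get a lot of irrelevant None scores: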

Example.from_dict(nlp.make_doc(text), {"entities": [(start, end, label), (start, end, label), ...]})

Once you have these kinds of examples, run:

```python
scores = ner_model.evaluate(examples)
```
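
As a rough end-to-end sketch of this route: the version below assumes spaCy v3 and, so that it runs without downloading a model, swaps the trained pipeline for a blank English pipeline with a rule-based entity_ruler. The patterns and the GPE gold labels are made up for illustration and are not from the question:

```python
import spacy
from spacy.training import Example

# Assumption: a blank pipeline plus entity_ruler stands in for a trained NER model.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "GPE", "pattern": "London"},
    {"label": "GPE", "pattern": "Berlin"},
])

# Gold data in the {"entities": [...]} format; offsets are character positions.
gold = [
    ("I like London and Berlin.",
     {"entities": [(7, 13, "GPE"), (18, 24, "GPE")]}),
]

# The predicted side is left unannotated: make_doc only tokenizes the text.
examples = [Example.from_dict(nlp.make_doc(text), annots) for text, annots in gold]

# Language.evaluate runs the pipeline on the examples and scores the result.
scores = nlp.evaluate(examples)
print(scores["ents_p"], scores["ents_r"], scores["ents_f"])
```

With a trained model such as en_core_web_sm, the same two steps apply: build the examples with make_doc, then call evaluate on the loaded pipeline.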

For those interested, here is the complete example that incorporates @aab's input.

import spacy
from spacy.scorer import Scorer
from spacy.tokens import Doc
from spacy.training.example import Example

examples = [
    ('Who is Talha Tayyab?',
     {"entities":[(7, 19, 'PERSON')]}),
    ('I like London and Berlin.',
     {"entities":[(7, 13, 'LOC'), (18, 24, 'LOC')]}),
    ('Agra is famous for Tajmahal, The CEO of Facebook will visit India shortly to meet Murari Mahaseth and to visit Tajmahal.',
     {"entities":[(0, 4, 'LOC'), (40, 48, 'ORG'), (60, 65, 'GPE'), (82, 97, 'PERSON'), (111, 119, 'GPE')]})
]

def my_evaluate(ner_model, examples):
    scorer = Scorer()
    example = []
    for input_, annotations in examples:
        pred = ner_model(input_)
        print(pred,annotations)
        temp = Example.from_dict(pred, annotations)
        example.append(temp)
    scores = scorer.score(example)
    return scores

ner_model = spacy.load('en_core_web_sm') # for spaCy's pretrained use 'en_core_web_sm'
results = my_evaluate(ner_model, examples)
print(results)

Output

Who is Talha Tayyab? {'entities': [(7, 19, 'PERSON')]}
I like London and Berlin. {'entities': [(7, 13, 'LOC'), (18, 24, 'LOC')]}
Agra is famous for Tajmahal, The CEO of Facebook will visit India shortly to meet Murari Mahaseth and to visit Tajmahal. {'entities': [(0, 4, 'LOC'), (40, 48, 'ORG'), (60, 65, 'GPE'), (82, 97, 'PERSON'), (111, 119, 'GPE')]}
{'token_acc': 1.0, 'token_p': 1.0, 'token_r': 1.0, 'token_f': 1.0, 'sents_p': None, 'sents_r': None, 'sents_f': None, 'tag_acc': None, 'pos_acc': None, 'morph_acc': None, 'morph_micro_p': None, 'morph_micro_r': None, 'morph_micro_f': None, 'morph_per_feat': None, 'dep_uas': None, 'dep_las': None, 'dep_las_per_type': None, 'ents_p': 0.4444444444444444, 'ents_r': 0.5, 'ents_f': 0.47058823529411764, 'ents_per_type': {'PERSON': {'p': 0.6666666666666666, 'r': 1.0, 'f': 0.8}, 'GPE': {'p': 0.3333333333333333, 'r': 0.5, 'f': 0.4}, 'LOC': {'p': 0.0, 'r': 0.0, 'f': 0.0}, 'ORG': {'p': 0.3333333333333333, 'r': 1.0, 'f': 0.5}}, 'cats_score': 0.0, 'cats_score_desc': 'macro F', 'cats_micro_p': 0.0, 'cats_micro_r': 0.0, 'cats_micro_f': 0.0, 'cats_macro_p': 0.0, 'cats_macro_r': 0.0, 'cats_macro_f': 0.0, 'cats_macro_auc': 0.0, 'cats_f_per_type': {}, 'cats_auc_per_type': {}}
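
To read the entity scores in this output, it may help to redo the arithmetic by hand. The sketch below uses counts inferred from the printed scores and the gold data (8 gold entities, and ents_r of 0.5 and ents_p of about 0.444 imply 4 exact matches out of 9 predictions). These counts are an inference, not something spaCy prints:

```python
# Counts inferred from the output above (assumption: not printed by spaCy itself).
gold_total = 8   # entities annotated across the three examples: 1 + 2 + 5
true_pos = 4     # predicted spans whose offsets and label both match a gold span
pred_total = 9   # entities predicted by the model

precision = true_pos / pred_total  # share of predictions that are correct
recall = true_pos / gold_total     # share of gold entities that were found
f_score = 2 * precision * recall / (precision + recall)
print(precision, recall, f_score)
```

The per-type scores follow the same pattern. LOC scores 0.0 because none of the predicted entities carry the LOC label in exactly the annotated spans, so a label mismatch between the gold data and the model's label scheme counts as a miss.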
