
How can I make SpaCy recognize all my given entities

I have quite a long list of patterns in JSONL format that I loaded and added to the entity ruler:

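from spacy.pipeline import EntityRuler

new_ruler = EntityRuler(nlp).from_disk(project_path + "data/skill_patterns.jsonl")
nlp.add_pipe(new_ruler)  # by default the component is appended as the last one in the pipeline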

When I print the results with print([(ent.text, ent.label_) for ent in doc.ents]), my output is:

[('data science','SKILL|data-science'), ('CV', 'ORG'), ('Kandidaat', 'FAC'), ('één', 'CARDINAL'), ('LSTM',
 'ORG'), ('Parts', 'GPE'), ('Speech', 'GPE'), ('POS', 'ORG'), ('Entity Recognition', 'ORG'), 
('NER', 'ORG'), ('Word2vec', 'ORG'), ('GloVe', 'ORG'), ('Recursive', 'NORP'), ('Neural Networks', 'ORG'),
 ('Ensemble', 'PERSON'), ('Dynamic', 'NORP'), ('Intent detection', 'PERSON'), ('Phrase matching.-', 'ORG'),
 ('Microsoft', 'NORP'), ('Azure.-', 'ORG'), ('één', 'CARDINAL'), ('Python', 'WORK_OF_ART'),
 ('Pytorch', 'GPE'), ('Django', 'GPE'), ('GoLanguage.-', 'GPE'), ('Kandidaat', 'FAC'), ('1 november 2020', 'DATE')]

Now I know for a fact that, for example, ('Pytorch', 'GPE') and ('Django', 'GPE') are in my pattern list and should be recognized as SKILL instead of the labels they are assigned now. This goes for quite a few other 'skills' as well.

{"label":"SKILL|django","pattern":[{"LOWER":"django"}]}
{"label":"SKILL|pytorch","pattern":[{"LOWER":"pytorch"}]}
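One way to double-check that a word really is covered by the pattern file is to parse the JSONL directly, independent of spaCy. The following is only a minimal sketch: the helper names load_patterns and labels_for are hypothetical, and it only looks for single-token patterns of the {"LOWER": ...} form shown above.

```python
import json


def load_patterns(lines):
    """Parse JSONL pattern lines into dicts, skipping blank lines."""
    patterns = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            patterns.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no} is not valid JSON: {exc}") from exc
    return patterns


def labels_for(patterns, word):
    """Return the labels of single-token LOWER patterns that match `word`."""
    word = word.lower()
    return [p["label"] for p in patterns if p.get("pattern") == [{"LOWER": word}]]


# The two pattern lines quoted in the question, used here as sample input.
sample = [
    '{"label":"SKILL|django","pattern":[{"LOWER":"django"}]}',
    '{"label":"SKILL|pytorch","pattern":[{"LOWER":"pytorch"}]}',
]
patterns = load_patterns(sample)
print(labels_for(patterns, "Django"))
print(labels_for(patterns, "Pytorch"))
```

If the labels come back as expected, the patterns themselves are fine and the problem lies in how the ruler interacts with the rest of the pipeline.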

Does anyone know why it does not adhere to my self-created entities?

Is there a way to prioritize my entities above the ones already in the model?

Thanks!

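I've found a solution.

Adding new_ruler before the NER component (after the parser) in the pipeline gives the created entities priority. When the ruler runs first, the NER respects the entity spans that are already set and adjusts its predictions around them; when the ruler runs after the NER, it does not overwrite overlapping entities unless overwrite_ents is enabled.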

nlp.add_pipe(new_ruler, after='parser')
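The call above passes a component object to add_pipe, which is the older spaCy API. For readers on spaCy v3, where add_pipe takes the component's string name, the same idea might look like the sketch below. It is an assumption-laden example, not the original poster's code: the helper add_skill_ruler is hypothetical, and a blank English pipeline is used so that the snippet runs without downloading a trained model (with a trained model, the before="ner" branch is the one that applies).

```python
import spacy

# The two patterns quoted in the question.
SKILL_PATTERNS = [
    {"label": "SKILL|django", "pattern": [{"LOWER": "django"}]},
    {"label": "SKILL|pytorch", "pattern": [{"LOWER": "pytorch"}]},
]


def add_skill_ruler(nlp, patterns):
    """Add an entity ruler ahead of the statistical NER, if the pipeline has one."""
    if "ner" in nlp.pipe_names:
        # Running before "ner" lets the rule-based entities take priority.
        ruler = nlp.add_pipe("entity_ruler", before="ner")
    else:
        # No NER in the pipeline (e.g. a blank pipeline): just append the ruler.
        ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(patterns)
    return ruler


# Assumption for a self-contained demo: a blank pipeline with no trained components.
nlp = spacy.blank("en")
add_skill_ruler(nlp, SKILL_PATTERNS)

doc = nlp("Experience with Pytorch and Django")
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(ents)
```

With a trained pipeline, the helper would be called right after spacy.load(...), and nlp.pipe_names can be printed to confirm that entity_ruler sits before ner.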
