
Add custom NER to spaCy 3 pipeline

I am trying to build a custom spaCy pipeline based on the en_core_web_sm pipeline. As far as I can tell, the ner component has been added correctly, since it appears in the pipe names when they are printed (see below). However, when the combined pipeline is run on text I get no entities, whereas the custom ner pipeline used by itself extracts and labels the correct entities. I am using spaCy 3.0.8 and en_core_web_sm 3.0.0.

import spacy


crypto_nlp = spacy.load('model-best')
nlp = spacy.load('en_core_web_sm')

nlp.add_pipe('ner', source=crypto_nlp, name="crypto_ner", before="ner")

print(nlp.pipe_names)

text = 'Ethereum'

doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)

Output: ['tok2vec', 'tagger', 'parser', 'crypto_ner', 'ner', 'attribute_ruler', 'lemmatizer']

But when I use my ner model:

doc = crypto_nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)

Output: Ethereum ETH

It's not clear from the details in the question, but my guess is that the ner component in crypto_nlp depends on a separate tok2vec component, and that tok2vec is not included when you source only the ner.
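One way to check this guess is to look at the ner component's config and see whether its model contains a tok2vec listener block. The sketch below is a minimal illustration, not part of the original answer: the helper name uses_tok2vec_listener is hypothetical, and it assumes the listener is registered under an architecture name beginning with "spacy.Tok2VecListener", as in the standard spaCy 3 configs.

```python
# Hypothetical helper: walks a component's config (a nested dict) and reports
# whether any block uses a tok2vec listener architecture.
LISTENER_PREFIX = "spacy.Tok2VecListener"  # assumed architecture name prefix


def uses_tok2vec_listener(component_cfg):
    """Return True if any nested block in the config is a tok2vec listener."""
    if isinstance(component_cfg, dict):
        arch = component_cfg.get("@architectures", "")
        if isinstance(arch, str) and arch.startswith(LISTENER_PREFIX):
            return True
        # Recurse into nested blocks such as "model" and "model.tok2vec".
        return any(uses_tok2vec_listener(v) for v in component_cfg.values())
    return False


# Usage with a loaded pipeline (requires spaCy and the trained model):
# import spacy
# crypto_nlp = spacy.load("model-best")
# print(uses_tok2vec_listener(crypto_nlp.config["components"]["ner"]))
```

If this reports a listener for the custom ner, the component relies on the tok2vec from its own pipeline and will not see that output after being sourced alone into another pipeline.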

Since this tok2vec won't be shared, the easiest fix is to modify the ner component so that it includes a standalone copy of the tok2vec, which is called "replacing listeners": https://spacy.io/api/language#replace_listeners

If crypto_nlp.pipe_names is ['tok2vec', 'ner'], then the following replaces the listener before the component is sourced into the second pipeline, so that it becomes a standalone component:

crypto_nlp.replace_listeners("tok2vec", "ner", ["model.tok2vec"])
nlp.add_pipe('ner', source=crypto_nlp, name="crypto_ner", before="ner")
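The two calls above can be wrapped in a small helper so that the listener is only replaced when the source pipeline actually has a separate tok2vec. This is a sketch under assumptions: the function name source_standalone_ner and its parameters are hypothetical, and it assumes that the ner in the source pipeline really does listen to that tok2vec (the case where it does not is not handled here).

```python
# Hypothetical helper combining replace_listeners and add_pipe.
def source_standalone_ner(target_nlp, source_nlp, new_name="crypto_ner",
                          tok2vec_name="tok2vec", ner_name="ner"):
    """Copy the ner from source_nlp into target_nlp as a standalone component."""
    # Give the ner its own copy of the tok2vec so that it no longer depends
    # on a component that stays behind in the source pipeline.
    if tok2vec_name in source_nlp.pipe_names:
        source_nlp.replace_listeners(tok2vec_name, ner_name, ["model.tok2vec"])

    # Place the custom ner before the existing ner if the target has one;
    # otherwise let it be appended at the end of the pipeline.
    before = ner_name if ner_name in target_nlp.pipe_names else None
    target_nlp.add_pipe(ner_name, source=source_nlp, name=new_name, before=before)
    return target_nlp


# Usage (requires spaCy, en_core_web_sm, and the trained model):
# import spacy
# nlp = source_standalone_ner(spacy.load("en_core_web_sm"), spacy.load("model-best"))
# print([(ent.text, ent.label_) for ent in nlp("Ethereum").ents])
```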
