Add custom NER to spaCy 3 pipeline
I am trying to build a custom spaCy pipeline based on the en_core_web_sm pipeline. From what I can tell, the custom ner has been added correctly, since it appears in the pipe names when they are printed (see below). For some reason, when the combined pipeline is run on text I do not get any entities, but when the custom ner pipeline is used by itself the correct entities are extracted and labelled. I am using spaCy 3.0.8 and en_core_web_sm 3.0.0.
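import spacy
# Custom pipeline trained separately; it contains the crypto ner
crypto_nlp = spacy.load('model-best')
nlp = spacy.load('en_core_web_sm')
# Source the custom ner into the base pipeline, ahead of the built-in ner
nlp.add_pipe('ner', source=crypto_nlp, name="crypto_ner", before="ner")
print(nlp.pipe_names)
text = 'Ethereum'
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)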
Output: ['tok2vec', 'tagger', 'parser', 'crypto_ner', 'ner', 'attribute_ruler', 'lemmatizer']
But when I use my ner model on its own:
doc = crypto_nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
Output: Ethereum ETH
It's not clear from the details in the question, but my guess is that the ner component in crypto_nlp depends on a separate tok2vec component that is not included when you source the ner into the other pipeline.
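One way to check this guess is to look at the component's config. The sketch below assumes the pipeline was trained with the usual shared-tok2vec layout, in which the ner's model.tok2vec block uses the spacy.Tok2VecListener architecture. The helper name uses_tok2vec_listener and the example dictionary are hypothetical and hand-written for illustration; with a real pipeline you would pass crypto_nlp.config["components"] instead.

```python
from typing import Any, Mapping

# Architecture name prefix that marks a listener layer in a spaCy 3 config
LISTENER_ARCH = "spacy.Tok2VecListener"


def uses_tok2vec_listener(components: Mapping[str, Any], name: str) -> bool:
    """Return True if component `name` gets its tok2vec from a shared listener."""
    model_cfg = components.get(name, {}).get("model", {})
    arch = model_cfg.get("tok2vec", {}).get("@architectures", "")
    return arch.startswith(LISTENER_ARCH)


# Hand-written stand-in for crypto_nlp.config["components"] (illustrative only)
example_components = {
    "tok2vec": {"factory": "tok2vec"},
    "ner": {
        "factory": "ner",
        "model": {
            "@architectures": "spacy.TransitionBasedParser.v2",
            "tok2vec": {
                "@architectures": "spacy.Tok2VecListener.v1",
                "width": 96,
                "upstream": "*",
            },
        },
    },
}

print(uses_tok2vec_listener(example_components, "ner"))

# With a real pipeline (assumes 'model-best' is loaded as crypto_nlp):
# uses_tok2vec_listener(crypto_nlp.config["components"], "ner")
```

If this returns True for the ner, the component has no embedding layer of its own, which would be consistent with it producing no entities once it is separated from its original tok2vec.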
Since this tok2vec won't be shared, it's easiest to modify the ner component to include a standalone copy of the tok2vec, which is called "replacing listeners": https://spacy.io/api/language#replace_listeners
If crypto_nlp.pipe_names is ['tok2vec', 'ner'], then this should replace the listener before the ner is loaded into the second pipeline, so that it is now a standalone component:
crypto_nlp.replace_listeners("tok2vec", "ner", ["model.tok2vec"])
nlp.add_pipe('ner', source=crypto_nlp, name="crypto_ner", before="ner")