
Add custom NER model to spaCy pipeline

I created a custom NER model using Prodi.gy. After completing all of the processing and validation, I saved the model to disk. I can instantiate the model from disk using spacy.load and it seems to work well. My question now is: how do I add that custom NER model to a spaCy pipeline? I want to make sure the pipeline has the tagger, parser, etc. plus my custom NER model.

It seems like I should initialize a base nlp from one of the existing models (en_core_web_sm), remove the existing NER, and replace it with my custom NER. This is no doubt user error; I just can't figure out from the documentation and trial and error what I am doing wrong (or need to do).

Maybe my operations are wrong? Maybe I should try to add the tagger and parser to my custom model instantiation?

I was able to get it to work by adding the "tagger" and "parser" from one of the en models and then modifying the meta.json file. That doesn't seem like the right approach.

I tried this, which is obviously not right:

nlp = spacy.load("en_core_web_sm")
#remove existing NER
nlp.remove_pipe('ner')
print("Pipeline", nlp.pipe_names)

nlp_entity = spacy.load("custom_ner_model")

nlp.add_pipe(nlp_entity)
print("Pipeline", nlp.pipe_names)

Pipeline ['tagger', 'parser']
Pipeline ['tagger', 'parser', 'English']

I then tried building the NER from the custom model and adding it, which is also not right:

nlp = spacy.load("en_core_web_sm")
#remove existing NER
nlp.remove_pipe('ner')
print("Pipeline", nlp.pipe_names)

nlp_entity = spacy.load("custom_ner_model")
ner = nlp_entity.create_pipe("ner")

nlp.add_pipe(ner,last=True)
print("Pipeline", nlp.pipe_names)

Error if I try to run with ner in the pipeline:

text = "This is a test"
doc = nlp(text)
displacy.render(doc, style="ent")

ValueError: [E109] Model for component 'ner' not initialized. Did you forget to load a model, or forget to call begin_training()?

I also got this error, which is what drove me to try adding the tagger/parser from the base en models:

ValueError: [E155] The pipeline needs to include a tagger in order to use Matcher or PhraseMatcher with the attributes POS, TAG, or LEMMA. Try using nlp() instead of nlp.make_doc() or list(nlp.pipe()) instead of list(nlp.tokenizer.pipe()).

In spaCy v2:

import spacy

nlp = spacy.load("en_core_web_sm", disable=["ner"])
# Pass the base vocab so both pipelines share the same strings
nlp_entity = spacy.load("custom_ner_model", vocab=nlp.vocab)
nlp.add_pipe(nlp_entity.get_pipe("ner"))

The tricky part here is that you need to load both with the same vocab so your final model knows about the strings for any new labels used only in the custom model. To do this, you just need to provide the vocab object from the first model to spacy.load() for the second model.

For the upcoming spaCy v3, this will change to:

import spacy

nlp = spacy.load("en_core_web_sm", exclude=["ner"])
nlp_entity = spacy.load("custom_ner_model")
# Take the trained "ner" component from the custom pipeline
nlp.add_pipe("ner", source=nlp_entity)
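
To try the v3 pattern without the real models, here is a minimal sketch that assumes spaCy v3 is installed. It uses two blank English pipelines as stand-ins: one plays the role of custom_ner_model and the other plays the role of en_core_web_sm without its NER. The label "GADGET" and the sentencizer component are placeholders for illustration only, and the NER weights are randomly initialized, so the predicted entities are meaningless.

```python
import spacy

# Stand-in for spacy.load("custom_ner_model"): a blank pipeline whose NER
# has one hypothetical label and untrained, randomly initialized weights.
nlp_entity = spacy.blank("en")
custom_ner = nlp_entity.add_pipe("ner")
custom_ner.add_label("GADGET")  # hypothetical label, not from the question
nlp_entity.initialize()

# Stand-in for spacy.load("en_core_web_sm", exclude=["ner"]).
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # placeholder for the tagger and parser

# Add the NER component from the other pipeline, keeping its labels and weights.
nlp.add_pipe("ner", source=nlp_entity)

print(nlp.pipe_names)  # ['sentencizer', 'ner']
print(nlp.get_pipe("ner").labels)

# The sourced component carries an initialized model, so this call should not
# raise the E109 "not initialized" error shown in the question.
doc = nlp("This is a test")
```

With the real models, only the two stand-in sections change: replace them with the spacy.load calls from the answer above.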

The spaCy folks provided this as a response, which is similar to @aab's answer.

You can either remove ner from a base model and use the result as the base for training:

nlp = spacy.load("en_core_web_sm")
nlp.remove_pipe("ner")
print(nlp.pipe_names)  # ['tagger', 'parser']
nlp.to_disk("./en_tagger_parser_sm")  # use that path for training

Or you can remove NER from the base model and add your custom NER to that base:

nlp = spacy.load("en_core_web_sm")
nlp.remove_pipe("ner")
print(nlp.pipe_names)  # ['tagger', 'parser']

nlp_entity = spacy.load("custom_ner_model")
# Get the ner pipe from this model and add it to base model
ner = nlp_entity.get_pipe("ner")
nlp.add_pipe(ner)
print(nlp.pipe_names)  # ['tagger', 'parser', 'ner']

nlp.to_disk("./custom_model")
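
A quick way to confirm that the combined pipeline survives a round trip to disk is to reload it and inspect the component names. The sketch below is illustrative only and assumes spaCy v3, where add_pipe takes a component name plus source= rather than a component object as in the v2 snippet above. The blank pipelines, the "GADGET" label, and the temporary directory are stand-ins for the real models and for ./custom_model.

```python
import tempfile

import spacy

# Stand-in for the custom NER model (hypothetical label, untrained weights).
nlp_entity = spacy.blank("en")
nlp_entity.add_pipe("ner").add_label("GADGET")
nlp_entity.initialize()

# Stand-in for the base model with its own NER removed.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # placeholder for the tagger and parser
nlp.add_pipe("ner", source=nlp_entity)

# Save the combined pipeline, then load it back from the same directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    nlp.to_disk(tmp_dir)
    reloaded = spacy.load(tmp_dir)

print(reloaded.pipe_names)  # ['sentencizer', 'ner']
print(reloaded.get_pipe("ner").labels)
```

If the reloaded pipeline lists the expected components and labels, the saved directory can be loaded with spacy.load like any other model path.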
