
No POS tags in newly trained spaCy NER model, how to enable?

I trained an NER model following the spaCy Training Quickstart and only enabled the ner component for training, since NER annotations are the only data I have.

Here is the partial config:

[nlp]
lang = "en"
pipeline = ["tok2vec","ner","tagger"]
batch_size = 1000
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
...
[components.tagger]
source = "en_core_web_sm"
component = "tagger"
replace_listeners = ["model.tok2vec"]
...
[training]
...
frozen_components = ["tagger"]

Now when I get entity predictions, there are no POS tags.

For example, an ent in doc.ents will have no pos_ on its tokens:

>>> ent
Some Entity
>>> ent.label_
'LABEL_NAME'
>>> [token.pos_ for token in ent]
['', '']

So how do I train only the ner component and still have POS tags predicted by the tagger?

Is there a way to load the POS tag predictions from another model, such as using en_core_web_sm for the tagger and my trained model for the ner?

I am trying to use frozen_components, but it does not seem to work.

Yes, you can "source" a component from a different pipeline. See the sourcing components docs for general information about that, or the double NER project for an example of doing it with two NER components.

Basically you can do this:

import spacy

nlp = spacy.load("my_ner")
nlp_tagger = spacy.load("en_core_web_sm")  # load the base pipeline
# give this component a copy of its own tok2vec
nlp_tagger.replace_listeners("tok2vec", "tagger", ["model.tok2vec"])

nlp.add_pipe(
    "tagger",
    name="tagger",
    source=nlp_tagger,
    after="ner",
)

Note that both pipelines need to have the same word vectors, or this won't work, as described in the sourcing components docs. In this case the sm model has no word vectors, so it will work if your pipeline also has no word vectors, for example.
