
spaCy NER: Can the same word be part of two different entities?

For example:

Sentence: The best product in the world is Nestle Cookies.

Entities:

BRAND: Nestle

PRODUCT: Nestle Cookie

Are the above entities valid, or should I tag them as:

Entities:

BRAND: Nestle

PRODUCT: Cookie

And will this choice affect model performance?

From the documentation:

The entity recognizer is constrained to predict only non-overlapping, non-nested spans. The training data should obey the same constraint. If you like, you could have two sentences with the different annotations in your data. I'm not sure whether this would hurt or help your performance, though.

If you want spaCy to learn to recover both annotations, you could have two EntityRecognizer instances in the pipeline. You would need to move the entity annotations into an extension attribute, because you don't want the second entity recogniser to overwrite the entities set by the first one.
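The "move the annotations into an extension attribute" step can be sketched roughly as below, assuming spaCy v3. To keep the sketch runnable without a trained model, two `entity_ruler` components stand in for the two trained EntityRecognizer instances; that substitution is mine, not part of the quoted advice. The attribute name `brand_ents` and the component name `stash_brand_ents` are hypothetical.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical extension attribute that keeps the first recognizer's output.
Doc.set_extension("brand_ents", default=(), force=True)


@Language.component("stash_brand_ents")
def stash_brand_ents(doc):
    """Copy doc.ents into the extension attribute, then clear doc.ents."""
    doc._.brand_ents = doc.ents
    doc.ents = []  # free the slot so the second recognizer starts clean
    return doc


nlp = spacy.blank("en")

# Stand-in for the first recognizer (BRAND only).
brand_ruler = nlp.add_pipe("entity_ruler", name="brand_ruler")
brand_ruler.add_patterns([{"label": "BRAND", "pattern": "Nestle"}])

# Move the BRAND spans out of doc.ents before the second recognizer runs.
nlp.add_pipe("stash_brand_ents")

# Stand-in for the second recognizer (PRODUCT only).
product_ruler = nlp.add_pipe("entity_ruler", name="product_ruler")
product_ruler.add_patterns([{"label": "PRODUCT", "pattern": "Nestle Cookies"}])

doc = nlp("The best product in the world is Nestle Cookies.")
print([(ent.text, ent.label_) for ent in doc._.brand_ents])
print([(ent.text, ent.label_) for ent in doc.ents])
```

With trained components, the two `entity_ruler` pipes would presumably be replaced by two separately trained `ner` components, with the stashing component kept between them.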

Consequence:

If you want a single NER tagger, you must label as follows:
Entities: BRAND: Nestle, PRODUCT: Cookie
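Before training a single tagger, it may help to check that no two annotations on the same text overlap. The following is a minimal sketch in plain Python, not a spaCy API: it assumes annotations are stored as `(start_char, end_char, label)` tuples, and `find_overlaps` is a hypothetical helper name. The PRODUCT span is taken as the token "Cookies" exactly as it appears in the sentence.

```python
from typing import List, Tuple

Annotation = Tuple[int, int, str]  # (start_char, end_char, label)


def find_overlaps(entities: List[Annotation]) -> List[Tuple[Annotation, Annotation]]:
    """Return every pair of annotations whose character ranges intersect."""
    ordered = sorted(entities)
    clashes = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second[0] >= first[1]:
                break  # sorted by start, so no later span can overlap `first`
            clashes.append((first, second))
    return clashes


text = "The best product in the world is Nestle Cookies."
brand_start = text.index("Nestle")
cookies_start = text.index("Cookies")

# Nested labelling: PRODUCT contains BRAND, so a single tagger cannot learn it.
nested = [
    (brand_start, brand_start + len("Nestle"), "BRAND"),
    (brand_start, brand_start + len("Nestle Cookies"), "PRODUCT"),
]

# Flat labelling: the two spans sit side by side.
flat = [
    (brand_start, brand_start + len("Nestle"), "BRAND"),
    (cookies_start, cookies_start + len("Cookies"), "PRODUCT"),
]

print(find_overlaps(nested))  # one clashing pair
print(find_overlaps(flat))    # []
```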

If you want to train two separate NER taggers (one for BRAND and one for PRODUCT), you can label as follows:
Entities: BRAND: Nestle, PRODUCT: Nestle Cookie
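For the two-tagger setup, one way to prepare the data is to split a nested annotation set into one training set per label, so that each set is non-overlapping on its own. This is a sketch in plain Python under the assumption that examples use the `(text, {"entities": [(start, end, label), ...]})` tuple layout; `split_by_label` is a hypothetical helper, not a spaCy function.

```python
from typing import Dict, List, Tuple

Example = Tuple[str, Dict[str, List[Tuple[int, int, str]]]]


def split_by_label(examples: List[Example], labels: List[str]) -> Dict[str, List[Example]]:
    """Build one dataset per label, keeping every text in every dataset."""
    datasets: Dict[str, List[Example]] = {label: [] for label in labels}
    for text, annotations in examples:
        for label in labels:
            # Texts without this label stay in as examples with no entities.
            spans = [ent for ent in annotations["entities"] if ent[2] == label]
            datasets[label].append((text, {"entities": spans}))
    return datasets


text = "The best product in the world is Nestle Cookies."
start = text.index("Nestle")
examples = [
    (text, {"entities": [
        (start, start + len("Nestle"), "BRAND"),
        (start, start + len("Nestle Cookies"), "PRODUCT"),
    ]}),
]

datasets = split_by_label(examples, ["BRAND", "PRODUCT"])
print(datasets["BRAND"])
print(datasets["PRODUCT"])
```

Each resulting dataset would then be used to train its own recognizer.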

