
Training multi-word verb and noun entities with spaCy NER

Original post: 2018-10-28 18:38:09 4 1 spacy

All NER training examples I have come across use nouns. Is it possible to train entities with spaCy NER that are verb and noun combinations, for example 'stirring pot'?

Should I use a noun-based NER first and then train a nested NER on such phrases, or should I train on the phrase directly in spaCy NER? I guess the answer depends on whether spaCy NER uses POS and dependency features as part of its training.

1 answer

NER technologies usually work best when the entities are fairly short and when there are clear clues at the starts and ends of the phrases. Both conditions hold when recognising proper nouns in English, which is the canonical use case the algorithms were developed for.

A noun phrase like "stepping stone" or "deciding factor" will be easy for an NER system to learn. The system would be less good at recognising verb + object constructions, as the verb and object might be arbitrarily far apart, e.g. "stirring the pot", "stirring the metal pot", "stir the pot vigorously", etc. You should also be a bit wary of applying sequential labelling to arbitrary spans of text that aren't syntactic constituents. It will be very difficult to describe where the boundaries of the phrases should fall, so your annotators probably won't behave consistently. Indecision about the exact boundaries of the phrases will make the NER system perform very poorly, because spans that differ by one word are seen as entirely different spans by the loss function.
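To make the boundary point concrete, here is a minimal sketch of exact-match span scoring. It is only an analogy for the behaviour described above, not spaCy's actual loss function; the `ACTION` label, the token indices and the `exact_match_prf` helper are all made up for illustration.

```python
from typing import Set, Tuple

# (start_token, end_token_exclusive, label); the label name is hypothetical
Span = Tuple[int, int, str]


def exact_match_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Precision, recall and F1 where a span counts only if start, end and label all match."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Tokens: ["keep", "stirring", "the", "metal", "pot"]
gold = {(1, 5, "ACTION")}        # "stirring the metal pot"
off_by_one = {(0, 5, "ACTION")}  # "keep stirring the metal pot"

print(exact_match_prf(gold, gold))        # (1.0, 1.0, 1.0)
print(exact_match_prf(gold, off_by_one))  # (0.0, 0.0, 0.0)
```

A prediction that overlaps the gold span on four of five tokens earns no credit at all, which is why inconsistent annotation boundaries are so costly.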

Finally, to answer your question about the POS and dependency parsing features: no, we don't use these in the NER at the moment.
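Since the verb and its object can be separated by other words, one alternative to span labelling is to pair them through the dependency parse. The following is a rough sketch of that idea over a hand-written toy parse. The `Token` tuple, the `verb_object_pairs` helper and the parse itself are assumptions for illustration; none of this is spaCy's API or real parser output.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    pos: str   # coarse part-of-speech tag
    head: int  # index of the syntactic head (the root points to itself)
    dep: str   # dependency label relative to the head


def verb_object_pairs(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return (verb, object) pairs connected by a direct-object arc."""
    pairs = []
    for tok in tokens:
        if tok.dep == "dobj" and tokens[tok.head].pos == "VERB":
            pairs.append((tokens[tok.head].text, tok.text))
    return pairs


# Hand-written parse of "stir the metal pot vigorously" (assumed, not parser output)
parse = [
    Token("stir", "VERB", 0, "ROOT"),
    Token("the", "DET", 3, "det"),
    Token("metal", "NOUN", 3, "compound"),
    Token("pot", "NOUN", 0, "dobj"),
    Token("vigorously", "ADV", 0, "advmod"),
]

print(verb_object_pairs(parse))  # [('stir', 'pot')]
```

The pair is found however many modifiers sit between the verb and its object, which a contiguous span label cannot express without swallowing the intervening words.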

You might be interested in the dependency tree matcher contributed in these two pull requests: