Plateauing in loss while training NER with spaCy

I'm training (from scratch) a new set of entities, doing exactly as described in the spaCy tutorial; however, my loss is plateauing, and a large number of epochs does not help.

My data:

9 different entities, 15,000 training examples (sentences). The losses over 20 epochs:

Loaded model 'en'
Losses {'ner': 25461.3508122763}
Losses {'ner': 17003.450728844182}
Losses {'ner': 15725.198527784352}
Losses {'ner': 15315.754479839785}
Losses {'ner': 14980.468680851985}
Losses {'ner': 14716.52629194191}
Losses {'ner': 14346.623731715972}
Losses {'ner': 14463.972966984807}
Losses {'ner': 14195.106732198006}
Losses {'ner': 14058.390174787504}
Losses {'ner': 13875.850727875884}
Losses {'ner': 13859.096326599261}
Losses {'ner': 13614.887464660655}
Losses {'ner': 13512.779816124807}
Losses {'ner': 13388.69595626908}
Losses {'ner': 13496.388241585315}
Losses {'ner': 13530.602194116611}
Losses {'ner': 13245.709490846923}
Losses {'ner': 13219.483523900466}
Losses {'ner': 13189.088232180386}
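
For reference, the training loop is essentially the one from the tutorial. A minimal sketch, assuming spaCy 2.x, with a small TRAIN_DATA stand-in in the (text, {'entities': [...]}) format discussed below:

import random

import spacy
from spacy.util import minibatch, compounding

# Stand-in training data; the real set has 15,000 such pairs
TRAIN_DATA = [
    ("Horses and dogs are too tall and they pretend to care about your feelings",
     {"entities": [(0, 6, "ANIMAL"), (11, 15, "ANIMAL")]}),
]

nlp = spacy.load("en")
ner = nlp.get_pipe("ner")
for _, annotations in TRAIN_DATA:
    for _start, _end, label in annotations["entities"]:
        ner.add_label(label)  # register the new entity types

# Train only the NER component; since a pretrained model was loaded,
# nlp.begin_training() is skipped (it would re-initialize the weights).
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):
    for epoch in range(20):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for batch in minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001)):
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, drop=0.5, losses=losses)
        print("Losses", losses)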

Question 1:

What is the best way to organise the training data if there are several entities within a single sentence? Should I combine all entities in one list, or is it better to train with a single entity per example?

For example:

("Horses and dogs are too tall and they pretend to care about your feelings", {'entities': [(0, 6, 'ANIMAL'), (11, 15, 'ANIMAL')]})

or is it better to split:

("Horses and dogs are too tall and they pretend to care about your feelings", {'entities': [(0, 6, 'ANIMAL')]}),

("Horses and dogs are too tall and they pretend to care about your feelings", {'entities': [(11, 15, 'ANIMAL')]})

Question 2:

Should I include empty sentences too (with no entities)?

("The new electric cars is great!", {'entities': []})

Apparently, the model's predictions are not too bad (F1 ≈ 0.7), but I am wondering what the best practices are for fine-tuning the model (apart from using Prodigy on top of this trained model).

spaCy and Prodigy expect different forms of training data: spaCy expects a "gold" annotation, in which every entity is labeled. That annotation format is described in the spaCy docs. If you're just training an NER model, you can simply omit the dependency and POS keys from the dictionary. Training in this way makes sense: at prediction time, the model will need to produce entity labels for every word it sees.
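
To make "entity labels for every word" concrete, spaCy v2's biluo_tags_from_offsets helper converts the character-offset annotations into per-token BILUO tags; it also doubles as a sanity check that your offsets line up with token boundaries (misaligned spans come back as "-"). A sketch:

import spacy
from spacy.gold import biluo_tags_from_offsets

nlp = spacy.load("en")
doc = nlp.make_doc("Horses and dogs are too tall")

# Each token gets a BILUO tag; "-" would signal a span/token misalignment
tags = biluo_tags_from_offsets(doc, [(0, 6, "ANIMAL"), (11, 15, "ANIMAL")])
print(list(zip([t.text for t in doc], tags)))
# [('Horses', 'U-ANIMAL'), ('and', 'O'), ('dogs', 'U-ANIMAL'),
#  ('are', 'O'), ('too', 'O'), ('tall', 'O')]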

Prodigy, in contrast, can accept labeled examples that have just a single span with a proposed entity label, plus a human decision on whether that span is an instance of the entity label or not. This is a little trickier for training, since the model simply won't know whether the other words in the sentence are or are not entities.

My hunch is that the model will work better if you consolidate all the entities in a sentence into one training example (Question 1). This gives the model more information about the sentence and allows it to learn the relationships between different entities in text. (Think, for example, of the phrase "she visited X and Y": if X is a place, Y is almost certainly a place; if X is a person, Y is also likely to be one.) This would be pretty easy and interesting to check empirically, though; see the sketch below.
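
One way to run that check, sketched with spaCy v2's built-in Scorer; the evaluate helper and the dev_data name here are my own, assuming held-out examples in the same (text, annotations) format as above:

from spacy.gold import GoldParse
from spacy.scorer import Scorer

def evaluate(nlp, examples):
    # Score predicted entities against the gold annotations
    scorer = Scorer()
    for text, annotations in examples:
        gold = GoldParse(nlp.make_doc(text), entities=annotations["entities"])
        scorer.score(nlp(text), gold)
    return {k: scorer.scores[k] for k in ("ents_p", "ents_r", "ents_f")}

# Train one model on consolidated examples and one on split examples,
# then compare evaluate(nlp_combined, dev_data) vs evaluate(nlp_split, dev_data).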

In regard to Question 2, including sentences with no entities should be very helpful for the model.

Side note: when I'm training NER models, the performance usually plateaus after about 20 epochs, and an F1 of 0.7 isn't too bad, so what you're finding sounds about right.
