
How much training data (sentences) is required for custom NER using spaCy (Python)? [Just a rough idea]

I want to know: say I have 10 custom entities to recognize, roughly how many annotated training sentences should I provide?

Thank you in advance! :)

I am new to this, please help.

For developing a custom NER model, at least 50-100 occurrences of each entity will be required, along with their proper context. Otherwise, if you have less data, your custom model will overfit. So, depending on your data, you will require at least 200 to 300 sentences.
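To make "annotated training sentences" concrete, below is a minimal sketch of how such examples are typically fed to spaCy's NER training loop (spaCy 3.x API). The entity labels (VENDOR, CITY) and the example texts are made up for illustration; in practice you would collect 50-100 occurrences of each of your own labels.

```python
import random
import spacy
from spacy.training import Example

# Hypothetical annotated sentences: (text, {"entities": [(start, end, label)]}).
# You would need many more of these per custom label for a usable model.
TRAIN_DATA = [
    ("Acme Corp shipped 20 routers to Berlin", {"entities": [(0, 9, "VENDOR"), (32, 38, "CITY")]}),
    ("The order was fulfilled by Acme Corp", {"entities": [(27, 36, "VENDOR")]}),
]

nlp = spacy.blank("en")          # start from a blank English pipeline
ner = nlp.add_pipe("ner")        # add an empty NER component
for _, ann in TRAIN_DATA:
    for start, end, label in ann["entities"]:
        ner.add_label(label)     # register each custom entity label

optimizer = nlp.initialize()     # initialize model weights
for epoch in range(30):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, ann in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), ann)
        nlp.update([example], sgd=optimizer, drop=0.35, losses=losses)
    print(epoch, losses)
```

After training, `nlp("Acme Corp opened an office in Berlin").ents` would return the recognized spans; with only a handful of sentences per label the predictions will be unreliable, which is exactly the overfitting risk described above.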

For a custom NER model with spaCy, you will definitely require around 100 samples for each entity, and that without any biases in your dataset.

All this is as per my experience.

Suggestion: you can explore spaCy's custom model, but for production-level work or a serious project you cannot depend on it alone; you will have to do some additional NLP, relation extraction, etc. along with it.

Hope this helps.
