
Should I use the same epochs for each batch?

I need to understand how the epochs/iterations affect the training of a deep learning model.

I am training an NER model with Spacy 2.1.3. My documents are very long, so I cannot train on more than 200 documents per iteration. So basically I do:

from document 0 to document 200 -> 20 epochs

from document 201 to document 400 -> 20 epochs

and so on (roughly sketched in the snippet below).
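In rough pseudocode, my current scheme looks like this (`load_docs` and `train_on` are just placeholders for my own loading and update steps, not real spaCy functions):

```python
# Rough sketch of the scheme described above; `load_docs` and
# `train_on` are hypothetical placeholders, not spaCy API.
for start in range(0, total_docs, 200):
    chunk = load_docs(start, start + 200)   # next 200 documents
    for epoch in range(20):                 # 20 epochs on this chunk
        train_on(nlp, chunk)
```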

Maybe it is a stupid question, but should the number of epochs for the next batches be the same as for the first 0-200? So if I chose 20 epochs, must I train the next batches with 20 epochs too?

Thanks

"i need to understand how the epochs/iterations affect the training of a deep learning model" - nobody is sure about that one. You may overfit after a certain number of epochs, so you should check your accuracy (or other metrics) on a validation dataset. Techniques like Early Stopping are often employed in order to battle this.
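For illustration, here is a minimal, framework-agnostic early-stopping sketch (`train_one_epoch` and `evaluate` are hypothetical placeholders for your own routines):

```python
# Early-stopping sketch: stop when the validation metric has not
# improved for `patience` epochs in a row.
# `train_one_epoch` and `evaluate` are hypothetical placeholders.
patience, bad_epochs, best_score = 3, 0, float("-inf")
for epoch in range(100):
    train_one_epoch(model, train_data)
    score = evaluate(model, validation_data)   # e.g. NER F-score
    if score > best_score:
        best_score, bad_epochs = score, 0      # improvement: reset counter
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break                              # no improvement: stop early
```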

"so i cannot train more than 200 documents per iteration." - do you mean a batch of examples? If so, it should be smaller (too much information in a single iteration and too costly). 32 is usually used for textual data, up to 64. Batch sizes are often made smaller the more epochs you train, in order to settle into the minimum better (or to escape saddle points).
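For instance, spaCy 2.x ships `minibatch` and `compounding` utilities for exactly this kind of batching; a minimal NER training loop in that style could look like the sketch below (the tiny `TRAIN_DATA` is made up for illustration):

```python
import random
import spacy
from spacy.util import minibatch, compounding

# Toy data for the sketch; replace with your own (text, annotations) pairs.
TRAIN_DATA = [
    ("Apple is buying a U.K. startup", {"entities": [(0, 5, "ORG")]}),
    # ... more examples
]

nlp = spacy.blank("en")            # blank English pipeline for the sketch
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
ner.add_label("ORG")

optimizer = nlp.begin_training()
for epoch in range(20):
    random.shuffle(TRAIN_DATA)
    losses = {}
    # batch size grows from 4 to 32, compounding by a factor of 1.001
    for batch in minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001)):
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)
    print(epoch, losses)
```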

Furthermore, you should use Python's generators so you can iterate over data larger than your RAM capacity.
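A minimal sketch of such a generator (`parse_line` is a hypothetical parser for whatever on-disk format you use); `minibatch` accepts any iterable, so you can feed it this generator directly:

```python
def stream_examples(path):
    # Yield one (text, annotations) pair at a time so the whole corpus
    # never has to fit in RAM. `parse_line` is a hypothetical parser
    # for your own file format.
    with open(path, encoding="utf8") as f:
        for line in f:
            yield parse_line(line)
```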

Last but not least, each example is usually trained on once per epoch. Different approaches (say oversampling or undersampling) are sometimes used, but usually when your class distribution is imbalanced (say 10% of examples belong to class 0 and 90% to class 1) or when the neural network has problems with a specific class (though this one requires a more well-thought-out approach).
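If you do need it, a naive random-oversampling sketch might look like this (assuming a list of `(example, label)` pairs and that the minority class really is the smaller one):

```python
import random

def oversample(examples, minority_label):
    # Duplicate minority-class examples at random until both classes
    # are the same size. `examples` is a list of (x, label) pairs.
    minority = [e for e in examples if e[1] == minority_label]
    majority = [e for e in examples if e[1] != minority_label]
    extra = random.choices(minority, k=len(majority) - len(minority))
    balanced = majority + minority + extra
    random.shuffle(balanced)
    return balanced
```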

The common practice is to train each batch with only 1 epoch. Training on the same subset of data for 20 epochs can lead to overfitting, which harms your model's performance.
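Concretely, that means making each epoch a single pass over all chunks, instead of 20 epochs per chunk before moving on. A sketch (`iter_chunks` and `train_on` are hypothetical helpers):

```python
# One pass over every 200-document chunk per epoch, repeated 20 times,
# rather than 20 epochs on one chunk at a time.
# `iter_chunks` and `train_on` are hypothetical helpers.
for epoch in range(20):
    for chunk in iter_chunks(all_docs, chunk_size=200):
        train_on(nlp, chunk)   # single pass over this chunk
```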

To understand better how the number of epochs trained on each batch affects your performance, you can do a grid search and compare the results.
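A hypothetical sketch of such a grid search (`train_model` and `evaluate` are placeholders for your own training and scoring routines):

```python
# Grid search over epochs-per-chunk: retrain from scratch for each
# setting and compare scores on the same held-out validation set.
# `train_model` and `evaluate` are hypothetical placeholders.
results = {}
for n_epochs in (1, 5, 10, 20):
    model = train_model(train_docs, epochs_per_chunk=n_epochs)
    results[n_epochs] = evaluate(model, validation_docs)   # e.g. F-score
best = max(results, key=results.get)
print(results, "-> best epoch count:", best)
```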

