
Linear Loss and Accuracy CNN graph

I recently ran my CNN with various batch sizes and noticed that the smaller batch sizes (32, 64) gave higher accuracy, but the graphs looked like this:

[Loss graph]

[Accuracy graph]

Can anyone explain why the graphs don't look normal? My training data has 4096 features. Here are my graphs for the larger batch sizes (512, 1024):

[Loss graph for larger batch sizes]

[Accuracy graph for larger batch sizes]

Ideally (according to the classic gradient descent method) you would use a single batch containing the whole dataset. But that is too slow, and your dataset might not fit into memory. So we approximate the gradient (stochastic gradient descent) by splitting the dataset into batches (see https://en.wikipedia.org/wiki/Stochastic_gradient_descent).
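A minimal sketch of the idea, using a toy linear-regression dataset (all names and values here are illustrative, not your model): each mini-batch gradient is a noisy estimate of the full-dataset gradient, and `batch_size` controls how noisy.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4096, 10))              # 4096 samples, 10 features (toy data)
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=4096)

def sgd(batch_size, lr=0.01, epochs=20):
    """Mini-batch SGD on mean-squared error for a linear model."""
    w = np.zeros(10)
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)             # reshuffle each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            # gradient of MSE on this batch only -- an approximation
            # of the full-dataset gradient
            grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

w_small = sgd(batch_size=32)    # many noisy steps per epoch
w_large = sgd(batch_size=1024)  # few, more accurate steps per epoch
```

With batch size 32 there are 128 updates per epoch; with 1024 there are only 4, so each epoch does far less optimization work even though each gradient is more accurate.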

So the bigger the batch, the better the gradient approximation.

To see the difference you have to compare by the number of steps, not by epochs: the bigger the batch size, the fewer steps per epoch. Right now you reach 19% accuracy in 55 epochs with big batches and in 50 epochs with small batches, which is similar. But in the small-batch case you've done 16 times more steps, which took much more time (up to 16 times longer).
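The step-count arithmetic can be sketched like this (the sample count is a stand-in; substitute your actual number of training examples):

```python
n_samples = 4096  # hypothetical dataset size, for illustration only

def steps_per_epoch(batch_size):
    # ceiling division: a final partial batch still counts as a step
    return -(-n_samples // batch_size)

small = steps_per_epoch(64)    # small-batch run
large = steps_per_epoch(1024)  # big-batch run
ratio = small // large         # small batches do 16x more updates per epoch
```

The same 16x ratio holds for 32 vs. 512, which is why comparing runs by epoch count alone is misleading.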

Another important point: you can use a higher learning rate with big batches, which can further reduce training time. In your case you could increase the learning rate by roughly a factor of 4.
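One common heuristic that matches the ~4x figure above is square-root scaling: grow the learning rate with the square root of the batch-size ratio. This is a rule of thumb, not a guarantee, and the base values below are purely illustrative.

```python
import math

base_lr = 0.001   # illustrative starting rate, tuned for the small batch
base_batch = 64   # the small batch size it was tuned for

def scaled_lr(batch_size):
    # square-root scaling: a 16x larger batch (64 -> 1024) gives a
    # sqrt(16) = 4x higher learning rate
    return base_lr * math.sqrt(batch_size / base_batch)
```

Some practitioners use linear scaling instead (16x here); either way, treat the scaled value as a starting point and re-tune.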
