
Linear Loss and Accuracy CNN graph

I recently ran my CNN with various batch sizes and noticed that the smaller the batch size (32, 64), the higher the accuracy, but the graphs looked like this:

[loss graph]

[accuracy graph]

Can anyone explain why the graphs don't look normal? My training data has 4096 features. Here are my graphs for my larger batch sizes (512, 1024):

[loss graph for larger batch sizes]

[accuracy graph for larger batch sizes]

Ideally (according to the classic gradient descent method), you would use one batch containing your whole dataset. But that is too slow, and the dataset might not fit into memory. So we approximate the gradient (the stochastic gradient descent method) by splitting the dataset into batches (see https://en.wikipedia.org/wiki/Stochastic_gradient_descent).
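As a toy illustration (not your CNN), here is full-batch gradient descent next to minibatch SGD on a small least-squares problem; the data, model, and learning rate are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4096, 10))   # toy inputs: 4096 samples, 10 features
y = rng.normal(size=4096)         # toy targets
w = np.zeros(10)
lr = 0.01

def grad(w, Xb, yb):
    # Gradient of mean squared error on a batch: (2/n) * X^T (X w - y)
    return 2.0 / len(Xb) * Xb.T @ (Xb @ w - yb)

# Classic (full-batch) gradient descent: one exact step per pass over the data
w = w - lr * grad(w, X, y)

# Minibatch SGD: many noisy steps per epoch, one per batch
batch_size = 32
for start in range(0, len(X), batch_size):
    Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
    w = w - lr * grad(w, Xb, yb)  # each step uses an approximate gradient
```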

So the bigger the batch, the better the gradient approximation.

To see the difference you have to compare by the number of steps, not by epochs: the bigger the batch size, the fewer steps per epoch. You reached 19% accuracy in 55 epochs with big batches and in 50 epochs with small batches, which is similar. But in the small-batch case you performed roughly 16 times more steps, which took much more time (up to 16 times as long).
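A quick back-of-the-envelope calculation makes the step counts concrete; the dataset size below is assumed, since the question doesn't state it:

```python
dataset_size = 50_000              # assumed; the question doesn't give it
for batch_size, epochs in [(32, 50), (512, 55)]:
    steps_per_epoch = dataset_size // batch_size
    total_steps = steps_per_epoch * epochs
    print(f"batch {batch_size:4d}: {steps_per_epoch:5d} steps/epoch, "
          f"{total_steps:6d} total steps")
# batch   32:  1562 steps/epoch,  78100 total steps
# batch  512:    97 steps/epoch,   5335 total steps  (~15x fewer updates)
```

The absolute numbers change with the dataset size, but the ratio of total steps stays close to the batch-size ratio of 16.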

Another important point: you can use a higher learning rate with big batches, which can further reduce training time. In your case, you could increase the learning rate by roughly a factor of 4.
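Two common heuristics for rescaling the learning rate with batch size are linear scaling and square-root scaling; `scale_lr` below is a hypothetical helper, and the base values are made up:

```python
def scale_lr(base_lr, base_batch, new_batch, rule="sqrt"):
    """Scale a learning rate with batch size (heuristics, not guarantees).

    "linear": lr grows in proportion to the batch-size ratio.
    "sqrt":   lr grows with the square root of the ratio.
    """
    ratio = new_batch / base_batch
    return base_lr * (ratio if rule == "linear" else ratio ** 0.5)

# Going from batch 32 to batch 512 is a 16x ratio; square-root scaling
# gives the factor of 4 suggested above, linear scaling gives 16x.
print(scale_lr(0.001, 32, 512, rule="sqrt"))    # 0.004
print(scale_lr(0.001, 32, 512, rule="linear"))  # 0.016
```

Either way, treat the scaled value as a starting point and validate it empirically.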
