
How to choose batch_size, steps_per_epoch and epoch with Keras generator

I'm training 2 different CNNs (custom and transfer learning) for an image classification problem. I use the same generator for both models. The dataset contains 5000 samples across 5 classes and is imbalanced.

Here's the custom model I'm using.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def __init__(self, transfer_learning=False, lambda_reg=0.001, drop_out_rate=0.1):
    if not transfer_learning:
        # Custom CNN: four Conv2D/MaxPooling2D blocks followed by a dense classifier head
        self.model = Sequential()
        self.model.add(Conv2D(32, (3, 3), input_shape=(224, 224, 3), activation="relu"))
        self.model.add(MaxPooling2D(pool_size=(2, 2)))

        self.model.add(Conv2D(64, (1, 1), activation="relu"))
        self.model.add(MaxPooling2D(pool_size=(2, 2)))

        self.model.add(Conv2D(128, (3, 3), activation="relu"))
        self.model.add(MaxPooling2D(pool_size=(2, 2)))

        self.model.add(Conv2D(128, (1, 1), activation="relu"))
        self.model.add(MaxPooling2D(pool_size=(2, 2)))

        self.model.add(Flatten())

        # Two fully connected layers (no activation, i.e. linear) with dropout
        self.model.add(Dense(512))
        self.model.add(Dropout(drop_out_rate))
        self.model.add(Dense(256))
        self.model.add(Dropout(drop_out_rate))

        # 5-way softmax output for the 5 classes
        self.model.add(Dense(5, activation="softmax"))

What I can't understand is the relation between steps_per_epoch and batch_size. batch_size is the number of samples the generator yields in each batch. But is steps_per_epoch the number of batches needed to complete one training epoch? If so, should it be steps_per_epoch = total_samples / batch_size?

Whatever values I try, I always run into the same problem (on both models): val_acc seems to get stuck at a local optimum.

First of all, steps_per_epoch = total_samples / batch_size is correct in general terms.
Here is example code written with TensorFlow (the classic TF 1.x MNIST training loop) that follows the same idea:

# mnist, X, Y, cost, optimizer and sess are assumed to have been set up earlier (TF 1.x style)
for epoch in range(training_epochs):
    avg_cost = 0
    # one epoch covers the whole dataset: number of batches = samples / batch_size
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        # fetch the next mini-batch and run one optimization step on it
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {X: batch_xs, Y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

print('Learning Finished!')

By the way, although it is not directly related to your question: there are various optimizers such as Stochastic Gradient Descent and Adam, because training takes too long on a heavy data set if you try to learn from all of the data at once; mini-batch training does not use all of the data at every step. There are many articles about that; here I just leave one of them.
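For instance, switching optimizers in Keras is a one-line change when compiling the model. A minimal sketch, assuming the tf.keras API (model stands for the Sequential model from the question, and the learning rates are common starting points rather than tuned values):

from tensorflow.keras.optimizers import SGD, Adam

# Mini-batch SGD with momentum
model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# ... or Adam, which adapts the step size per parameter
model.compile(optimizer=Adam(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])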

As for your val_acc: it seems that your model has quite a few convolution layers.
You already reduced the filters and the max pooling of the convolution layers, but I still think it is too much. How is it going? Is it better than before?

You are mixing two issues here. One is how to determine batch_size vs steps_per_epoch; the other is why val_acc seems to reach a local optimum and won't continue improving.

(1) For the issue -- batch_size vs steps_per_epoch

The strategy is first to make batch_size as large as memory permits, especially when you are using a GPU (4~11 GB). Normally batch_size = 32 or 64 should be fine, but in some cases you'd have to reduce it to 8, 4, or even 1. The training code will throw an out-of-memory exception if there is not enough memory to allocate, so you know when to stop increasing batch_size.

Once batch_size is set, steps_per_epoch can be calculated as math.ceil(total_samples / batch_size). But sometimes you may want to set it a few times larger when data augmentation is used.
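For example, a rough sketch of the bookkeeping, assuming a Keras ImageDataGenerator and the 5000-sample dataset from the question (the directory path, epoch count, and model are placeholders):

import math
from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 32          # as large as GPU memory allows
total_samples = 5000     # dataset size from the question

train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/train",                 # placeholder path
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode="categorical")

# one epoch should cover the whole dataset once
steps_per_epoch = math.ceil(total_samples / batch_size)   # 157 for 5000 / 32

model.fit_generator(train_gen,
                    steps_per_epoch=steps_per_epoch,
                    epochs=50)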

(2) The second issue -- val_acc reaches local optima, won't continue improving

It is the crux of the matter for deep learning, isn't it? It makes DL both exciting and difficult at the same time. The batch_size, steps_per_epoch and number of epochs won't help much here. It is the model and the hyperparameters (such as the learning rate, loss function, optimizer, etc.) that control how the model performs.

A few easy things to try are different learning rates and different optimizers. If you find the model is overfitting (val_acc going down over more epochs while training accuracy keeps improving), increasing the sample size always helps if it is possible, and data augmentation helps to some degree.
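As an illustration, here is a hedged sketch of what data augmentation plus a learning-rate schedule could look like in Keras; the augmentation ranges, callback settings, and generators are illustrative assumptions rather than values from the question:

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Augment the training images on the fly
aug = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True)

train_gen = aug.flow_from_directory(
    "data/train",               # placeholder path
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical")

# Halve the learning rate whenever val_loss stops improving
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                              patience=3, min_lr=1e-6)

# val_gen: a validation generator built the same way from a held-out folder (not shown)
model.fit_generator(train_gen,
                    steps_per_epoch=len(train_gen),
                    epochs=50,
                    validation_data=val_gen,
                    callbacks=[reduce_lr])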
