I am using a CNN (convolutional neural network) model to train on CIFAR-10. I changed the batch size on each run to see its impact on execution time.
My conclusion was: the bigger the batch size, the more time the model took to execute.
Does this seem logical? Back-propagation is applied at the end of every batch, so with a larger batch size we perform fewer gradient-descent updates, and logically we should get a shorter execution time.
I found the opposite. What do you think? Thanks.
Here is my session code:
with tf.Session() as sess:
    sess.run(init)
    summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    start_time = time.time()
    for i in range(iteration_number):
        j = (i - epoch) * batch_size % number_of_examples
        k = (i - epoch + 1) * batch_size % number_of_examples
        if k < j:  # end of the dataset reached
            k = number_of_examples
            batch_x = train_images[j:number_of_examples, :]
            batch_y = train_labels[j:number_of_examples, :]
            print("Iter " + str(i) + ", epoch Loss= " +
                  "{:.6f}".format(loss) + ", Training Accuracy= " +
                  "{:.5f}".format(acc))
            # reshuffle images and labels together for the next epoch
            data = numpy.concatenate((train_images, train_labels), axis=1)
            numpy.random.shuffle(data)
            train_images = data[:, :3072]
            train_labels = data[:, 3072:3082]
            epoch = i + 1
        else:
            batch_x = train_images[j:k, :]
            batch_y = train_labels[j:k, :]
        loss, acc, summary = sess.run([cost, accuracy, merged_summary_op],
                                      feed_dict={x: batch_x,
                                                 y: batch_y,
                                                 keep_prob: 0.3})
        summary_writer.add_summary(summary)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
                                       keep_prob: dropout})
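As an aside, the concatenate-and-slice shuffle in the snippet above can also be done with a single shared index permutation, which avoids copying the images and labels into one array. A minimal sketch, assuming `train_images` and `train_labels` are NumPy arrays of equal length (tiny stand-in arrays are used here):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the example is reproducible

def shuffle_together(images, labels):
    """Shuffle two arrays with one shared permutation so rows stay paired."""
    perm = rng.permutation(len(images))
    return images[perm], labels[perm]

# tiny stand-ins for the (N, 3072) image and (N, 10) one-hot label arrays
images = np.arange(12).reshape(6, 2)
labels = np.eye(6)
images, labels = shuffle_together(images, labels)
```

Each label row still sits next to the image row it started with, because both arrays are indexed by the same permutation.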
The batch size basically controls how often you adjust the weights of your neural network. A batch size of 1 means you give the network one input/output pair, propagate the inputs forward, compute the error, and adjust the weights. If the batch size equals the dataset size, the network propagates all input/output pairs, accumulates the error, and adjusts the weights once at the end. Using a batch size that big usually gives you less accurate per-example results, but ones that better fit the average: you could say the outputs get blurred out, which avoids extremely large errors on some data and extremely small errors on other data.
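This relationship between batch size and update frequency can be sketched in plain NumPy on a toy linear-regression problem (the dataset, learning rate, and `sgd_epoch` helper below are all illustrative stand-ins, not the asker's model). With the dataset size fixed, a larger batch means fewer weight updates per epoch, but each update processes more examples, so the total arithmetic per epoch stays roughly the same:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))   # toy dataset standing in for CIFAR-10
true_w = np.arange(5.0)
y = X @ true_w

def sgd_epoch(w, batch_size, lr=0.1):
    """One pass over the data with mini-batch gradient descent."""
    updates = 0
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        grad = xb.T @ (xb @ w - yb) / len(xb)  # gradient of 0.5 * mean squared error
        w = w - lr * grad
        updates += 1
    return w, updates

w = np.zeros(5)
for bs in (1, 100, 1000):
    _, n_updates = sgd_epoch(w, bs)
    print(f"batch_size={bs:5d} -> {n_updates} weight updates per epoch")
```

Every row of `X` is visited exactly once per epoch regardless of `bs`; only the number of (and size of) the gradient steps changes, which is why fewer updates alone do not imply a faster epoch.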