
Training a fully connected network with one hidden layer on MNIST in TensorFlow

I have just gotten into machine learning with TensorFlow, and after finishing the MNIST beginners tutorial I wanted to improve the accuracy of that simple model a bit by inserting a hidden layer. Essentially, I decided to directly copy the network architecture from the first chapter of Michael Nielsen's book on neural networks and deep learning (see here).

Nielsen's code works fine for me; however, I didn't get comparable results using the following TensorFlow code. It should, if I am not mistaken, implement exactly the model Nielsen proposed:

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)


def weight_variable(shape):
    initial = tf.random_normal(shape)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.random_normal(shape)
    return tf.Variable(initial)


x = tf.placeholder(tf.float32, [None, 784])

#hidden layer
W_fc1 = weight_variable([784, 30])
b_fc1 = bias_variable([30])
h_fc1 = tf.sigmoid(tf.matmul(x, W_fc1) + b_fc1)

#output layer
W_fc2 = weight_variable([30, 10])
b_fc2 = bias_variable([10])
y = tf.sigmoid(tf.matmul(h_fc1, W_fc2) + b_fc2)

y_ = tf.placeholder(tf.float32, [None, 10])
loss = tf.reduce_mean(tf.reduce_sum(tf.pow(y_ - y, 2), reduction_indices=[1])) #I also tried simply tf.nn.l2_loss(y_ - y)
train_step = tf.train.GradientDescentOptimizer(3.0).minimize(loss)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

def get_accuracy():
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    return sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})

for i in range(30):
    batch_xs, batch_ys = mnist.train.next_batch(10)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    print("Epoch {} accuracy: {:.2f}%".format(i+1, get_accuracy() * 100))

I get an accuracy of about 17% after training for 30 epochs. Using Nielsen's code, I get an accuracy of 91% after only one epoch of training.

Obviously I am missing something. I have tried to improve the accuracy and managed to get it up to about 60% by training longer, but the same network should give similar results, even if it uses a different backend. I also tried playing around with the hyper-parameters but didn't achieve anything comparable.

Do you find any flaw in my code?

As mentioned by suharshs, your problem appears to be caused by a misunderstanding of the term epoch. Although the usage is not rigid, an epoch is usually a single pass over the entire training dataset. If you take another look at Nielsen's code, you'll see this reflected in the SGD method: a single epoch iterates through the entire training_data, which is divided into mini-batches. Each of your "epochs" is actually just one mini-batch of only 10 samples.
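For illustration, here is a minimal sketch (not Nielsen's or your exact code, and reusing the variables defined in your script such as mnist, sess, train_step, x, y_ and get_accuracy) of what the training loop looks like when each epoch covers the whole training set in mini-batches of 10:

batch_size = 10
# One epoch = one pass over all training examples, split into mini-batches
num_batches = mnist.train.num_examples // batch_size

for epoch in range(30):
    for _ in range(num_batches):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    # Evaluate on the test set once per full pass over the training data
    print("Epoch {} accuracy: {:.2f}%".format(epoch + 1, get_accuracy() * 100))

With this structure, each print happens after roughly 5,500 gradient steps rather than after a single 10-sample batch, which is what makes the per-epoch accuracies comparable to Nielsen's.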
