简体   繁体   English

2层神经网络不收敛

[英]2 Layer Neural Network Does not Converge

Background 背景

I am a newbie to TensorFlow and I am trying to understand the basics of deep learning. 我是TensorFlow的新手,我试图了解深度学习的基础知识。 I started from writing a two-layer neural network from scratch and it achieved 89% accuracy on MNIST dataset and now I am trying to implement the same network in TensorFlow and compare their performance. 我从头开始编写了两层神经网络,该网络在MNIST数据集上达到了89%的准确度,现在我试图在TensorFlow中实现相同的网络并比较它们的性能。

Problem 问题

I am not sure if I miss something basic in the code, but the following implementation seems to be unable to update weights and therefore could not output anything meaningful. 我不确定我是否会错过代码中的一些基本知识,但是以下实现似乎无法更新权重,因此无法输出任何有意义的信息。

num_hidden = 100
# x -> (batch_size, 784)
x = tf.placeholder(tf.float32, [None, 784])

W1 = tf.Variable(tf.zeros((784, num_hidden)))
b1 = tf.Variable(tf.zeros((1, num_hidden)))
W2 = tf.Variable(tf.zeros((num_hidden, 10)))
b2 = tf.Variable(tf.zeros((1, 10)))
# z -> (batch_size, num_hidden)
z = tf.nn.relu(tf.matmul(x, W1) + b1)
# y -> (batch_size, 10)
y = tf.nn.softmax(tf.matmul(z, W2) + b2)

# y_ -> (batch_size, 10)
y_ =  tf.placeholder(tf.float32, [None, 10])
# y_ * tf.log(y) -> (batch_size, 10)
cross_entropy =  -tf.reduce_sum(y_ * tf.log(y+1e-10))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# tf.argmax(y, axis=1) returns the maximum index in each row
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
for epoch in range(1000):
    # batch_xs -> (100, 784)
    # batch_ys -> (100, 10), one-hot encoded
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_data = {x: batch_xs, y_: batch_ys}
    sess.run(train_step, feed_dict=train_data)
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
W1_e, b1_e, W2_e, b2_e = W1.eval(), b1.eval(), W2.eval(), b2.eval()
sess.close()

What I Have Done 我做了什么

I checked many the official docs and many other implementations, but I feel totally confused since they may use different versions and API varies greatly. 我检查了许多官方文档和许多其他实现,但是由于它们可能使用不同的版本并且API差别很大,我感到完全困惑。

So could someone help me, thank you in advance. 有人可以帮我吗,谢谢。

There are two problems with what you have done so far. 到目前为止,您所做的工作有两个问题。 First, you have initialised all of the weights to zero, which will prevent the network from learning. 首先,您已将所有权重初始化为零,这将阻止网络学习。 And secondly, the learning rate was too high. 其次,学习率太高。 The below code got me 0.9665 accuracy. 以下代码使我的精度为0.9665。 For why not to set all the weights to zero you can see here . 为什么不将所有权重设置为零,您可以在这里看到。

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)


num_hidden = 100

# x -> (batch_size, 784)
x = tf.placeholder(tf.float32, [None, 784])
label_place = tf.placeholder(tf.float32, [None, 10])


# WONT WORK as EVERYTHING IS ZERO!
# # Get accuracy at chance \approx 0.1
# W1 = tf.Variable(tf.zeros((784, num_hidden)))
# b1 = tf.Variable(tf.zeros((1, num_hidden)))
# W2 = tf.Variable(tf.zeros((num_hidden, 10)))
# b2 = tf.Variable(tf.zeros((1, 10)))

# Will work, you will need to train a bit more than 1000 steps
# though
W1 = tf.Variable(tf.random_normal((784, num_hidden), 0., 0.1))
b1 = tf.Variable(tf.zeros((1, num_hidden)))
W2 = tf.Variable(tf.random_normal((num_hidden, 10), 0, 0.1))
b2 = tf.Variable(tf.zeros((1, 10)))

# network, we only go as far as the linear output after the hidden layer
# so we can feed it into the tf.nn.softmax_cross_entropy_with_logits below
# this is more numerically stable
z = tf.nn.relu(tf.matmul(x, W1) + b1)
logits = tf.matmul(z, W2) + b2

# define our loss etc as before. however note that the learning rate is lower as
# with a higher learning rate it wasnt really working
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=label_place, logits=logits)
train_step = tf.train.GradientDescentOptimizer(.001).minimize(cross_entropy)

# continue as before
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
correct_prediction = tf.equal(tf.argmax(tf.nn.softmax(logits), 1), tf.argmax(label_place, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
for epoch in range(5000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_data = {x: batch_xs, label_place: batch_ys}
    sess.run(train_step, feed_dict=train_data)
print(sess.run(accuracy, feed_dict={x: mnist.test.images, label_place: mnist.test.labels}))
W1_e, b1_e, W2_e, b2_e = W1.eval(), b1.eval(), W2.eval(), b2.eval()
sess.close()

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM