
2 layer NN weights not updating

I have a fairly simple NN that has 1 hidden layer.

However, the weights don't seem to be updating. Or perhaps they are, but the variable values don't change?

Either way, my accuracy is stuck at 0.1 (chance level for 10 classes) and it doesn't change no matter how I change the learning rate or the activation function. Not sure what is wrong. Any ideas?

I've posted the entire code, correctly formatted, so you guys can copy-paste it directly and run it on your local machines.

from tensorflow.examples.tutorials.mnist import input_data
import math
import numpy as np
import tensorflow as tf

# one hot option returns binarized labels
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
# model parameters 
x = tf.placeholder(tf.float32, [784, None],name='x')
# weights 
W1 = tf.Variable(tf.truncated_normal([25, 784],stddev= 1.0/math.sqrt(784)),name='W') 
W2 = tf.Variable(tf.truncated_normal([25, 25],stddev=1.0/math.sqrt(25)),name='W')  
W3 = tf.Variable(tf.truncated_normal([10, 25],stddev=1.0/math.sqrt(25)),name='W') 

# bias units
b1 = tf.Variable(tf.zeros([25,1]),name='b1')
b2 = tf.Variable(tf.zeros([25,1]),name='b2') 
b3 = tf.Variable(tf.zeros([10,1]),name='b3')

# NN architecture 
hidden1 = tf.nn.relu(tf.matmul(W1, x,name='hidden1')+b1, name='hidden1_out')

# hidden2 = tf.nn.sigmoid(tf.matmul(W2, hidden1, name='hidden2')+b2, name='hidden2_out')

y = tf.matmul(W3, hidden1,name='y') + b3

y_ = tf.placeholder(tf.float32, [10, None],name='y_')

# Create the model   
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_)) 
train_step = tf.train.GradientDescentOptimizer(2).minimize(cross_entropy)  

sess = tf.Session()   
summary_writer = tf.train.SummaryWriter('log_simple_graph', sess.graph)   
init = tf.global_variables_initializer()   
sess.run(init)   
# Train 
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    summary =sess.run(train_step, feed_dict={x: np.transpose(batch_xs), y_: np.transpose(batch_ys)})
    if summary is not None:
        summary_writer.add_event(summary)

# Test trained model 
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1)) 
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

print(sess.run(accuracy, feed_dict={x: np.transpose(mnist.test.images),  y_: np.transpose(mnist.test.labels)}))

The reason you are getting 0.1 accuracy consistently is mainly the ordering of the dimensions in your input placeholder and the weights that follow from it. The learning rate is another factor: if the learning rate is very high, the gradient updates oscillate and never settle into any minimum (your code uses a learning rate of 2, which is far too large).
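To see the learning-rate point concretely, here is a minimal sketch (a hypothetical toy loss, not your network) of plain gradient descent on f(w) = w**2: a small step shrinks w toward the minimum at 0, while a step of 2 overshoots on every update and diverges.

def descend(lr, steps=10):
    # gradient descent on f(w) = w**2, whose gradient is 2*w
    w = 1.0
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(descend(0.1))  # ~0.107: w shrinks by a factor of 0.8 each step
print(descend(2.0))  # 59049.0: w is multiplied by -3 each step and blows up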

TensorFlow puts the number of instances (the batch size) in the first dimension of a placeholder. So the code which declares the input x

x = tf.placeholder(tf.float32, [784, None],name='x')

should be declared as

x = tf.placeholder(tf.float32, [None, 784],name='x')

Consequently, W1 should be declared as

W1 = tf.Variable(tf.truncated_normal([784, 25],stddev= 1.0/math.sqrt(784)),name='W')

and so on. Even the bias variables should be declared as 1-D vectors rather than [n, 1] column vectors, so they broadcast across the batch dimension (that's how TensorFlow expects them :))

For example

b1 = tf.Variable(tf.zeros([25]),name='b1') 
b2 = tf.Variable(tf.zeros([25]),name='b2') 
b3 = tf.Variable(tf.zeros([10]),name='b3')
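With the batch-first layout everything lines up. Here is a minimal shape check (plain numpy, the names just mirror the variables above) of one forward step, showing how the 1-D bias broadcasts across the batch dimension:

import numpy as np

batch = 100
x  = np.zeros((batch, 784))   # batch-first input: one row per instance
W1 = np.zeros((784, 25))      # maps 784 pixels to 25 hidden units
b1 = np.zeros((25,))          # 1-D bias, broadcast over all rows

h1 = x.dot(W1) + b1           # (100, 784) x (784, 25) -> (100, 25); b1 broadcasts
print(h1.shape)               # (100, 25)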

I'm putting the corrected full code below for your reference. I achieved an accuracy of 0.9262 with this :D

from tensorflow.examples.tutorials.mnist import input_data
import math
import numpy as np
import tensorflow as tf

# one hot option returns binarized labels. 
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)   
# model parameters 
x = tf.placeholder(tf.float32, [None, 784],name='x')
# weights 
W1 = tf.Variable(tf.truncated_normal([784, 25],stddev= 1.0/math.sqrt(784)),name='W') 
W2 = tf.Variable(tf.truncated_normal([25, 25],stddev=1.0/math.sqrt(25)),name='W')  
W3 = tf.Variable(tf.truncated_normal([25, 10],stddev=1.0/math.sqrt(25)),name='W') 

# bias units 
b1 = tf.Variable(tf.zeros([25]),name='b1') 
b2 = tf.Variable(tf.zeros([25]),name='b2') 
b3 = tf.Variable(tf.zeros([10]),name='b3')

# NN architecture 
hidden1 = tf.nn.relu(tf.matmul(x, W1,name='hidden1')+b1, name='hidden1_out')

# hidden2 = tf.nn.sigmoid(tf.matmul(W2, hidden1, name='hidden2')+b2, name='hidden2_out')

y = tf.matmul(hidden1, W3,name='y') + b3

y_ = tf.placeholder(tf.float32, [None, 10],name='y_')

# Create the model   
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(cross_entropy)  

sess = tf.Session()   
summary_writer = tf.summary.FileWriter('log_simple_graph', sess.graph)
init = tf.global_variables_initializer()
sess.run(init)

# Train (running the train op returns None, so there is no summary value to log here)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Test trained model 
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1)) 
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

print(sess.run(accuracy, feed_dict={x: mnist.test.images,  y_: mnist.test.labels}))
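As for the original "are the weights actually updating?" question, a quick way to check (a hypothetical diagnostic, not part of the code above) is to snapshot a weight tensor before and after a training step and compare:

# hypothetical check: confirm W1 actually changes after one more training step
w_before = sess.run(W1)
batch_xs, batch_ys = mnist.train.next_batch(100)
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
w_after = sess.run(W1)
print('mean |delta W1|:', np.abs(w_after - w_before).mean())  # > 0 if the weights moved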
