Why does this neural network learn nothing?

Question

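I am learning TensorFlow and implemented a simple neural network as described in MNIST for Beginners in the TensorFlow documentation. Here is the link. As expected, the accuracy was about 80-90%.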

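Right after that, the same article continues with MNIST for Experts, which uses a ConvNet. Instead of implementing that, I decided to improve the beginner part. I know how neural networks work and how they learn, and that deep networks can outperform shallow ones. I modified the original program from MNIST for Beginners to implement a neural network with 2 hidden layers of 16 neurons each.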

It looks something like this:

Image of network

Code for it:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

x = tf.placeholder(tf.float32, [None, 784], 'images')
y = tf.placeholder(tf.float32, [None, 10], 'labels')

# We are going to make 2 hidden layer neurons with 16 neurons each

# All the weights in network
W0 = tf.Variable(dtype=tf.float32, name='InputLayerWeights', initial_value=tf.zeros([784, 16]))
W1 = tf.Variable(dtype=tf.float32, name='HiddenLayer1Weights', initial_value=tf.zeros([16, 16]))
W2 = tf.Variable(dtype=tf.float32, name='HiddenLayer2Weights', initial_value=tf.zeros([16, 10]))

# All the biases for the network
B0 = tf.Variable(dtype=tf.float32, name='HiddenLayer1Biases', initial_value=tf.zeros([16]))
B1 = tf.Variable(dtype=tf.float32, name='HiddenLayer2Biases', initial_value=tf.zeros([16]))
B2 = tf.Variable(dtype=tf.float32, name='OutputLayerBiases', initial_value=tf.zeros([10]))


def build_graph():
    """This functions wires up all the biases and weights of the network
    and returns the last layer connections
    :return: returns the activation in last layer of network/output layer without softmax
    """
    A1 = tf.nn.relu(tf.matmul(x, W0) + B0)
    A2 = tf.nn.relu(tf.matmul(A1, W1) + B1)
    return tf.matmul(A2, W2) + B2


def print_accuracy(sx, sy, tf_session):
    """This function prints the accuracy of a model at the time of invocation
    :return: None
    """
    correct_prediction = tf.equal(tf.argmax(y), tf.argmax(tf.nn.softmax(build_graph())))
    correct_prediction_float = tf.cast(correct_prediction, dtype=tf.float32)
    accuracy = tf.reduce_mean(correct_prediction_float)

    print(accuracy.eval(feed_dict={x: sx, y: sy}, session=tf_session))


y_predicted = build_graph()

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_predicted))

model = tf.train.GradientDescentOptimizer(0.03).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        batch_x, batch_y = mnist.train.next_batch(50)
        if _ % 100 == 0:
            print_accuracy(batch_x, batch_y, sess)
        sess.run(model, feed_dict={x: batch_x, y: batch_y})

I expected the output to be better than what a single layer achieves (assuming W0 has shape [784, 10] and B0 has shape [10]):

def build_graph():
     return tf.matmul(x,W0) + B0

Instead, the output shows that the network is not training at all. The accuracy never exceeds 20% in any iteration.

Output

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1

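My question

What is wrong with the above program that it does not learn at all? How can I improve it further without using convolutional neural networks?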

Answer 1

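Your main mistake is that the network is symmetric, because you initialized all the weights to zero. As a result, the weights are never updated: with zero weights the hidden activations are all zero, so the gradient of every weight matrix is zero as well. Change the initial values to small random numbers and it will start learning. Initializing the biases with zeros is fine.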
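To make the zero-gradient argument concrete, here is a minimal pure-Python sketch (no TensorFlow) of backpropagation through a tiny 3-4-4-3 ReLU network with softmax cross-entropy. The layer sizes, the input vector, the helper names (vec_mat, back_through, weight_gradients and so on) and the uniform(0.01, 0.1) range are all assumptions made for illustration; the positive range is chosen only so that every ReLU unit is active in this toy run, and it is not the truncated_normal initializer used in the TensorFlow code.

```python
import math
import random


def vec_mat(v, W):
    """Multiply row vector v by matrix W (a list of rows)."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(W[0]))]


def outer(u, v):
    """Outer product of two vectors, as a list of rows."""
    return [[a * b for b in v] for a in u]


def softmax(z):
    shifted = [math.exp(a - max(z)) for a in z]
    total = sum(shifted)
    return [a / total for a in shifted]


def back_through(W, delta, pre_activation):
    """Push delta back through W, then through the ReLU that produced W's input."""
    return [
        sum(W[i][j] * delta[j] for j in range(len(delta))) if pre_activation[i] > 0 else 0.0
        for i in range(len(W))
    ]


def weight_gradients(x, label, W0, W1, W2):
    """Return the loss gradients for W0, W1 and W2 on a single example."""
    # Forward pass (biases are zero here, so they are omitted).
    z1 = vec_mat(x, W0)
    a1 = [max(0.0, v) for v in z1]
    z2 = vec_mat(a1, W1)
    a2 = [max(0.0, v) for v in z2]
    probs = softmax(vec_mat(a2, W2))
    # Backward pass for softmax cross-entropy.
    d3 = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    d2 = back_through(W2, d3, z2)
    d1 = back_through(W1, d2, z1)
    return outer(x, d1), outer(a1, d2), outer(a2, d3)


def largest(grad):
    """Largest absolute entry of a gradient matrix."""
    return max(abs(v) for row in grad for v in row)


def make_weights(shape, fill):
    rows, cols = shape
    return [[fill() for _ in range(cols)] for _ in range(rows)]


shapes = [(3, 4), (4, 4), (4, 3)]  # toy sizes, not the 784-16-16-10 network
x, label = [0.5, 1.0, 0.25], 0

zero_weights = [make_weights(s, lambda: 0.0) for s in shapes]
zero_grads = [largest(g) for g in weight_gradients(x, label, *zero_weights)]

rng = random.Random(0)
random_weights = [make_weights(s, lambda: rng.uniform(0.01, 0.1)) for s in shapes]
random_grads = [largest(g) for g in weight_gradients(x, label, *random_weights)]

print("zero init:  ", zero_grads)    # every weight gradient is exactly zero
print("random init:", random_grads)  # every weight matrix receives a gradient
```

With all-zero weights only the output bias would receive a gradient, so every sample gets the same logits and the network cannot distinguish inputs.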

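Another issue is purely technical: the print_accuracy function creates new nodes in the computational graph, and since you call it in a loop, the graph gets bloated and will eventually use up all memory.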

You may also want to play with the hyperparameters and make the network bigger to increase its capacity.

Edit: I also found a bug in your accuracy calculation. It should be

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_predicted, 1))
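The reason the axis argument matters: in TensorFlow 1.x, tf.argmax reduces along axis 0 when no axis is given, which for a [batch, classes] tensor yields one sample index per class instead of one class per sample. The pure-Python sketch below mimics both reductions on a made-up batch of 4 samples and 3 classes; argmax_axis0, argmax_axis1 and the data are hypothetical stand-ins for illustration, not TensorFlow functions.

```python
def argmax_axis0(rows):
    """Row index of the largest entry in each column (reduction over the batch)."""
    return [max(range(len(rows)), key=lambda r: rows[r][c]) for c in range(len(rows[0]))]


def argmax_axis1(rows):
    """Column index of the largest entry in each row (reduction over the classes)."""
    return [max(range(len(row)), key=lambda c: row[c]) for row in rows]


# One-hot labels for a toy batch: the true classes are 2, 0, 1, 0.
labels = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
]
# Made-up logits that get the first three samples right and the last one wrong.
logits = [
    [0.1, 0.2, 0.9],
    [0.8, 0.1, 0.3],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.2],
]

per_class = argmax_axis0(labels)    # one value per class: indices into the batch
true_classes = argmax_axis1(labels)  # one value per sample: the class labels
predicted = argmax_axis1(logits)

accuracy = sum(p == t for p, t in zip(predicted, true_classes)) / len(labels)

print(per_class)     # [1, 2, 0]
print(true_classes)  # [2, 0, 1, 0]
print(predicted)     # [2, 0, 1, 1]
print(accuracy)      # 0.75
```

Only the per-sample comparison along axis 1 measures classification accuracy; the axis 0 result compares positions within the batch, which says nothing about whether a sample was classified correctly.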

Here is the complete code:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

x = tf.placeholder(tf.float32, [None, 784], 'images')
y = tf.placeholder(tf.float32, [None, 10], 'labels')

W0 = tf.Variable(dtype=tf.float32, name='InputLayerWeights', initial_value=tf.truncated_normal([784, 16]) * 0.001)
W1 = tf.Variable(dtype=tf.float32, name='HiddenLayer1Weights', initial_value=tf.truncated_normal([16, 16]) * 0.001)
W2 = tf.Variable(dtype=tf.float32, name='HiddenLayer2Weights', initial_value=tf.truncated_normal([16, 10]) * 0.001)

B0 = tf.Variable(dtype=tf.float32, name='HiddenLayer1Biases', initial_value=tf.ones([16]))
B1 = tf.Variable(dtype=tf.float32, name='HiddenLayer2Biases', initial_value=tf.ones([16]))
B2 = tf.Variable(dtype=tf.float32, name='OutputLayerBiases', initial_value=tf.ones([10]))

A1 = tf.nn.relu(tf.matmul(x, W0) + B0)
A2 = tf.nn.relu(tf.matmul(A1, W1) + B1)
y_predicted = tf.matmul(A2, W2) + B2
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_predicted, 1))
correct_prediction_float = tf.cast(correct_prediction, dtype=tf.float32)
accuracy = tf.reduce_mean(correct_prediction_float)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_predicted))
optimizer = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)

mnist = input_data.read_data_sets('mnist', one_hot=True)
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for i in range(20000):
    batch_x, batch_y = mnist.train.next_batch(64)
    _, cost_val, acc_val = sess.run([optimizer, cross_entropy, accuracy], feed_dict={x: batch_x, y: batch_y})
    if i % 100 == 0:
      print('cost=%.3f accuracy=%.3f' % (cost_val, acc_val))

Answer 1: score 5, accepted, 2017-12-30 09:21:41
