ReluGrad input is not finite on multi-layer network in TensorFlow

I'm doing the Udacity course on TensorFlow, and I'm trying to train a neural network on the notMNIST set.
When using a network with one hidden layer everything works fine, but when I try to add another layer, after ~150 steps I get this error:

InvalidArgumentError: ReluGrad input is not finite. : Tensor had NaN values

This is the network model:

def model(x, w_h, w_h2, w_0, b_h, b_h2, b_0, p_drop):
    # Two ReLU hidden layers with dropout (p_drop is the keep probability),
    # followed by a linear output layer.
    h = tf.nn.relu(tf.matmul(x, w_h) + b_h)
    h = tf.nn.dropout(h, p_drop)
    h2 = tf.nn.relu(tf.matmul(h, w_h2) + b_h2)
    h2 = tf.nn.dropout(h2, p_drop)
    return tf.matmul(h2, w_0) + b_0

And the error is pointing at a specific line:

h = tf.nn.relu(tf.matmul(x, w_h) + b_h)

I guess that with the two-layer network the w_h weights become very small, so the matmul product goes to zero, but I don't understand how to solve it. Note that I'm using this optimizer:

net = model(tf_train_dataset, w_h, w_h2, w_0, b_h, b_h2, b_0, 0.5)
loss = tf.reduce_mean(
       tf.nn.softmax_cross_entropy_with_logits(net, tf_train_labels))
global_step = tf.Variable(0)  # count the number of steps taken
learning_rate = tf.train.exponential_decay(0.5, global_step, 100, 0.95)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
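To pinpoint which op first produces the Inf/NaN, one option is a numerics check (a debugging sketch, assuming TF 1.x; batch_data and batch_labels are the minibatches prepared in the notebook):

# Fail fast at the first op that yields Inf/NaN.
check_op = tf.add_check_numerics_ops()  # wraps every float tensor in CheckNumerics

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    _, _, l = session.run([optimizer, check_op, loss],
                          feed_dict={tf_train_dataset: batch_data,
                                     tf_train_labels: batch_labels})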

The net is 784 -> 1024 -> 512 -> 10.

Any help would be appreciated...

I was having the same problem when my weights were initialized randomly and my biases with zeros. Using Xavier initialization (from Glorot and Bengio) solved the problem: it scales the initial weights so their variance is 2 / (fan_in + fan_out), which keeps activations and gradients from vanishing or exploding as they pass through the layers. Here is my full example:

import tensorflow as tf  # TF 1.x (tf.contrib was removed in TF 2.x)

# image_size, num_labels, valid_dataset and test_dataset are defined in the
# earlier cells of the notMNIST notebook.
hidden_size = 1024
batch_size = 256

def multilayer(x, w, b):
    """ReLU hidden layers followed by a linear output layer."""
    for i, (wi, bi) in enumerate(zip(w, b)):
        if i == 0:
            out = tf.nn.relu(tf.matmul(x, wi) + bi)
        elif i == len(w) - 1:
            out = tf.matmul(out, wi) + bi  # last layer: raw logits, no ReLU
        else:
            out = tf.nn.relu(tf.matmul(out, wi) + bi)
    print(out.shape, x.shape)
    return out

graph = tf.Graph()
with graph.as_default():

    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Xavier (Glorot and Bengio) initializer
    initializer = tf.contrib.layers.xavier_initializer()

    # Variables
    W1 = tf.Variable(initializer([image_size * image_size, hidden_size]))
    b1 = tf.Variable(initializer([hidden_size]))
    W2 = tf.Variable(initializer([hidden_size, hidden_size]))
    b2 = tf.Variable(initializer([hidden_size]))
    W3 = tf.Variable(initializer([hidden_size, hidden_size]))
    b3 = tf.Variable(initializer([hidden_size]))
    W4 = tf.Variable(initializer([hidden_size, hidden_size]))
    b4 = tf.Variable(initializer([hidden_size]))
    W5 = tf.Variable(initializer([hidden_size, num_labels]))
    b5 = tf.Variable(initializer([num_labels]))

    Ws = [W1, W2, W3, W4, W5]
    bs = [b1, b2, b3, b4, b5]

    # Training computation
    logits = multilayer(tf_train_dataset, Ws, bs)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    #NOTE loss is actually a scalar value that represents the effectiveness of the
    #     current prediction. A minimized loss means that the weights and biases
    #     are adjusted at their best for the training data.

    # Optimizer
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(multilayer(tf_valid_dataset, Ws, bs))
    test_prediction = tf.nn.softmax(multilayer(tf_test_dataset, Ws, bs))
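For completeness, a minimal training loop in the notebook's usual style (a sketch: num_steps is a choice, and train_dataset / train_labels come from the earlier preprocessing cells):

num_steps = 3001  # typical value in the notebook; adjust as needed

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    for step in range(num_steps):
        # Walk through the training set one minibatch at a time.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction],
            feed_dict={tf_train_dataset: batch_data, tf_train_labels: batch_labels})
        if step % 500 == 0:
            print("Minibatch loss at step %d: %f" % (step, l))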
