
TensorFlow's loss function returns NaN after changing RNN to LSTM cell

I am training a model to predict a time series using an RNN. The model trains without any issue. Here's the original code:

tf.reset_default_graph()

num_inputs = 1
num_neurons = 100
num_outputs = 1
learning_rate = 0.0001
num_train_iterations = 2000
batch_size = 1

X = tf.placeholder(tf.float32, [None, time_steps-1, num_inputs])
y = tf.placeholder(tf.float32, [None, time_steps-1, num_outputs])
cell = tf.contrib.rnn.OutputProjectionWrapper(
    tf.contrib.rnn.BasicRNNCell(num_units=num_neurons, activation=tf.nn.relu),
    output_size=num_outputs)
outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.75)

with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    sess.run(init)
    for iteration in range(num_train_iterations):

        elx, ely = next_batch(training_data, time_steps)
        sess.run(train, feed_dict={X: elx, y: ely})

        if iteration % 100 == 0:

            mse = loss.eval(feed_dict={X: elx, y: ely})
            print(iteration, "\tMSE:", mse)

The problem comes when I change tf.contrib.rnn.BasicRNNCell to tf.contrib.rnn.BasicLSTMCell: training slows down dramatically and the loss (the mse variable) becomes NaN. My best guess is that MSE is the wrong loss function and that I should try cross entropy instead. I searched for similar code and found that tf.nn.softmax_cross_entropy_with_logits() could be the solution, but I still don't understand how to apply it to my problem.
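That is, the only change to the code above is the cell definition:

# The only change: BasicRNNCell -> BasicLSTMCell (as described above)
cell = tf.contrib.rnn.OutputProjectionWrapper(
    tf.contrib.rnn.BasicLSTMCell(num_units=num_neurons, activation=tf.nn.relu),
    output_size=num_outputs)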

Usually NaN occurs when your gradients blow up. Here is some example code using tf.nn.softmax_cross_entropy_with_logits; give it a try.

# Output layer
logit = tf.add(tf.matmul(H1, w2), b2)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logit, labels=Y)

# Cost
cost = tf.reduce_mean(cross_entropy)

# Optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Prediction
y_pred = tf.nn.softmax(logit)
pred = tf.argmax(y_pred, axis=1)
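If you want to keep the MSE loss, another option against exploding gradients is to clip the gradients before they are applied. Here is a rough sketch (using the variable names from the question, with an arbitrarily chosen clipping range):

# Sketch: clip each gradient to a fixed range before Adam applies it,
# a common way to stop an exploding-gradient LSTM from producing NaN losses.
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
grads_and_vars = optimizer.compute_gradients(loss)
clipped = [(tf.clip_by_value(g, -5.0, 5.0), v)
           for g, v in grads_and_vars if g is not None]
train = optimizer.apply_gradients(clipped)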
