NaN values in loss in Keras model

I have the following data shapes:

X_Train.shape,Y_Train.shape
Out[52]: ((983, 19900), (983,))
X_Test.shape,Y_Test.shape
Out[53]: ((52, 19900), (52,))

I am running a simple binary classifier, as Y_Train and Y_Test can be either 1 or 2:

import keras
import tensorflow as tf
from keras import layers
from keras.layers import Input, Dense
from keras.models import Model, Sequential
import numpy as np
from keras.optimizers import Adam

myModel = keras.Sequential([
    keras.layers.Dense(1000, activation=tf.nn.relu, input_shape=(19900,)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.softmax)
])

myModel.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
myModel.fit(X_Train, Y_Train, epochs=100, batch_size=1000)
test_loss, test_acc = myModel.evaluate(X_Test, Y_Test)

Output of the Code

Training Loss and Accuracy

Epoch 1/100
983/983 [==============================] - 1s 1ms/step - loss: nan - acc: 0.4608
Epoch 2/100
983/983 [==============================] - 0s 206us/step - loss: nan - acc: 0.4873
Epoch 3/100
983/983 [==============================] - 0s 200us/step - loss: nan - acc: 0.4883
Epoch 4/100
983/983 [==============================] - 0s 197us/step - loss: nan - acc: 0.4883
Epoch 5/100
983/983 [==============================] - 0s 194us/step - loss: nan - acc: 0.4873
Epoch 6/100
983/983 [==============================] - 0s 202us/step - loss: nan - acc: 0.4863
Epoch 7/100
983/983 [==============================] - 0s 198us/step - loss: nan - acc: 0.4863
Epoch 8/100
983/983 [==============================] - 0s 194us/step - loss: nan - acc: 0.4883
Epoch 9/100
983/983 [==============================] - 0s 196us/step - loss: nan - acc: 0.4873
Epoch 10/100
983/983 [==============================] - 0s 198us/step - loss: nan - acc: 0.4873
Epoch 11/100
983/983 [==============================] - 0s 200us/step - loss: nan - acc: 0.4893
Epoch 12/100
983/983 [==============================] - 0s 198us/step - loss: nan - acc: 0.4873
Epoch 13/100
983/983 [==============================] - 0s 194us/step - loss: nan - acc: 0.4873
Epoch 14/100
983/983 [==============================] - 0s 197us/step - loss: nan - acc: 0.4883
...
Epoch 97/100
983/983 [==============================] - 0s 196us/step - loss: nan - acc: 0.4893
Epoch 98/100
983/983 [==============================] - 0s 199us/step - loss: nan - acc: 0.4883
Epoch 99/100
983/983 [==============================] - 0s 193us/step - loss: nan - acc: 0.4883
Epoch 100/100
983/983 [==============================] - 0s 196us/step - loss: nan - acc: 0.4863

Testing Loss and Accuracy

test_loss,test_acc
Out[58]: (nan, 0.4615384661234342)

I also checked whether there are any nan values in my data:

np.isnan(X_Train).any()
Out[5]: False
np.isnan(Y_Train).any()
Out[6]: False
np.isnan(X_Test).any()
Out[7]: False
np.isnan(Y_Test).any()
Out[8]: False

My question is: why is my training accuracy not improving, and why is the loss nan? Also, why does the softmax in the output layer work without one-hot encoding?

Note 1: I apologize that my data is large, so I cannot share it here, but if there is some way to share it, I am ready to do so.

Note 2: There are a lot of zero values in my training data.

Sometimes with Keras, the combination of ReLU and softmax causes numerical trouble, as ReLU can produce large positive values corresponding to very small probabilities.

Try using tanh instead of ReLU.
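A minimal sketch of that suggestion, assuming the model from the question (only the hidden activations are swapped; the output layer is a separate issue, covered by the next answer):

import keras
import tensorflow as tf

# Same architecture as in the question, but with tanh hidden activations.
myModel = keras.Sequential([
    keras.layers.Dense(1000, activation='tanh', input_shape=(19900,)),
    keras.layers.Dense(64, activation='tanh'),
    keras.layers.Dense(32, activation='tanh'),
    keras.layers.Dense(1, activation=tf.nn.softmax)  # output layer: see the next answer
])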

If you are getting NaN values in the loss, it means that the input is outside of the function's domain. There are multiple reasons why this could occur. Here are a few steps to track down the cause:

1) If an input is outside of the function's domain, determine what those inputs are. Track the progression of the input values to your cost function.

2) Check if there are any null or nan values in the input data set. This can be done with

DataFrame.isnull().any() 

3) Change the scaling of the input data. Normalize the data between 0 and 1, then start the training (a sketch follows this list).

4) Change the method of weight initialization.

It's difficult to point to the exact solution with deep neural networks, so try the above methods; they should give you a fair idea of what is going wrong.
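A minimal sketch of steps 2 and 3, assuming the NumPy arrays from the question:

import numpy as np

# Step 2: check the inputs and labels for NaN or inf values.
for name, arr in [("X_Train", X_Train), ("Y_Train", Y_Train),
                  ("X_Test", X_Test), ("Y_Test", Y_Test)]:
    print(name, "NaN:", np.isnan(arr).any(), "inf:", np.isinf(arr).any())

# Step 3: min-max scale the features to [0, 1], using training-set
# statistics only so the test set gets the same transformation.
lo, hi = X_Train.min(axis=0), X_Train.max(axis=0)
span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
X_Train = (X_Train - lo) / span
X_Test = (X_Test - lo) / span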

Softmax activation is not the right choice here. You have only one neuron in the output layer.

Let's consider how the softmax function is defined (formula from wikipedia.org):

σ(z)_i = exp(z_i) / Σ_j exp(z_j)

Since there is only one neuron in the last layer, σ(z)_i will be 1 for every value of z_i.
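A quick numeric check of this with plain NumPy (the logit values are arbitrary examples):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([3.7])))        # [1.] -- a single logit always maps to 1
print(softmax(np.array([3.7, -1.2])))  # ~[0.993 0.007] -- a real distribution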

Since you are using sparse_categorical_crossentropy, Keras (or TensorFlow) infers the number of classes from the shape of the logits, which is assumed to be [BATCH_SIZE, NUM_CLASSES]. The shape of your logits is [None, 1], so Keras assumes you have only one class, but you are feeding it more than one class (1 or 2 in your case), and that is what causes the error.
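Put differently, if you wanted to keep sparse_categorical_crossentropy, the softmax output would need one unit per class. A sketch of that alternative (twoClassModel is just an illustrative name), assuming the labels are first remapped from {1, 2} to {0, 1} as this loss requires:

# Alternative fix: two softmax units, so the logits have shape [None, 2].
twoClassModel = keras.Sequential([
    keras.layers.Dense(1000, activation=tf.nn.relu, input_shape=(19900,)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(2, activation=tf.nn.softmax)
])
twoClassModel.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])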

The right activation function here is sigmoid (tanh could also work, by altering the dataset targets to be either -1 or 1). The loss should be binary_crossentropy:

myModel = keras.Sequential([
    keras.layers.Dense(1000, activation=tf.nn.relu, input_shape=(19900,)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(1, activation="sigmoid")
])

myModel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
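Note that binary_crossentropy expects targets in {0, 1}, while the labels in the question are 1 and 2, so they should be shifted before fitting. A sketch reusing the fit and evaluate calls from the question:

# Remap labels {1, 2} -> {0, 1} for binary_crossentropy.
Y_Train = Y_Train - 1
Y_Test = Y_Test - 1

myModel.fit(X_Train, Y_Train, epochs=100, batch_size=1000)
test_loss, test_acc = myModel.evaluate(X_Test, Y_Test)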
