NaN values in loss in Keras model
I have the following data shapes:
X_Train.shape,Y_Train.shape
Out[52]: ((983, 19900), (983,))
X_Test.shape,Y_Test.shape
Out[53]: ((52, 19900), (52,))
I am running a simple binary classifier, since Y_Train and Y_Test can each be either 1 or 2:
import keras
import tensorflow as tf
from keras import layers
from keras.layers import Input, Dense
from keras.models import Model,Sequential
import numpy as np
from keras.optimizers import Adam
myModel = keras.Sequential([
    keras.layers.Dense(1000, activation=tf.nn.relu, input_shape=(19900,)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.softmax)
])
myModel.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
myModel.fit(X_Train, Y_Train, epochs=100, batch_size=1000)
test_loss, test_acc = myModel.evaluate(X_Test, Y_Test)
Output of the code:
Training loss and accuracy:
Epoch 1/100
983/983 [==============================] - 1s 1ms/step - loss: nan - acc: 0.4608
Epoch 2/100
983/983 [==============================] - 0s 206us/step - loss: nan - acc: 0.4873
Epoch 3/100
983/983 [==============================] - 0s 200us/step - loss: nan - acc: 0.4883
Epoch 4/100
983/983 [==============================] - 0s 197us/step - loss: nan - acc: 0.4883
Epoch 5/100
983/983 [==============================] - 0s 194us/step - loss: nan - acc: 0.4873
Epoch 6/100
983/983 [==============================] - 0s 202us/step - loss: nan - acc: 0.4863
Epoch 7/100
983/983 [==============================] - 0s 198us/step - loss: nan - acc: 0.4863
Epoch 8/100
983/983 [==============================] - 0s 194us/step - loss: nan - acc: 0.4883
Epoch 9/100
983/983 [==============================] - 0s 196us/step - loss: nan - acc: 0.4873
Epoch 10/100
983/983 [==============================] - 0s 198us/step - loss: nan - acc: 0.4873
Epoch 11/100
983/983 [==============================] - 0s 200us/step - loss: nan - acc: 0.4893
Epoch 12/100
983/983 [==============================] - 0s 198us/step - loss: nan - acc: 0.4873
Epoch 13/100
983/983 [==============================] - 0s 194us/step - loss: nan - acc: 0.4873
Epoch 14/100
983/983 [==============================] - 0s 197us/step - loss: nan - acc: 0.4883
Epoch 97/100
983/983 [==============================] - 0s 196us/step - loss: nan - acc: 0.4893
Epoch 98/100
983/983 [==============================] - 0s 199us/step - loss: nan - acc: 0.4883
Epoch 99/100
983/983 [==============================] - 0s 193us/step - loss: nan - acc: 0.4883
Epoch 100/100
983/983 [==============================] - 0s 196us/step - loss: nan - acc: 0.4863
Testing loss and accuracy:
test_loss,test_acc
Out[58]: (nan, 0.4615384661234342)
I also checked whether there are any NaN values in my data:
np.isnan(X_Train).any()
Out[5]: False
np.isnan(Y_Train).any()
Out[6]: False
np.isnan(X_Test).any()
Out[7]: False
np.isnan(Y_Test).any()
Out[8]: False
My question is: why is my training accuracy not improving, why is the loss NaN, and why does the softmax in the output layer work without one-hot encoding?
Note 1: I apologize, but my data is large, so I cannot share it here. If there is some way to share it, I am ready to do that.
Note 2: There are a lot of zero values in my training data.
Sometimes in Keras the combination of Relu and Softmax causes numerical trouble, because Relu can produce large positive values corresponding to very small probabilities.
Try using tanh instead of Relu.
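To illustrate the kind of numerical trouble that large activations can cause, here is a minimal NumPy sketch that compares a naive softmax with the usual max-subtraction trick. It does not claim to reproduce Keras's own softmax. The function names and logits are made up for illustration.

```python
import numpy as np


def naive_softmax(z):
    # Direct translation of the formula: exp() overflows to inf once a
    # logit exceeds about 709 in float64, and inf / inf is nan
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def stable_softmax(z):
    # Subtracting the row maximum leaves the result mathematically unchanged
    # but keeps every exponent <= 0, so exp() cannot overflow
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


# Large positive activations of the kind an unbounded ReLU layer can emit
logits = np.array([[1000.0, 10.0]])

with np.errstate(over="ignore", invalid="ignore"):
    print(naive_softmax(logits))   # first entry is nan
print(stable_softmax(logits))      # approximately [[1. 0.]]
```

Subtracting the maximum is the standard remedy. Bounding the activations, as tanh does, is another way to keep the values small.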
If you are getting NaN values in the loss, some input is outside the function's domain. This can happen for several reasons.
Here are a few steps to track down the cause:
1) If an input is outside the function's domain, find out which inputs those are. Follow the input values as they progress through to your cost function.
2) Check whether there are any null or NaN values in the input data set. You can do this with
DataFrame.isnull().any()
3) Change the scaling of the input data. Normalize the data to the range 0 to 1 before starting the training.
4) Change the weight initialization method.
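Steps 2 and 3 work on NumPy arrays as well as on DataFrames. The following is a minimal sketch that assumes the data are NumPy arrays, as in the question. count_non_finite, fit_min_max and apply_min_max are hypothetical helper names. Two details matter here. First, np.isnan (which the question uses) does not flag inf. Second, plain min-max scaling divides by zero for a constant column, which is easy to hit when the data contain many zeros.

```python
import numpy as np


def count_non_finite(name, arr):
    # np.isnan alone misses +/-inf, which can also turn the loss into nan,
    # so count both kinds of non-finite value
    arr = np.asarray(arr, dtype=float)
    n_nan = int(np.isnan(arr).sum())
    n_inf = int(np.isinf(arr).sum())
    print(f"{name}: {n_nan} nan, {n_inf} inf")
    return n_nan + n_inf


def fit_min_max(X_train):
    # Per-feature minimum and range, computed on the training data only
    col_min = X_train.min(axis=0)
    col_range = X_train.max(axis=0) - col_min
    # A constant feature (e.g. a column of all zeros) has range 0, and
    # dividing by it would give 0/0 = nan, so divide by 1 instead
    col_range[col_range == 0] = 1.0
    return col_min, col_range


def apply_min_max(X, col_min, col_range):
    # Training rows land in [0, 1]; test rows can fall slightly outside
    return (X - col_min) / col_range


# Small made-up example; the first feature is all zeros
X_train = np.array([[0.0, 5.0, 2.0],
                    [0.0, 15.0, 4.0],
                    [0.0, 10.0, 6.0]])
X_test = np.array([[0.0, 20.0, 3.0]])

count_non_finite("X_train", X_train)
col_min, col_range = fit_min_max(X_train)
X_train_scaled = apply_min_max(X_train, col_min, col_range)
X_test_scaled = apply_min_max(X_test, col_min, col_range)
print(X_train_scaled)
print(X_test_scaled)
```

The minimum and range come from the training set only and are reused for the test set, so both sets are scaled the same way.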
With deep neural networks it is difficult to point to the exact solution. Try the methods above; they should give you a fair idea of what is going wrong.
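Step 1 can be made concrete for cross-entropy, because its log() is only defined for positive arguments. The following is a minimal NumPy sketch that assumes binary cross-entropy and uses made-up labels y and predictions p. cross_entropy_terms and find_out_of_domain are hypothetical helpers, not Keras API. The sketch computes the per-sample loss terms and reports which predictions push log() outside its domain.

```python
import numpy as np


def cross_entropy_terms(y_true, p_pred):
    # Per-sample binary cross-entropy: -(y*log(p) + (1-y)*log(1-p)).
    # log() is only defined for arguments > 0, so p must lie strictly in (0, 1).
    # errstate silences the warnings so the bad values can be inspected instead.
    with np.errstate(divide="ignore", invalid="ignore"):
        return -(y_true * np.log(p_pred) + (1.0 - y_true) * np.log(1.0 - p_pred))


def find_out_of_domain(y_true, p_pred):
    # Positions whose loss term is inf or nan, plus the offending predictions
    terms = cross_entropy_terms(y_true, p_pred)
    bad = np.flatnonzero(~np.isfinite(terms))
    return bad, p_pred[bad]


# Hypothetical labels and predicted probabilities
y = np.array([0.0, 1.0, 1.0, 0.0, 0.0])
p = np.array([0.2, 0.9, 0.0, 1.0, 0.0])

bad, values = find_out_of_domain(y, p)
print(bad, values)  # positions 2, 3 and 4 are outside the domain of log

# One common remedy: keep predictions away from exactly 0 and 1
p_clipped = np.clip(p, 1e-7, 1.0 - 1e-7)
print(np.isfinite(cross_entropy_terms(y, p_clipped)).all())
```

Position 4 is flagged even though its label is 0. The reason is that 0 * log(0) = 0 * (-inf) evaluates to nan, not 0.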
Softmax activation is not the right choice here, because you have only one neuron on the output layer.
Let's consider how the softmax function is defined (formula from wikipedia.org):

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad \text{for } i = 1, \dots, K$$

Since there is only one neuron on the last layer (K = 1), $\sigma(\mathbf{z})_1 = e^{z_1} / e^{z_1}$ will be 1 for all values of $z_1$.
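The K = 1 case is easy to check numerically. Below is a minimal NumPy sketch; the softmax helper and the pre-activation values are made up for illustration, and this is not Keras's implementation.

```python
import numpy as np


def softmax(z):
    # Softmax over the last (class) axis; subtracting the row maximum
    # avoids overflow without changing the result
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


# Made-up pre-activations of a Dense(1) layer for four samples:
# shape (batch_size, 1), i.e. K = 1
z_one_unit = np.array([[-7.3], [0.0], [2.5], [40.0]])
print(softmax(z_one_unit))   # every entry is exactly 1.0

# With two units (K = 2) the outputs vary with the input again
z_two_units = np.array([[0.0, 0.0], [3.0, -1.0]])
print(softmax(z_two_units))
```

The output is the constant 1 whatever the input, so it carries no information about the sample's class. Its derivative, σ(1 − σ), is 0, so no gradient signal can flow back through it either.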
Since you are using sparse_categorical_crossentropy, Keras (or TensorFlow) infers the number of classes from the shape of the logits. In Keras (or TensorFlow) the shape of the logits is assumed to be [BATCH_SIZE, NUM_CLASSES]. The shape of your logits is [None, 1], so Keras assumes you have only 1 class. However, you are feeding more than one class (your labels are 1 or 2), and that is what causes the error.
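The class-count inference can be imitated in a few lines. The following is a minimal NumPy sketch that assumes probabilities of shape [BATCH_SIZE, NUM_CLASSES]. The function is a hypothetical stand-in that raises an explicit error to make the mismatch visible; this is not necessarily how Keras or TensorFlow report it.

```python
import numpy as np


def sparse_categorical_crossentropy(labels, probs):
    # probs has shape [batch_size, num_classes]; the number of classes is
    # read from the last dimension
    num_classes = probs.shape[-1]
    labels = np.asarray(labels, dtype=int)
    # Each integer label is used as a column index, so it must lie in
    # [0, num_classes)
    if labels.min() < 0 or labels.max() >= num_classes:
        raise ValueError(
            f"labels must be in [0, {num_classes}), "
            f"got {sorted(set(labels.tolist()))}"
        )
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked)


# Output of a Dense(1) + softmax layer: shape (3, 1), so num_classes = 1
probs_one_unit = np.ones((3, 1))
labels_question = np.array([1, 2, 2])   # the question's labels
try:
    sparse_categorical_crossentropy(labels_question, probs_one_unit)
except ValueError as err:
    print("error:", err)

# Two output units and labels shifted to 0/1 give a valid per-sample loss
probs_two_units = np.array([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]])
print(sparse_categorical_crossentropy(labels_question - 1, probs_two_units))
```

With a single output unit, the only valid label is 0. Both 1 and 2 index a column that does not exist.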
The right activation function here is sigmoid (tanh could also work if you change the dataset targets to be either -1 or 1). The loss should be binary_crossentropy.
import keras
import tensorflow as tf

myModel = keras.Sequential([
    keras.layers.Dense(1000, activation=tf.nn.relu, input_shape=(19900,)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    # A single sigmoid unit outputs the probability of the positive class, in (0, 1)
    keras.layers.Dense(1, activation="sigmoid")
])
myModel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
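The snippet above leaves one detail implicit: binary_crossentropy expects targets of 0 or 1, while the question's labels are 1 or 2. The following NumPy sketch remaps the labels and shows that the resulting loss stays finite. The sigmoid and binary_crossentropy helpers and the made-up values are illustrative assumptions, not the Keras implementation.

```python
import numpy as np


def sigmoid(z):
    # Squashes any real number into (0, 1): one unit can then represent
    # the probability of the positive class
    return 1.0 / (1.0 + np.exp(-z))


def binary_crossentropy(y_true, p_pred, eps=1e-7):
    # Mean binary cross-entropy; clipping keeps log() away from 0
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))


# The question's labels are 1 or 2; map them to 0 or 1
Y_raw = np.array([1, 2, 2, 1])
Y_binary = (Y_raw == 2).astype(float)   # 1 -> 0.0, 2 -> 1.0

# Made-up pre-activations of the final Dense(1) layer
z = np.array([-2.0, 3.0, 0.5, -0.1])
p = sigmoid(z)
print(Y_binary, p)
print(binary_crossentropy(Y_binary, p))

# Predicted class back in the original 1/2 encoding, thresholding at 0.5
print(np.where(p > 0.5, 2, 1))
```

In the Keras code, you would pass the same kind of remapped array, for example (Y_Train == 2).astype("float32"), to myModel.fit in place of Y_Train.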