
nan values in loss in keras model

I have the following data shapes:

X_Train.shape,Y_Train.shape
Out[52]: ((983, 19900), (983,))
X_Test.shape,Y_Test.shape
Out[53]: ((52, 19900), (52,))

I am running a simple binary classifier, as Y_Train and Y_Test can be either 1 or 2:

import keras
import tensorflow as tf
from keras import layers
from keras.layers import Input, Dense
from keras.models import Model, Sequential
import numpy as np
from keras.optimizers import Adam

myModel = keras.Sequential([
    keras.layers.Dense(1000, activation=tf.nn.relu, input_shape=(19900,)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.softmax)
])

myModel.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
myModel.fit(X_Train, Y_Train, epochs=100, batch_size=1000)
test_loss, test_acc = myModel.evaluate(X_Test, Y_Test)

Output of the Code

Training Loss and Accuracy

Epoch 1/100
983/983 [==============================] - 1s 1ms/step - loss: nan - acc: 0.4608
Epoch 2/100
983/983 [==============================] - 0s 206us/step - loss: nan - acc: 0.4873
Epoch 3/100
983/983 [==============================] - 0s 200us/step - loss: nan - acc: 0.4883
Epoch 4/100
983/983 [==============================] - 0s 197us/step - loss: nan - acc: 0.4883
Epoch 5/100
983/983 [==============================] - 0s 194us/step - loss: nan - acc: 0.4873
Epoch 6/100
983/983 [==============================] - 0s 202us/step - loss: nan - acc: 0.4863
Epoch 7/100
983/983 [==============================] - 0s 198us/step - loss: nan - acc: 0.4863
Epoch 8/100
983/983 [==============================] - 0s 194us/step - loss: nan - acc: 0.4883
Epoch 9/100
983/983 [==============================] - 0s 196us/step - loss: nan - acc: 0.4873
Epoch 10/100
983/983 [==============================] - 0s 198us/step - loss: nan - acc: 0.4873
Epoch 11/100
983/983 [==============================] - 0s 200us/step - loss: nan - acc: 0.4893
Epoch 12/100
983/983 [==============================] - 0s 198us/step - loss: nan - acc: 0.4873
Epoch 13/100
983/983 [==============================] - 0s 194us/step - loss: nan - acc: 0.4873
Epoch 14/100
983/983 [==============================] - 0s 197us/step - loss: nan - acc: 0.4883
...
Epoch 97/100
983/983 [==============================] - 0s 196us/step - loss: nan - acc: 0.4893
Epoch 98/100
983/983 [==============================] - 0s 199us/step - loss: nan - acc: 0.4883
Epoch 99/100
983/983 [==============================] - 0s 193us/step - loss: nan - acc: 0.4883
Epoch 100/100
983/983 [==============================] - 0s 196us/step - loss: nan - acc: 0.4863

Testing Loss and Accuracy

test_loss,test_acc
Out[58]: (nan, 0.4615384661234342)

I also checked whether there are any nan values in my data:

np.isnan(X_Train).any()
Out[5]: False
np.isnan(Y_Train).any()
Out[6]: False
np.isnan(X_Test).any()
Out[7]: False
np.isnan(Y_Test).any()
Out[8]: False

My question is: why is my training accuracy not improving, why is the loss nan, and why does the softmax output work at all without one-hot encoding?

Note 1: I apologize that my data is too big to share here, but if there is a way to share it, I am ready to do so.

Note 2: There are a lot of zero values in my training data.

Sometimes with Keras the combination of ReLU and softmax causes numerical trouble, as ReLU can produce large positive values corresponding to very small probabilities.

Try using tanh instead of ReLU, as sketched below.
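For illustration only (this code is not from the answer), the suggested swap applied to the model from the question might look like the following; the sigmoid output layer anticipates the fix discussed in the answers further down:

# Hypothetical variant of the question's model with tanh hidden activations.
myModel = keras.Sequential([
    keras.layers.Dense(1000, activation='tanh', input_shape=(19900,)),
    keras.layers.Dense(64, activation='tanh'),
    keras.layers.Dense(32, activation='tanh'),
    keras.layers.Dense(1, activation='sigmoid')  # single-unit softmax is degenerate; see the answers below
])
myModel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])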

If you are getting NaN values in the loss, it means the input is outside the function's domain. There are multiple reasons why this could occur. Here are a few steps to track down the cause:

1) If an input is outside of the function's domain, determine what those inputs are. Track the progression of the input values going into your cost function.

2) Check whether there are any null or NaN values in the input data set. With a pandas DataFrame this can be done with:

DataFrame.isnull().any() 

3) Change the scaling of the input data. Normalize the data to the range 0 to 1 and then start training.

4) Change the method of weight initialization. (Steps 3 and 4 are sketched after this list.)
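As a rough illustration of steps 3 and 4 (not part of the original answer; the names X_Train and X_Test come from the question, and he_normal is just one possible initializer):

import numpy as np
from keras import layers, initializers

# Step 3: min-max scale the inputs to [0, 1], using training-set statistics only.
x_min, x_max = X_Train.min(), X_Train.max()
X_Train_scaled = (X_Train - x_min) / (x_max - x_min)
X_Test_scaled = (X_Test - x_min) / (x_max - x_min)  # reuse the training statistics

# Step 4: set an explicit weight initializer instead of relying on the default.
hidden = layers.Dense(
    64,
    activation='relu',
    kernel_initializer=initializers.he_normal(),  # a common pairing with relu
)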

It is difficult to point to an exact solution with deep neural networks, so try the above methods; they should give you a fair idea of what is going wrong.

Softmax activation is not the right choice here: you have only one neuron in the output layer.

Let's consider how the softmax function is defined (the definition below is from wikipedia.org):

σ(z)_i = e^{z_i} / Σ_{j=1}^{K} e^{z_j}

Since there is only one neuron in the last layer (K = 1), σ(z)_i will be 1 for every value of z_i.
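You can check this numerically; a small sketch in plain NumPy (not from the original answer):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([5.0])))       # [1.] - a single logit always maps to probability 1
print(softmax(np.array([-3.0])))      # [1.] - no matter what the logit is
print(softmax(np.array([1.0, 2.0])))  # [0.269 0.731] - with two logits it behaves as expected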

Since you are using sparse_categorical_crossentropy, Keras (or TensorFlow) infers the number of classes from the shape of the logits. In Keras (or TensorFlow) the shape of the logits is assumed to be [BATCH_SIZE, NUM_CLASSES]. The shape of your logits is [None, 1], so Keras assumes that the number of classes is 1, but you are feeding more than one class (your labels are 1 or 2), and that is what causes the error.
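For comparison, a sketch of what keeping sparse_categorical_crossentropy would require (altModel is a hypothetical name; one output unit per class, with labels remapped from {1, 2} to {0, 1}, since sparse labels must be 0-based class indices):

altModel = keras.Sequential([
    keras.layers.Dense(1000, activation=tf.nn.relu, input_shape=(19900,)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(2, activation=tf.nn.softmax)  # logits shape [BATCH_SIZE, 2] -> 2 classes
])
altModel.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
altModel.fit(X_Train, Y_Train - 1, epochs=100, batch_size=1000)  # remap labels to 0/1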

The right activation function here is sigmoid (tanh could also work, by altering the dataset targets to be either -1 or 1). The loss should be binary_crossentropy.

myModel = keras.Sequential([
    keras.layers.Dense(1000, activation=tf.nn.relu, input_shape=(19900,)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(32, activation=tf.nn.relu),
    keras.layers.Dense(1, activation="sigmoid")  # one unit + sigmoid for binary classification
])

myModel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
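One caveat, assuming the labels described in the question: binary_crossentropy expects 0/1 targets, while Y_Train and Y_Test contain 1 and 2, so remap them before fitting:

# Remap labels {1, 2} -> {0, 1} for binary_crossentropy.
myModel.fit(X_Train, Y_Train - 1, epochs=100, batch_size=1000)
test_loss, test_acc = myModel.evaluate(X_Test, Y_Test - 1)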
