
NaNs being generated after training a neural network for some time using TensorFlow

I have been facing this problem for a few days, and I don't know where I am making a mistake. My code is lengthy, so I could not reproduce everything here.

Here are the results in the first case:

Accuracy: 0.1071 Error: 1.45003
Accuracy: 0.5149 Error: 0.259084
Accuracy: 0.7199 Error: 0.197301
Accuracy: 0.7934 Error: 0.138881
Accuracy: 0.8137 Error: 0.136115
Accuracy: 0.8501 Error: 0.15382
Accuracy: 0.8642 Error: 0.100813
Accuracy: 0.8761 Error: 0.0882854
Accuracy: 0.882 Error: 0.0874575
Accuracy: 0.8861 Error: 0.0629579
Accuracy: 0.8912 Error: 0.101606
Accuracy: 0.8939 Error: 0.0744626
Accuracy: 0.8975 Error: 0.0775732
Accuracy: 0.8957 Error: 0.0909776
Accuracy: 0.9002 Error: 0.0799101
Accuracy: 0.9034 Error: 0.0621196
Accuracy: 0.9004 Error: 0.0752576
Accuracy: 0.9068 Error: 0.0531508
Accuracy: 0.905 Error: 0.0699344
Accuracy: 0.8941 Error: nan
Accuracy: 0.893 Error: nan
Accuracy: 0.893 Error: nan

I have tried various things but failed to figure out where I am making a mistake.

1) Changing the cross-entropy calculation to different formulations:

# Manual formulations (pred is the network output, y the one-hot labels):
self._error = -tf.reduce_sum(y * pred + 1e-9)
self._error = -tf.reduce_sum(y * pred)
self._error = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=pred, labels=y))
self._error = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred + 1e-8), reduction_indices=1))

# Built-in softmax cross-entropy on the raw outputs:
out = tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y)
self._error = tf.reduce_mean(out)
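For reference, a numerically safer variant of the manual formulation clips pred away from 0 and 1 before the log, so log(0) can never occur. This is only a minimal sketch, not my actual code; the placeholder shapes are assumptions for MNIST:

import tensorflow as tf

y = tf.placeholder(tf.float32, [None, 10])     # one-hot labels
pred = tf.placeholder(tf.float32, [None, 10])  # softmax outputs of the network

# Clip probabilities away from 0 and 1 so tf.log never sees an exact zero.
eps = 1e-8
clipped = tf.clip_by_value(pred, eps, 1.0 - eps)
error = tf.reduce_mean(-tf.reduce_sum(y * tf.log(clipped), axis=1))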

I have tried all the optimizers: SGD, Adam, Adagrad, RMSProp.

I have used both the default optimizers provided by TensorFlow and ones with manually supplied parameters. I have even checked learning rates as small as 0.00001.

Bias:
I have tried initializing with both 1.0 and 0.0.

Weights:
Initialized with tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32).

Network:
FC784 - FC256 - FC128 - FC10
I have also tried different variants of it.

Activation Function:
ReLU, Tanh, and leaky ReLU (tf.maximum(input, 0.1*input)).

Data:
The MNIST dataset, normalized by dividing by 255. The dataset is from Keras. (A sketch of this whole setup follows below.)
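To make the setup above concrete, here is a minimal sketch of the kind of graph described (Keras MNIST divided by 255, FC784-FC256-FC128-FC10, truncated-normal weights, zero biases, leaky ReLU, softmax cross-entropy on raw logits, Adam with a small learning rate). The variable and scope names are illustrative, not my exact code:

import tensorflow as tf

# MNIST via Keras, flattened and normalized to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

def leaky_relu(x):
    return tf.maximum(x, 0.1 * x)   # the leaky ReLU variant quoted above

def fc_layer(x, n_in, n_out, name, activation=None):
    # Fully connected layer: truncated-normal weights (stddev=0.1), zero biases.
    with tf.variable_scope(name):
        w = tf.get_variable("w", [n_in, n_out],
                            initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32))
        b = tf.get_variable("b", [n_out], initializer=tf.zeros_initializer())
        out = tf.matmul(x, w) + b
        return activation(out) if activation is not None else out

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])          # one-hot labels

h1 = fc_layer(x, 784, 256, "fc1", leaky_relu)
h2 = fc_layer(h1, 256, 128, "fc2", leaky_relu)
logits = fc_layer(h2, 128, 10, "fc3")               # raw logits, no softmax here

error = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(error)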

I know this question has been asked in various Stack Overflow questions, and I have tried all the methods suggested there; to my knowledge none of them helped me.

From the information above it's hard to tell what went wrong. Yes, debugging a neural network can be very tedious. Luckily, the TensorFlow Debugger (tfdbg) is a great tool that lets you step through the network at every iteration and analyze your weights.

Run the following command in tfdbg to get to the first nan or inf value that shows up in the graph:

run -f has_inf_or_nan
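For reference, a minimal sketch of how the debugger is attached to an existing TF 1.x session so that this command becomes available (here sess stands for whatever session your training loop already uses):

import tensorflow as tf
from tensorflow.python import debug as tf_debug

sess = tf.Session()                                  # your existing training session
sess = tf_debug.LocalCLIDebugWrapperSession(sess)    # wrap it with the CLI debugger
sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)

# Every subsequent sess.run(...) drops into the tfdbg CLI, where
# `run -f has_inf_or_nan` stops at the first tensor containing a NaN or Inf.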

Also make sure none of your labels exceeds number-of-softmax-outputs - 1. In such a case sigmoid_cross_entropy_with_logits would produce NaN instead of raising an error. Usually this can happen if the range of your labels is 1..N while the softmax indices run from 0..N-1.
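A quick sanity check along these lines, assuming integer class labels in a NumPy array named y_train (as returned by the Keras MNIST loader in the sketch above) and a 10-way softmax:

num_classes = 10
# Labels must lie in 0 .. num_classes - 1; a 1..N numbering can silently
# yield NaN losses instead of raising an error.
assert y_train.min() >= 0 and y_train.max() <= num_classes - 1, \
    "labels out of range for a %d-way softmax" % num_classes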
