
Loss while training on bigger array becomes inf and then nan (TensorFlow)

This is probably the simplest model ever, and I wrote it to demonstrate in a webinar that I will give in a few days:

import numpy as np
import tensorflow as tf  # tf.keras is used below, so this import is required
from tensorflow import keras

model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd', loss='mean_squared_error')


# training data: the integers 0..19 and their squares
num = []
sqr = []
for i in range(20):
  num.append(i)
  sqr.append(i*i)
  print(num[i], sqr[i])

def model():
    xs = np.array(num, dtype=float)
    ys = np.array(sqr, dtype=float)
    global model  # rebinds the global name 'model' from this function to the fitted network
    model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
    model.compile(optimizer='sgd', loss='mean_squared_error')
    model.fit(xs, ys, epochs=500)


model()

print(model.predict([10]))

As you can see, it is just a NN to predict the square of a number, but this gives an inf and then a nan as the loss:

1/1 [==============================] - 0s 2ms/step - loss: nan
Epoch 499/500
1/1 [==============================] - 0s 5ms/step - loss: nan
Epoch 500/500
1/1 [==============================] - 0s 1ms/step - loss: nan

The prediction gives [[nan]].

If I reduce the 20 to a 7 or 8, it works, but it fails with anything above that.

I think it has something to do with the learning rate, but I could be wrong... Please educate me about how this works and give a solution.

Yes, it's the learning rate. Just set the learning rate to 0.001 and you are good to go:

import numpy as np
import tensorflow as tf  # tf.keras is used below, so this import is required
from tensorflow import keras

model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd', loss='mean_squared_error')


num = []
sqr = []
for i in range(20):
  num.append(i)
  sqr.append(i*i)
  print(num[i], sqr[i])

def model():
    xs = np.array(num, dtype=float)
    ys = np.array(sqr, dtype=float)
    global model
    model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
    opt = keras.optimizers.SGD(learning_rate=0.001)  # the default for SGD is 0.01
    model.compile(optimizer=opt, loss='mean_squared_error')
    model.fit(xs, ys, epochs=500)


model()

print(model.predict([10]))

Or you can just change the loss function to mean_absolute_error or use a different optimizer.
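
For example, here is a minimal sketch of the mean_absolute_error alternative (the mae_model name is just for illustration); the gradient of MAE is bounded, so the default SGD learning rate no longer overshoots on this data:

import numpy as np
import tensorflow as tf
from tensorflow import keras

xs = np.arange(20, dtype=float)
ys = xs * xs

# same single-neuron model, but compiled with mean_absolute_error
mae_model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
mae_model.compile(optimizer='sgd', loss='mean_absolute_error')
mae_model.fit(xs, ys, epochs=500, verbose=0)

print(mae_model.predict(np.array([10.0])))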

The reason: your numbers are quite big, and the gradient of mean_squared_error is proportional to 2 * |y - pred|, so the steps the optimizer takes in each iteration are very big and it diverges. By multiplying that gradient by a smaller learning rate (0.001 instead of the default 0.01), we help it take smaller steps and converge.
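
To see this numerically, here is a minimal NumPy sketch of hand-rolled full-batch SGD for the one-neuron model pred = w*x + b (not the Keras internals, just the MSE update rule):

import numpy as np

xs = np.arange(20, dtype=float)
ys = xs * xs

for lr in (0.01, 0.001):
    w, b = 0.0, 0.0
    for _ in range(50):  # a few full-batch SGD steps
        pred = w * xs + b
        grad_w = np.mean(2 * (pred - ys) * xs)  # dMSE/dw
        grad_b = np.mean(2 * (pred - ys))       # dMSE/db
        w -= lr * grad_w
        b -= lr * grad_b
    # with lr=0.01, |w| keeps growing on every step (inf and then nan in float32);
    # with lr=0.001, w stays bounded and heads toward the best-fit line
    print(lr, w, b)

With w = b = 0 the very first gradient dMSE/dw is about -3.6e3, so with lr=0.01 the first update alone moves w by about 36; the updates keep overshooting and grow without bound, while lr=0.001 keeps them small enough to settle.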

It gives this kind of error when there is an overflow or a division by zero. Normalize your input data and also try to reduce the learning rate.
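
A minimal sketch of the normalization idea (the x_scale and y_scale names are just for illustration): scale inputs and targets into a small range before fitting, then undo the scaling at prediction time:

import numpy as np
import tensorflow as tf
from tensorflow import keras

xs = np.arange(20, dtype=float)
ys = xs * xs

x_scale, y_scale = xs.max(), ys.max()  # simple max scaling
norm_model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
norm_model.compile(optimizer='sgd', loss='mean_squared_error')
norm_model.fit(xs / x_scale, ys / y_scale, epochs=500, verbose=0)

# scale the query down, and the prediction back up
print(norm_model.predict(np.array([10.0]) / x_scale) * y_scale)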
