
Tensorflow Linear Regression NaN output

I am trying to write code for a Machine Learning algorithm to learn both Machine Learning concepts and Tensorflow. The algorithm I am trying to write up is:

(Not enough reputation to embed an image) https://i.imgur.com/lxgC7YV.png

"Which is equivalent to a piece wise linear regression model." “这相当于分段线性回归模型。”

From (Equation 7):

https://arxiv.org/pdf/1411.3315.pdf

I've loaded in the vectors I want to do this on, and initialised my placeholders and variables:

size = len(originalVecs)
_x1 = tf.placeholder(tf.float64, shape=[size, 300], name="x1-input")
_x2 = tf.placeholder(tf.float64, shape=[size, 300], name="x2-input")

_w = tf.Variable(tf.random_uniform([300,300], -1, 1, dtype = tf.float64), name="weight1")

My prediction, cost, and training step I've set as:

prediction = tf.matmul(_x1,_w)
cost = tf.reduce_sum(tf.square(tf.norm(prediction - _x2)))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

After I initialize, I train with the following:

for i in range(10000):
    sess.run(train_step, feed_dict={_x1: timedVecs, _x2 : originalVecs})
    if i % 1001 == 0:
        print('Epoch ', i)
        print('Prediction ', sess.run(prediction, feed_dict={_x1: timedVecs, _x2 : originalVecs}).shape)

When I run my code it is wildly unstable; within about 20 iterations it grows until it just prints NaNs. I think I'm doing a couple of things wrong, but I don't know how to correct them.

The shape of the prediction is [20, 300] when I would expect it to be [1, 300]. I want it to predict based off a single x1 and x2 at a time, rather than all at once and then try to learn from the sum of the error over all data points (which is what I assume piecewise means). I'm not sure how to go about this, however, as I think I'm currently minimising based on the [20, 300] matrix rather than the sum of 20 [1, 300] matrices.
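To make the comparison concrete, here is a quick NumPy check with dummy arrays (not my real data) of the two quantities I have in mind:

import numpy as np

# Dummy stand-ins for prediction and _x2, just for the comparison.
pred = np.random.rand(20, 300)
target = np.random.rand(20, 300)

# Loss computed over the whole [20, 300] matrix at once.
whole_matrix = np.sum((pred - target) ** 2)

# Sum of 20 per-point losses, each over a single [1, 300] row.
per_point = sum(np.sum((p - t) ** 2) for p, t in zip(pred, target))

print(whole_matrix, per_point)  # these come out as the same number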

I assume matmul is correct, as multiply is element-wise?

I am entering my input data as a list of np arrays, each np array being a data point with 300 dimensions.

Thank you.

Generally I'd avoid square roots in losses. The issue is that the derivative of x**0.5 is 0.5 * x**-0.5, which means dividing by x**0.5. If x is ever zero, this will produce NaNs. In this case the square root comes from tf.norm and is immediately followed by tf.square, but the operations aren't fused together and don't cancel.
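Here's a minimal sketch of where that NaN shows up, using the TF 1.x API from your question (the placeholder is a dummy stand-in for prediction - _x2): at a residual of exactly zero, the gradient of the norm-then-square loss comes out as NaN, while the plain sum-of-squares loss has a well-defined zero gradient.

import numpy as np
import tensorflow as tf

diff = tf.placeholder(tf.float64, shape=[3])  # stand-in for prediction - _x2

loss_norm_sq = tf.square(tf.norm(diff))        # sqrt inside, squared outside
loss_sum_sq = tf.reduce_sum(tf.square(diff))   # plain sum of squares

grad_norm_sq = tf.gradients(loss_norm_sq, diff)[0]
grad_sum_sq = tf.gradients(loss_sum_sq, diff)[0]

with tf.Session() as sess:
    zeros = np.zeros(3)
    print(sess.run(grad_norm_sq, feed_dict={diff: zeros}))  # [nan nan nan]
    print(sess.run(grad_sum_sq, feed_dict={diff: zeros}))   # [0. 0. 0.]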

Simplifying your loss expression to tf.reduce_sum(tf.square(prediction - _x2)) should make things more stable.
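Putting that together, here is a rough sketch of the whole setup with the simplified loss, using the same shapes as your question. The random data is only there so the sketch runs on its own, and the smaller learning rate is my own adjustment for that dummy data's scale; substitute your real timedVecs / originalVecs and tune the step size accordingly.

import numpy as np
import tensorflow as tf

size = 20  # stands in for len(originalVecs)
timedVecs = np.random.rand(size, 300)      # dummy data, substitute your own
originalVecs = np.random.rand(size, 300)   # dummy data, substitute your own

_x1 = tf.placeholder(tf.float64, shape=[size, 300], name="x1-input")
_x2 = tf.placeholder(tf.float64, shape=[size, 300], name="x2-input")
_w = tf.Variable(tf.random_uniform([300, 300], -1, 1, dtype=tf.float64), name="weight1")

prediction = tf.matmul(_x1, _w)
cost = tf.reduce_sum(tf.square(prediction - _x2))  # no tf.norm, so no sqrt in the graph
train_step = tf.train.GradientDescentOptimizer(1e-4).minimize(cost)  # smaller step for the dummy data

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(10000):
        _, c = sess.run([train_step, cost],
                        feed_dict={_x1: timedVecs, _x2: originalVecs})
        if i % 1000 == 0:
            print('Epoch', i, 'cost', c)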
