Tensorflow mean squared error loss function

I have seen a few different mean squared error loss functions in various posts for regression models in Tensorflow:

loss = tf.reduce_sum(tf.pow(prediction - Y,2))/(n_instances)
loss = tf.reduce_mean(tf.squared_difference(prediction, Y))
loss = tf.nn.l2_loss(prediction - Y)

What are the differences between these?

I would say that the third equation is different, while the 1st and 2nd are formally the same but behave differently due to numerical concerns.
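In symbols (my own shorthand, writing $n$ for the number of elements being averaged over), the first two compute

$$\mathrm{loss}_1 = \mathrm{loss}_2 = \frac{1}{n}\sum_i (\mathrm{prediction}_i - Y_i)^2,$$

while the third computes

$$\mathrm{loss}_3 = \frac{1}{2}\sum_i (\mathrm{prediction}_i - Y_i)^2.$$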

I think that the 3rd equation (using l2_loss ) is just returning 1/2 of the squared Euclidean norm, that is, the sum of the element-wise square of the input, which is x=prediction-Y . You are not dividing by the number of samples anywhere. Thus, if you have a very large number of samples, the computation may overflow (returning Inf).
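A quick sanity check of that reading (a minimal sketch, using the same TF1 Session API as the code later on this page):

import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])   # stands in for prediction - Y

l2 = tf.nn.l2_loss(x)                      # sum(x**2) / 2, no division by N
manual = tf.reduce_sum(tf.square(x)) / 2.0

with tf.Session() as sess:
    print(sess.run([l2, manual]))          # both print 7.0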

The other two are formally the same, computing the mean of the element-wise squared x tensor. However, while the documentation does not specify it explicitly, it is very likely that reduce_mean uses an algorithm adapted to avoid overflowing with a very large number of samples. In other words, it likely does not try to sum everything first and then divide by N, but uses some kind of rolling mean that can adapt to an arbitrary number of samples without necessarily causing an overflow.
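To see the kind of failure that sum-then-divide can run into, and one simple way around it, here is a NumPy sketch; it only illustrates the idea and is not a claim about how reduce_mean is actually implemented:

import numpy as np

n = 30 * 1000 * 1000
x = np.ones(n, dtype=np.float32)

# Naive approach: a single float32 accumulator. Once the running sum reaches
# 2**24 = 16777216, adding another 1.0 no longer changes it, so the final
# "mean" ends up far below the true value of 1.0.
sequential_sum = np.cumsum(x)[-1]   # forces element-by-element accumulation
print(sequential_sum / n)           # ~0.56 instead of 1.0

# One precision-friendly alternative: average in chunks, then average the
# chunk means (a simple stand-in for the rolling-mean idea described above).
chunk_means = [chunk.mean() for chunk in np.array_split(x, 1000)]
print(np.mean(chunk_means))         # 1.0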

The first and the second loss functions calculate the same thing, but in a slightly different way. The third function calculates something completely different. You can see this by executing this code:

import tensorflow as tf
from functools import reduce   # needed on Python 3; a builtin on Python 2

# shape_obj = (5, 5)  # a smaller shape you can also try
shape_obj = (100, 6, 12)
Y1 = tf.random_normal(shape=shape_obj)
Y2 = tf.random_normal(shape=shape_obj)

loss1 = tf.reduce_sum(tf.pow(Y1 - Y2, 2)) / (reduce(lambda x, y: x*y, shape_obj))
loss2 = tf.reduce_mean(tf.squared_difference(Y1, Y2))
loss3 = tf.nn.l2_loss(Y1 - Y2)

with tf.Session() as sess:
    print(sess.run([loss1, loss2, loss3]))
# when I run it I got: [2.0291963, 2.0291963, 7305.1069]

Now you can verify that the 1st and 2nd calculate the same thing (in theory) by noticing that tf.pow(a - b, 2) is the same as tf.squared_difference(a, b). Also reduce_mean is the same as reduce_sum / number_of_elements. The thing is that computers can't calculate everything exactly. To see what numerical instabilities can do to your calculations, take a look at this:

import tensorflow as tf
from functools import reduce

shape_obj = (5000, 5000, 10)
Y1 = tf.zeros(shape=shape_obj)
Y2 = tf.ones(shape=shape_obj)

loss1 = tf.reduce_sum(tf.pow(Y1 - Y2, 2)) / (reduce(lambda x, y: x*y, shape_obj))
loss2 = tf.reduce_mean(tf.squared_difference(Y1, Y2))

with tf.Session() as sess:
    print(sess.run([loss1, loss2]))

It is easy to see that the answer should be 1, but you will get something like this: [1.0, 0.26843545] .

Regarding your last function, the documentation says that:

Computes half the L2 norm of a tensor without the sqrt: output = sum(t ** 2) / 2

So if you want it to calculate the same thing (in theory) as the first one you need to scale it appropriately:

loss3 = tf.nn.l2_loss(Y1 - Y2) * 2 / (reduce(lambda x, y: x*y, shape_obj))
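As a quick check of that rescaling (a sketch using the same TF1 Session API; on a small random tensor, where none of the precision issues above kick in, all three should agree up to float32 rounding):

import tensorflow as tf
from functools import reduce

shape_obj = (100, 6, 12)
n_elements = reduce(lambda x, y: x * y, shape_obj)

Y1 = tf.random_normal(shape=shape_obj)
Y2 = tf.random_normal(shape=shape_obj)

loss1 = tf.reduce_sum(tf.pow(Y1 - Y2, 2)) / n_elements
loss2 = tf.reduce_mean(tf.squared_difference(Y1, Y2))
loss3_scaled = tf.nn.l2_loss(Y1 - Y2) * 2 / n_elements

with tf.Session() as sess:
    # All three values come from the same sampled Y1 and Y2 within this run,
    # so they should only differ by rounding error.
    print(sess.run([loss1, loss2, loss3_scaled]))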
