
Keras mean squared error loss layer

I am currently implementing a custom loss layer and in the process, I stumbled upon the implementation of mean squared error in the objectives.py file [1]. I know I'm missing something in my understanding of this loss calculation, because I always thought that the average was done separately across the samples for each output in each mini-batch (axis 0 of the tensor), but it appears that the average is actually being done across the last axis, which, for a single vector, would mean it's being done across the outputs. I found this by accident while working on my custom loss layer, because it requires discounting the loss of a few of the outputs if a training output in a specific place is a specific value. Anyway, is my understanding of the mean squared error incorrect? Why would Keras be using the last axis and thus turning a 1xn output vector into a 1x1 output vector?

Thanks.

[1] https://github.com/fchollet/keras/blob/master/keras/objectives.py#L7

The code in question for the MSE loss is this:

def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)

Here, y_true is first subtracted from y_pred; that result is passed to K.square, which, as expected, returns the square of its argument; and that result is then given to K.mean, which computes the mean.
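
To make the effect of axis=-1 concrete, here is a small NumPy sketch of the same computation (the values are made up purely for illustration): averaging over the last axis collapses the per-output dimension, leaving one loss value per sample in the mini-batch.

import numpy as np

# Hypothetical mini-batch of 4 samples with 3 outputs each.
y_true = np.zeros((4, 3))
y_pred = np.array([[1., 2., 3.],
                   [1., 1., 1.],
                   [0., 0., 3.],
                   [2., 0., 0.]])

# Same computation as mean_squared_error above, written with NumPy:
per_sample_loss = np.mean(np.square(y_pred - y_true), axis=-1)
print(per_sample_loss.shape)  # (4,) -- one scalar loss per sample
print(per_sample_loss)        # approximately [4.667 1. 3. 1.333]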

So the code is clearly doing what it's supposed to do. As for why the last axis is operated upon, this has nothing to do with classes; it is just a convention. Note that in general, there are no classes in the MSE definition.

Let's detail the steps of how the losses are computed in Keras, to show that the axis=-1 in all the loss computations is correct:

  • So we pick a loss in losses.py that we will pass to the compile method of our model.

  • In compile, the total loss is computed. It happens in several steps: the first step creates a list of losses, one for each output of the model.

  • This first step calls _weighted_masked_objective, which according to the docs "Adds support for masking and sample-weighting to an objective function".
  • Basically, _weighted_masked_objective returns a new objective function that takes into account the weights and mask parameters which the user will provide when using the method fit.

If I cut the code down to only the lines that matter for the question, we get something like this:

def _weighted_masked_objective(fn):
    def weighted(y_true, y_pred, weights, mask=None):
        score_array = fn(y_true, y_pred)  # Compute the per-sample loss as in losses.py
        return K.mean(score_array)        # Average over all remaining axes
    return weighted                       # The wrapped objective is what compile() uses

class Model(Container):
    def compile(self, optimizer, loss, metrics=None, loss_weights=None,
                sample_weight_mode=None, weighted_metrics=None,
                target_tensors=None, **kwargs):
        weighted_losses = [_weighted_masked_objective(fn) for fn in loss_functions]

So in the end, the loss is indeed averaged over every dimension, and the use of axis=-1 is just an elegant way to enable masking and weighting of the loss at another point in the code.
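
Put end to end, the two reductions can be sketched in NumPy like this (a toy example, not the actual Keras code): the loss function produces one value per sample, and the wrapper then averages those into the single scalar that gets minimized; the intermediate per-sample vector is also the point where sample weights or a mask could be applied to score_array first.

import numpy as np

y_true = np.zeros((4, 3))
y_pred = np.ones((4, 3))

score_array = np.mean(np.square(y_pred - y_true), axis=-1)  # step 1: per-sample loss, shape (4,)
total_loss = np.mean(score_array)                           # step 2: scalar used for training
print(score_array.shape, total_loss)                        # (4,) 1.0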

NB: I didn't explain the other steps because they don't contribute to answering the question.

I believe, after some conversations with coworkers, that I understand this situation and have a proper solution to the problem. Although I knew that Theano provides lazily-evaluated tensor functions that run the matrix operations on the GPU, what I did not realize was that Keras's loss functions are actually written in a way where the compiled Theano execution graph is smart enough to cache certain values in order to properly back-propagate the loss values throughout the network. Because of the type of network I'm creating, I dove into writing my own custom loss function without completely understanding how Theano actually treats the loss after it's been calculated by the function.

From what I can tell, my concern was correct that Keras' use of the last axis is a problem. In my case, I have a fully-convolutional deep neural network, and the input to the loss function is (x, 7, 16, 16), where x is the size of the mini-batch. Normally, neural networks output a matrix where the first dimension is the mini-batch size and the second (usually last) dimension is the actual size of the output vector. Because of this, using the last axis of the output tensor to do the actual "mean" portion of the mean squared error is not correct. Instead, the axis should be 1 (in the case of zero-based indexing), because it's the 7 actual regression output features that need to be differentiated for back-propagation.
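
As a sketch of the kind of custom loss this describes (written against the Keras backend API referenced in this thread; mse_over_features is a hypothetical name, and the output is assumed to have shape (batch, 7, 16, 16) with the 7 regression features on axis 1):

from keras import backend as K

def mse_over_features(y_true, y_pred):
    # Average the squared error over axis 1 (the 7 regression outputs)
    # rather than the default last axis, leaving the 16x16 spatial axes
    # to be reduced later by Keras' final averaging step.
    return K.mean(K.square(y_pred - y_true), axis=1)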

I originally suspected that axis = -1 might not be correct, and the reason I posted this question was that I couldn't quite explain why. It's been a long time since I've had to dive into the math behind neural networks, but when I finally did, I was able to resolve the gaps (I think). I'm posting this response here for future readers who may run into the same problem or gap in their understanding of Theano's tensor framework.
