Multiple losses for multi-output regression problem
I am trying to train a CNN model to predict 4 real-valued outputs (a regression problem), and I tried using mean squared error (MSE) as the loss function. My question: if I branch the output layer into 4 separate output layers with 4 separate losses (4 MSE), does the network perform better, given that the weights of the last layers are then updated separately? Thank you.
You said it is the same as in the link: https://towardsdatascience.com/deep-neural-networks-for-regression-problems-81321897ca33
So you have something similar to this:
# The Output Layer :
NN_model.add(Dense(4, kernel_initializer='normal',activation='linear'))
That is a fully connected (FC) layer with 4 neurons and thus 4 outputs. Your question was whether you could have 4 output layers instead. So instead of one layer with 4 neurons you would have 4 layers with one neuron each, all connected in parallel to the last FC layer of the backbone network.
First of all, it doesn't change the network architecture. It is best to look at this image. You can see that every neuron in the last layer is connected to all neurons in the second-to-last layer. So by splitting the layer into multiple layers you don't really change the architecture in any way: you still have 4 single neurons, each connected to all neurons of the previous layer.
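To make the equivalence concrete, here is a minimal framework-free sketch. The helper `dense`, the sizes, and the random values are assumptions made for illustration; they are not taken from the linked article. It shows that four one-neuron heads holding the rows of a 4-neuron layer's weights produce exactly the same outputs as that single layer.

```python
import random

random.seed(0)
n_features, n_outputs = 3, 4

# Hypothetical activations of the last shared layer for one sample.
features = [random.uniform(-1, 1) for _ in range(n_features)]

# One 4-neuron linear layer: one weight row and one bias per output neuron.
weights = [[random.uniform(-1, 1) for _ in range(n_features)]
           for _ in range(n_outputs)]
biases = [random.uniform(-1, 1) for _ in range(n_outputs)]


def dense(x, weight_rows, bias_values):
    """Linear layer: one output per weight row (no activation)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight_rows, bias_values)]


# Single output layer with 4 neurons.
joint_output = dense(features, weights, biases)

# Four parallel one-neuron heads, each holding one row of the same weights.
split_output = [dense(features, [weights[k]], [biases[k]])[0]
                for k in range(n_outputs)]

print(joint_output == split_output)
```

The two layouts hold the same parameters and compute the same function; only the bookkeeping differs.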
What it will do is change the loss. Instead of calculating the MSE over all outputs and the whole batch at once, you will now calculate the MSE over one output across the batch, 4 times. During backpropagation the 4 individual gradients from the 4 layers are added up. So something like this will happen:
loss_split = mse(output1) + mse(output2) + mse(output3) +mse(output4)
Compared with the loss of the single 4-neuron layer, the relation is the following:
loss = loss_split / num_outputs
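The relation can be checked numerically. The following is a small sketch in plain Python; the batch values and the helpers `mse` and `column` are made up for illustration, and it assumes the four per-output losses are summed with equal weight.

```python
def mse(y_true, y_pred):
    """Mean squared error over two flat lists of equal length."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def column(rows, k):
    """Values of output k across all samples in the batch."""
    return [row[k] for row in rows]


# Hypothetical batch: 3 samples, 4 outputs each.
y_true = [[1.0, 2.0, 3.0, 4.0], [0.5, 1.5, 2.5, 3.5], [2.0, 0.0, 1.0, 3.0]]
y_pred = [[1.5, 2.0, 2.0, 4.5], [0.0, 1.0, 3.0, 3.0], [2.0, 1.0, 1.0, 2.0]]
num_outputs = 4

# Four heads: one MSE per output, summed.
loss_split = sum(mse(column(y_true, k), column(y_pred, k))
                 for k in range(num_outputs))

# Single 4-neuron layer: one MSE over every element of the batch.
flat_true = [v for row in y_true for v in row]
flat_pred = [v for row in y_pred for v in row]
loss = mse(flat_true, flat_pred)

print(loss_split, loss, loss_split / num_outputs)
```

Up to floating-point rounding, `loss_split / num_outputs` matches `loss`.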
In the end this means you change the magnitude of the gradient but not its direction. Instead you could just change the learning rate or multiply the loss by num_outputs.
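The scaling argument can be written out as a short derivation. The symbols are notation introduced here, not from the question: K stands for num_outputs, N for the batch size, theta for the network weights, and eta for the learning rate. It assumes plain gradient descent and equally weighted losses.

```latex
\begin{align*}
L &= \frac{1}{NK}\sum_{n=1}^{N}\sum_{k=1}^{K}\bigl(y_{nk}-\hat{y}_{nk}\bigr)^2,
\qquad
L_k = \frac{1}{N}\sum_{n=1}^{N}\bigl(y_{nk}-\hat{y}_{nk}\bigr)^2 \\
L_{\text{split}} &= \sum_{k=1}^{K} L_k = K\,L \\
\nabla_\theta L_{\text{split}} &= K\,\nabla_\theta L \\
\theta &\leftarrow \theta - \eta\,\nabla_\theta L_{\text{split}}
       = \theta - (K\eta)\,\nabla_\theta L
\end{align*}
```

So training the split version with learning rate eta behaves like training the single layer with learning rate K times eta. With adaptive optimizers such as Adam, which rescale gradients, the difference is likely to be even smaller.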
All in all, it makes no sense to split the output layer into four in this case. It has no effect on the overall architecture and just complicates the network and introduces unnecessary overhead.
By the way, if you really want to change things up regarding the loss and gradients, you could try a different loss such as Smooth-L1 loss. Depending on your data, it can be more robust and perform better.
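For reference, a minimal plain-Python sketch of Smooth-L1 next to MSE. It uses the common definition (quadratic below a threshold, linear above it); the parameter name `beta` and the sample values are assumptions chosen for illustration. Frameworks ship their own versions, so treat this only as a way to see the behavior.

```python
def smooth_l1(y_true, y_pred, beta=1.0):
    """Mean Smooth-L1 loss: quadratic for errors below beta, linear above."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = abs(t - p)
        if diff < beta:
            total += 0.5 * diff ** 2 / beta
        else:
            total += diff - 0.5 * beta
    return total / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error, for comparison."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions; the last prediction is an outlier.
y_true = [0.0, 0.0, 0.0]
y_pred = [0.5, 1.0, 10.0]

print(smooth_l1(y_true, y_pred), mse(y_true, y_pred))
```

The outlier dominates the MSE because its error is squared, while it contributes only linearly to Smooth-L1. That is the sense in which Smooth-L1 can be more robust.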