How to apply weights to a sigmoid cross entropy loss function in Tensorflow?

The training dataset contains two classes, A and B, which we represent as 1 and 0 in our target labels respectively. Our label data is heavily skewed towards class 0, which makes up roughly 95% of the data, while class 1 is only 5%. How should we construct our loss function in such a case?

I found Tensorflow has a function that can be used with weights:

tf.losses.sigmoid_cross_entropy

weights acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value.

Sounds good. I set weights to 2.0 to make the loss higher and punish errors more.

loss = loss_fn(targets, cell_outputs, weights=2.0, label_smoothing=0)

However, not only did the loss not go down, it increased, and the final accuracy on the dataset decreased slightly. OK, maybe I misunderstood and it should be < 1.0, so I tried a smaller number. This didn't change anything: I got almost the same loss and accuracy. O_o

Needless to say, the same network trained on the same dataset but with a loss weight of 0.3 reduces the loss by up to 10x in Torch / PyTorch.

Can somebody please explain how to use loss weights in Tensorflow?

If you're scaling the loss with a scalar, like 2.0, then you're basically multiplying the loss, and therefore the gradient for backpropagation. It's similar to increasing the learning rate, but not exactly the same, because you're also changing the ratio to regularization losses such as weight decay.

If your classes are heavily skewed, and you want to balance them at the calculation of the loss, then you have to specify a tensor as the weight, as described in the manual for tf.losses.sigmoid_cross_entropy():

weights: Optional Tensor whose rank is either 0, or the same rank as labels, and must be broadcastable to labels (i.e., all dimensions must be either 1, or the same as the corresponding losses dimension).

That is, make the weights tensor 1.0 for class 0, and maybe 10 for class 1, and now "false negative" losses will be counted much more heavily.
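
A minimal sketch of that idea, assuming labels is a float tensor of 0/1 targets with the same shape as logits, and picking 10.0 for the minority class purely as an illustrative starting point, not a tuned value:

import tensorflow as tf

labels = tf.placeholder(tf.float32, shape=[None, 1])  # 0/1 targets
logits = tf.placeholder(tf.float32, shape=[None, 1])  # raw network outputs

# 1.0 where the label is 0 (majority), 10.0 where the label is 1 (minority).
class_weights = 1.0 + 9.0 * labels
loss = tf.losses.sigmoid_cross_entropy(
    multi_class_labels=labels,
    logits=logits,
    weights=class_weights)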

It is an art how much you should over-weigh the underrepresented class. If you overdo it, the model will collapse and will predict the over-weighted class all the time.

An alternative way to achieve the same thing is to use tf.nn.weighted_cross_entropy_with_logits(), which has a pos_weight argument for exactly this purpose. But it's in tf.nn, not tf.losses, so you have to manually add it to the losses collection.
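
A rough sketch of that route, reusing the labels and logits tensors assumed above and an illustrative pos_weight of 10.0 (note the first argument is called targets in the TF 1.x API and labels in later versions):

import tensorflow as tf

per_example_loss = tf.nn.weighted_cross_entropy_with_logits(
    targets=labels,
    logits=logits,
    pos_weight=10.0)  # > 1.0 up-weights the positive (minority) class
loss = tf.reduce_mean(per_example_loss)
tf.losses.add_loss(loss)  # tf.nn ops are not added to the losses collection automatically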

Generally, another method to handle this is to arbitrarily increase the proportion of the underrepresented class when sampling. That should not be overdone either, however. You can do both of these things too.

You can set a penalty for misclassification of each sample. If weights is a tensor of shape [batch_size], the loss for each sample will be multiplied by the corresponding weight. So if you assign the same weight to all samples (which is the same as using a scalar weight), your loss will only be scaled by this scalar, and the accuracy should not change.

If you instead assign different weights for the minority class and the majority class, the contributions of the samples to the loss function will be different, and you should be able to influence the accuracy by choosing your weights differently.
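
For instance, here is a sketch that builds a weights tensor of shape [batch_size] by looking up a per-class weight for each label (the 1.0 / 10.0 values are hypothetical):

import tensorflow as tf

labels = tf.placeholder(tf.float32, shape=[None])  # 0/1 targets, shape [batch_size]
logits = tf.placeholder(tf.float32, shape=[None])

# Index 0 -> majority-class weight, index 1 -> minority-class weight.
per_class_weights = tf.constant([1.0, 10.0])
sample_weights = tf.gather(per_class_weights, tf.cast(labels, tf.int32))
loss = tf.losses.sigmoid_cross_entropy(labels, logits, weights=sample_weights)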

A few scenarios (your choice will depend on what you need):

1.) If you want good overall accuracy, you could choose the weights of the majority class to be very large and the weights of the minority class to be much smaller. This will probably lead to all events being classified into the majority class (i.e. 95% total classification accuracy), but the minority class will usually be classified into the wrong class.

2.) If your signal is the minority class and the background is the majority class, you probably want very little background contamination in your predicted signal, i.e. you want almost no background samples to be predicted as signal. This will also happen if you choose the majority weight to be much larger than the minority weight, but you might find that the network then tends to predict all samples as background, so you will not have any signal samples left. In this case you should consider a large weight for the minority class plus an extra loss for background samples being classified as signal samples (false positives), like this:

loss = weighted_cross_entropy + extra_penalty_for_false_positives
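
A sketch of how such a combined loss could be assembled; the quadratic penalty and the 2.0 scale factor are made-up choices for illustration, with labels and logits as before:

import tensorflow as tf

weighted_ce = tf.nn.weighted_cross_entropy_with_logits(
    targets=labels, logits=logits, pos_weight=10.0)

# Penalize false positives: background samples (label 0) that receive a high
# predicted signal probability.
predictions = tf.sigmoid(logits)
false_positive_penalty = (1.0 - labels) * tf.square(predictions)

loss = tf.reduce_mean(weighted_ce + 2.0 * false_positive_penalty)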
