TensorFlow/Keras: How to get meaningful loss values from my generalized dice loss function?

I am trying to perform semantic segmentation in TensorFlow 1.10's Keras API (using Python) with the generalized dice loss function:

import tensorflow as tf

def generalized_dice_loss(onehots_true, logits):
    smooth = tf.constant(1e-17)
    # Not all of my pixels contain ground truth; mask() filters those out,
    # leaving [num_gt_pixels, num_classes]-shaped labels and logits.
    onehots_true, logits = mask(onehots_true, logits)
    probabilities = tf.nn.softmax(logits)
    weights = 1.0 / (tf.reduce_sum(onehots_true, axis=0) ** 2)
    # Is this the correct way of dealing with the inf values that the zero
    # divisions produce?
    weights = tf.clip_by_value(weights, 1e-17, 1.0 - 1e-7)

    numerator = tf.reduce_sum(onehots_true * probabilities, axis=0)
    numerator = tf.reduce_sum(weights * numerator)

    denominator = tf.reduce_sum(onehots_true + probabilities, axis=0)
    denominator = tf.reduce_sum(weights * denominator)

    loss = 1.0 - 2.0 * (numerator + smooth) / (denominator + smooth)
    return loss
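For reference, the generalized dice loss (Sudre et al., 2017) that this is meant to implement is, with r the one-hot labels, p the softmax probabilities, l indexing classes and n indexing ground-truth pixels:

    \mathrm{GDL} = 1 - 2\,\frac{\sum_l w_l \sum_n r_{ln}\, p_{ln}}{\sum_l w_l \sum_n \left( r_{ln} + p_{ln} \right)}, \qquad w_l = \frac{1}{\left( \sum_n r_{ln} \right)^2}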

However, I am struggling to get any meaningful loss out of it; the loss is always 1. What am I doing wrong here?

After the initial weights (one per class) are calculated, they contain many infs from zero divisions, as typically only a small subset of all classes is present in a sample image. I therefore clip the weights to the range [1e-17, 1 - 1e-7] (is this a good idea? an alternative sketch follows the dump below), after which they look like this:

tf.Tensor(
[4.89021e-05 2.21410e-10 5.43187e-11 1.00000e+00 1.00000e+00 4.23855e-07
 5.87461e-09 3.13044e-09 2.95369e-07 1.00000e+00 1.00000e+00 2.22499e-05
 1.00000e+00 1.73611e-03 9.47212e-10 1.12608e-05 2.77563e-09 1.00926e-08
 7.74787e-10 1.00000e+00 1.34570e-07], shape=(21,), dtype=float32)
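Would it be better to skip the clipping entirely and give absent classes exactly zero weight, so that they drop out of both sums? A minimal sketch of that alternative (untested, not part of my current code):

counts = tf.reduce_sum(onehots_true, axis=0)
present = tf.cast(counts > 0, tf.float32)
# Add 1 to the counts of absent classes so the division is always finite;
# multiplying by `present` then zeroes those weights out again.
weights = present / tf.square(counts + (1.0 - present))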

Either way, the clipped weights seem fine to me, though they are pretty small. The numerators (tf.reduce_sum(onehots_true * probabilities, axis=0), prior to their weighting) look like this:

tf.Tensor(
[3.42069e+01 0.00000e+00 9.43506e+03 7.88478e+01 1.50554e-02 0.00000e+00
 1.22765e+01 4.36149e-01 1.75026e+02 0.00000e+00 2.33183e+02 1.81064e-01
 0.00000e+00 1.60128e+02 1.48867e+04 0.00000e+00 3.87697e+00 4.49753e+02
 5.87062e+01 0.00000e+00 0.00000e+00], shape=(21,), dtype=float32)
tf.Tensor(1.0, shape=(), dtype=float32)
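In symbols, each of these unweighted numerators factors as

    \sum_n r_{ln}\, p_{ln} = |G_l| \cdot \bar{p}_l

where G_l is the set of ground-truth pixels of class l and \bar{p}_l is the network's mean predicted probability of class l on those pixels.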

This also looks reasonable: as the identity above shows, each numerator is basically the label's size times the network's certainty about it (which is expectedly low at the start of training). The denominators (tf.reduce_sum(onehots_true + probabilities, axis=0), prior to weighting) also look fine:

tf.Tensor(
[ 14053.483   25004.557  250343.36    66548.234    6653.863    3470.502
   5318.3926 164206.19    19914.338    1951.0701   3559.3235   7248.4717
   5984.786    7902.9004 133984.66    41497.473   25010.273   22232.062
  26451.926   66250.39     6497.735 ], shape=(21,), dtype=float32)

These are large, but that is to be expected: since both the one-hot labels and the class probabilities of each pixel sum to 1 across classes, these denominators should sum to exactly twice the number of pixels with ground truth.
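Concretely:

    \sum_l \sum_n \left( r_{ln} + p_{ln} \right) = N + N = 2N

where N is the number of ground-truth pixels; the dump above indeed sums to roughly 9e5, i.e. about 450k such pixels.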

However, summing the numerators gives a very small total (~0.001, though occasionally it is in the single digits) while the denominators sum to very large values. As a result my final loss is exclusively 1, or something really close to it. How can I mitigate this effect and obtain stable gradients? I implemented the exact dice loss formula, as far as I can tell. What am I missing here?
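One plausible reading of these numbers (a toy sketch with made-up values, not data from above): a class that is absent from the batch contributes nothing to the numerator, but its clipped weight of ~1.0 is enormous relative to the present classes' 1/size² weights, so it dominates the weighted denominator and pushes the ratio toward zero:

import numpy as np

# Two present classes plus one absent class whose inf weight was clipped to ~1.0.
weights      = np.array([1e-8, 1e-10, 1.0])  # absent class keeps weight ~1.0
numerators   = np.array([1e2,  1e4,   0.0])  # absent class overlaps nothing
denominators = np.array([1e4,  1e6,   5e4])  # but still collects probability mass

num = np.sum(weights * numerators)    # 1e-6 + 1e-6 + 0    ~ 2e-6
den = np.sum(weights * denominators)  # 1e-4 + 1e-4 + 5e4  ~ 5e4
print(1.0 - 2.0 * num / den)          # ~ 1.0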

Update: apparently I need to omit the weights; then I get a workable loss function. I have no idea why the weights cannot be used here, or what they would add if they could. Follow-up question: https://stats.stackexchange.com/questions/414107/why-are-weights-being-used-in-generalized-dice-loss-and-why-cant-i
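For completeness, a sketch of the unweighted variant that produced workable values (assuming the same mask() helper as above; this is a reconstruction of the fix, not code from the original function):

def dice_loss(onehots_true, logits):
    smooth = tf.constant(1e-17)
    onehots_true, logits = mask(onehots_true, logits)
    probabilities = tf.nn.softmax(logits)
    numerator = tf.reduce_sum(onehots_true * probabilities)
    denominator = tf.reduce_sum(onehots_true + probabilities)
    return 1.0 - 2.0 * (numerator + smooth) / (denominator + smooth)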
