TensorFlow/Keras: How to get meaningful loss values from my generalized dice loss function?
I am trying to perform semantic segmentation in TensorFlow 1.10's Keras API (using Python) with the generalized dice loss function:
def generalized_dice_loss(onehots_true, logits):
    smooth = tf.constant(1e-17)
    # Not all of my pixels contain ground truth; mask() filters those pixels out,
    # which results in [num_gt_pixels, num_classes]-shaped labels and logits.
    onehots_true, logits = mask(onehots_true, logits)
    probabilities = tf.nn.softmax(logits)
    weights = 1.0 / (tf.reduce_sum(onehots_true, axis=0) ** 2)
    # Is this the correct way of dealing with inf values (the results of zero divisions)?
    weights = tf.clip_by_value(weights, 1e-17, 1.0 - 1e-7)
    numerator = tf.reduce_sum(onehots_true * probabilities, axis=0)
    numerator = tf.reduce_sum(weights * numerator)
    denominator = tf.reduce_sum(onehots_true + probabilities, axis=0)
    denominator = tf.reduce_sum(weights * denominator)
    loss = 1.0 - 2.0 * (numerator + smooth) / (denominator + smooth)
    return loss
However, I am struggling to get a meaningful loss: it is always 1, or very close to it. What am I doing wrong here?
After the initial weights (one for each class) are calculated, they contain many infs from zero divisions, as typically only a small subset of all classes is present in a sample image. Therefore, I clip the weights to the range [1e-17, 1-1e-17] (is this a good idea?), after which they look like this:
tf.Tensor(
[4.89021e-05 2.21410e-10 5.43187e-11 1.00000e+00 1.00000e+00 4.23855e-07
5.87461e-09 3.13044e-09 2.95369e-07 1.00000e+00 1.00000e+00 2.22499e-05
1.00000e+00 1.73611e-03 9.47212e-10 1.12608e-05 2.77563e-09 1.00926e-08
7.74787e-10 1.00000e+00 1.34570e-07], shape=(21,), dtype=float32)
which seems fine to me, though they are pretty small. The numerators (tf.reduce_sum(onehots_true * probabilities, axis=0), prior to their weighting) look like this:
tf.Tensor(
[3.42069e+01 0.00000e+00 9.43506e+03 7.88478e+01 1.50554e-02 0.00000e+00
1.22765e+01 4.36149e-01 1.75026e+02 0.00000e+00 2.33183e+02 1.81064e-01
0.00000e+00 1.60128e+02 1.48867e+04 0.00000e+00 3.87697e+00 4.49753e+02
5.87062e+01 0.00000e+00 0.00000e+00], shape=(21,), dtype=float32)
tf.Tensor(1.0, shape=(), dtype=float32)
which also looks reasonable, since they're basically the labels' respective sizes times the network's certainty about them (which is likely low at the beginning of training). The denominators (tf.reduce_sum(onehots_true + probabilities, axis=0), prior to weighting) also look fine:
tf.Tensor(
[ 14053.483 25004.557 250343.36 66548.234 6653.863 3470.502
5318.3926 164206.19 19914.338 1951.0701 3559.3235 7248.4717
5984.786 7902.9004 133984.66 41497.473 25010.273 22232.062
26451.926 66250.39 6497.735 ], shape=(21,), dtype=float32)
These are large, but that is to be expected: each ground-truth pixel contributes exactly 1 through its one-hot label and another 1 through its softmax probabilities (which sum to 1 per pixel), so the sum of these denominators over all classes should more or less equal twice the number of pixels with ground truth.
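That bookkeeping can be checked with a quick NumPy sketch (toy shapes and random data, not my real inputs):

```python
import numpy as np

rng = np.random.default_rng(0)
num_pixels, num_classes = 1000, 21

# Random one-hot labels and random softmax outputs over 21 classes.
labels = np.eye(num_classes, dtype=np.float32)[rng.integers(0, num_classes, num_pixels)]
logits = rng.normal(size=(num_pixels, num_classes)).astype(np.float32)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Per-class denominators, shape (21,), as in the loss function above.
denominator = (labels + probs).sum(axis=0)

# Each pixel contributes 1 via its label row and 1 via its softmax row,
# so the class-wise denominators sum to 2 * num_pixels.
print(denominator.sum())  # ≈ 2000.0
```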
However, summing the numerators gives a very small value (~0.001, though occasionally it's in the single-digit range), while the denominators sum to very large values. This results in my final loss being exclusively 1, or something really close to that. How can I mitigate this effect and obtain stable gradients? I pretty much implemented the exact dice loss formula. What am I missing here?
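To illustrate why the clipping worries me, here is a minimal NumPy sketch of the weight computation (not my actual TF code): for a class absent from the batch, the one-hot column sums to 0, so 1/0² is inf, and clipping maps that inf to the upper bound, which gives absent classes the largest weight of all. A hypothetical alternative would be to zero out non-finite weights (e.g. via tf.where(tf.is_finite(weights), weights, tf.zeros_like(weights)) in TF 1.x) so absent classes drop out of the weighted sums entirely:

```python
import numpy as np

# Toy example: 4 ground-truth pixels, 3 classes; class 2 is absent.
onehots_true = np.array([[1, 0, 0],
                         [1, 0, 0],
                         [0, 1, 0],
                         [0, 1, 0]], dtype=np.float32)

class_counts = onehots_true.sum(axis=0)            # [2., 2., 0.]
with np.errstate(divide="ignore"):
    weights = 1.0 / class_counts ** 2              # [0.25, 0.25, inf]

# Clipping turns the inf into the clip ceiling, so the absent class
# still gets the *largest* weight of all:
clipped = np.clip(weights, 1e-17, 1.0 - 1e-7)      # [0.25, 0.25, ~1.0]

# Hypothetical alternative: zero the weights of absent classes so they
# simply vanish from the weighted numerator and denominator.
masked = np.where(np.isfinite(weights), weights, 0.0)  # [0.25, 0.25, 0.]
```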
Apparently I need to omit the weights; then I get a workable loss function. I have no idea why I can't use the weights, or what they would add if I could. Follow-up question: https://stats.stackexchange.com/questions/414107/why-are-weights-being-used-in-generalized-dice-loss-and-why-cant-i
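For reference, a self-contained NumPy sketch of the unweighted soft dice loss that ended up working (my real code uses tf.nn.softmax and tf.reduce_sum, and the mask() filtering is omitted here):

```python
import numpy as np

def dice_loss_unweighted(onehots_true, logits, smooth=1e-17):
    """Soft multiclass dice loss without per-class weighting (NumPy sketch)."""
    # Numerically stable softmax over the class axis.
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    numerator = (onehots_true * probs).sum()    # summed over pixels and classes
    denominator = (onehots_true + probs).sum()
    return 1.0 - 2.0 * (numerator + smooth) / (denominator + smooth)

# Near-perfect predictions drive the loss toward 0; a uniform prediction
# (probability 1/3 for each of 3 classes) leaves it at 1 - 1/3 = 2/3.
labels = np.eye(3, dtype=np.float32)[[0, 1, 2, 0]]
good_logits = labels * 20.0  # softmax is nearly one-hot at the true class
print(dice_loss_unweighted(labels, good_logits))           # close to 0.0
print(dice_loss_unweighted(labels, np.zeros_like(labels)))  # close to 2/3
```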