
Tensorflow custom loss function NaNs during training

I am trying to write a custom loss function for a CNN that detects an object and draws a non-axis-aligned bounding box on it. My inputs are 200x200 images and the outputs/labels are 6-d vectors of the form

[object_present, x,y, angle, width, height]

Here object_present is a binary feature indicating whether the object is present, (x, y) is the centre of the bounding box, angle is the rotation of the bbox away from axis alignment, and width and height are the dimensions of the bbox. When object_present = 0, all the other features are set to NaN.
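For concreteness, one positive and one negative label vector might look like this (the numeric values here are hypothetical):

```python
import numpy as np

# [object_present, x, y, angle, width, height]
positive = np.array([1.0, 104.0, 87.5, 0.35, 42.0, 18.0])           # object present
negative = np.array([0.0, np.nan, np.nan, np.nan, np.nan, np.nan])  # no object

print(np.isnan(negative[1:]).all())  # True: every feature except the flag is NaN
```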

Consequently, my custom loss function needs to ignore the NaNs for a negative sample and apply binary cross-entropy loss to the object_present feature. For positive samples, I also have to include MSE loss for (x, y) and for width and height, plus an angular regression loss that I have defined as atan2(sin(angle1 - angle2), cos(angle1 - angle2)). My implementation is as follows:

binary_loss_func = tf.keras.losses.BinaryCrossentropy()

def loss_func(true_labels, pred_labels):
    # BCE on the object_present flag; this component is always defined
    binary_loss = binary_loss_func(true_labels[:, 0], pred_labels[:, 0])
    # MSE on (x, y), zeroed where the labels are NaN (negative samples)
    mse_loss1 = tf.reduce_mean(tf.where(tf.math.is_nan(true_labels[:, 1:3]),
                                        tf.zeros_like(true_labels[:, 1:3]),
                                        tf.square(tf.subtract(true_labels[:, 1:3], pred_labels[:, 1:3]))))
    # MSE on (width, height)
    mse_loss2 = tf.reduce_mean(tf.where(tf.math.is_nan(true_labels[:, 4:]),
                                        tf.zeros_like(true_labels[:, 4:]),
                                        tf.square(tf.subtract(true_labels[:, 4:], pred_labels[:, 4:]))))
    # Angular regression loss |atan2(sin(d), cos(d))| on the angle
    angular_loss = tf.reduce_mean(tf.where(tf.math.is_nan(true_labels[:, 3]),
                                           tf.zeros_like(true_labels[:, 3]),
                                           tf.abs(tf.atan2(tf.sin(true_labels[:, 3] - pred_labels[:, 3]),
                                                           tf.cos(true_labels[:, 3] - pred_labels[:, 3])))))
    return mse_loss1 + mse_loss2 + binary_loss + angular_loss

My issue is that this returns NaN loss values after the first batch of training (only the first batch does not give a NaN loss), even though I think the above code should return 0 loss for negative samples. I have confirmed that the binary loss function is returning real numbers as it should, so the issue is with the other components of the loss. After some debugging with tf.print statements, I found that pred_labels becomes NaN after the first batch of training. I am not sure why this is happening, or whether it is an issue with how my custom loss function is defined or an issue with my model. The model I am using is:

IMAGE_SIZE = 200
CONV_PARAMS = {"kernel_size": 3, "use_bias": False, "padding": "same"}
CONV_PARAMS2 = {"kernel_size": 5, "use_bias": False, "padding": "same"}

model = Sequential()
model.add(
    Reshape((IMAGE_SIZE, IMAGE_SIZE, 1), input_shape=(IMAGE_SIZE, IMAGE_SIZE))
)
model.add(Conv2D(16, **CONV_PARAMS))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPool2D())
model.add(Conv2D(32, **CONV_PARAMS))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPool2D())
model.add(Conv2D(64, **CONV_PARAMS))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(6))
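A quick sanity check on the shapes (assuming MaxPool2D's default pool_size of 2): the three pooling layers halve the 200x200 input three times, so Flatten feeds 25 * 25 * 64 features into the final Dense(6):

```python
# Spatial size after each of the three stride-2 MaxPool2D layers
size = 200
for _ in range(3):
    size //= 2   # 200 -> 100 -> 50 -> 25

print(size)              # 25
print(size * size * 64)  # 40000 features flattened into Dense(6)
```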

Please help!

You seem to still be calculating your loss with NaN values, although you are trying to avoid it. Maybe try something like this, which replaces the NaNs in the labels before any arithmetic touches them:

binary_loss_func = tf.keras.losses.BinaryCrossentropy()
def loss_func(true_labels, pred_labels):
    true_labels = tf.where(tf.math.is_nan(true_labels), tf.zeros_like(true_labels), true_labels)
    condition = tf.equal(true_labels, 0.0)

    binary_loss = tf.where(condition, tf.reduce_mean(true_labels), binary_loss_func(true_labels[:,0], pred_labels[:, 0]))

    mse_loss1 = tf.reduce_mean(tf.where(tf.equal(true_labels[:, 1:3], 0.0), true_labels[:, 1:3],
                                        tf.square(tf.subtract(true_labels[:, 1:3], pred_labels[:, 1:3])))) 

    mse_loss2 = tf.reduce_mean(tf.where(tf.equal(true_labels[:, 4:], 0.0), true_labels[:, 4:], 
                                        tf.square(tf.subtract(true_labels[:, 4:], pred_labels[:, 4:]))))

    angular_loss = tf.reduce_mean(tf.where(tf.equal(true_labels[:, 3], 0.0), true_labels[:, 3],
                                           tf.abs(tf.atan2(tf.sin(true_labels[:, 3] - pred_labels[:, 3]),
                                                           tf.cos(true_labels[:, 3] - pred_labels[:, 3])))))

    return  mse_loss1 + mse_loss2 + binary_loss + angular_loss
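The reason replacing the NaNs up front matters, rather than only routing around them with tf.where, can be seen with plain NumPy: a where-style mask hides NaNs in the forward loss value, but in the backward pass each branch's gradient is multiplied by a 0/1 mask rather than skipped, and zero times NaN is still NaN:

```python
import numpy as np

labels = np.array([2.0, np.nan])   # second label is NaN (negative sample)
preds = np.array([1.5, 0.7])

# Forward pass: where-style selection hides the NaN in the loss value
sq_err = np.where(np.isnan(labels), 0.0, (labels - preds) ** 2)
print(sq_err)  # [0.25 0.  ] -- looks fine

# Backward pass (sketch): the masked-out branch contributes
# mask * 2 * (pred - label); the mask is 0, but the other factor is NaN
grad = 0.0 * 2.0 * (preds[1] - labels[1])
print(np.isnan(grad))  # True: the NaN survives multiplication by zero
```

This is why sanitizing true_labels at the top of the loss function, as above, stops pred_labels from turning into NaN after the first update.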
