
"val_loss" didn't improve from inf, but loss decreases nicely

I'm training a Keras model with a custom loss function, which I had already tested successfully before. Recently, I started training it with a new dataset and got a strange result: the model trains fine, but val_loss comes out as nan. Here is the loss:

# Keras backend and layer helpers used inside the loss
from keras import backend as k
from keras.activations import relu
from keras.layers import Lambda, add

def Loss(y_true, y_pred):
    y_pred = relu(y_pred)                       # clamp predictions to >= 0
    z = k.maximum(y_true, y_pred)
    y_pred_negativo = Lambda(lambda x: -x)(y_pred)
    w = k.abs(add([y_true, y_pred_negativo]))   # |y_true - y_pred|
    # These conditionals are intended to guard the division below
    if k.sum(z) == 0:
        error = 0
    elif k.sum(y_true) == 0 and k.sum(z) != 0:
        error = 100
    elif k.sum(y_true) == 0 and k.sum(z) == 0:
        error = 0
    else:
        error = (k.sum(w)/k.sum(z))*100
    return error

I have tried many things:

  1. Looked at the data for NaNs
  2. Normalization - on and off
  3. Clipping - on and off
  4. Dropouts - on and off

Someone told me that it could be a problem with the CUDA installation, but I'm not sure.

Any idea what the problem is, or how I can diagnose it?

The problem turned out to be division by zero, but the reason it was happening was a little tricky. As you can see, the definition above has some conditionals that were supposed to preclude division by zero. However, they were written to handle NumPy objects, not tensors, which are what Keras actually passes to the loss. Therefore, the conditionals never fired, and division by zero happened very often.
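The failure mode is easy to reproduce in plain NumPy (a minimal sketch, not the original Keras code): when y_true and y_pred are both all zeros, the fallthrough branch computes sum(w)/sum(z) = 0/0, which silently evaluates to nan:

```python
import numpy as np

def loss_else_branch(y_true, y_pred):
    # Mirrors only the fallthrough branch of the custom loss:
    # error = (sum(w) / sum(z)) * 100
    y_pred = np.maximum(y_pred, 0.0)           # relu
    z = np.maximum(y_true, y_pred)
    w = np.abs(y_true - y_pred)
    with np.errstate(invalid="ignore", divide="ignore"):
        return (np.sum(w) / np.sum(z)) * 100.0

print(loss_else_branch(np.zeros(4), np.zeros(4)))  # nan: 0/0 slips past the guards
```

In graph mode, an expression like `k.sum(z) == 0` compares a symbolic tensor object to a Python int, so the `if` branches never take effect and this nan-producing branch always runs.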

To fix it, I had to rewrite the loss in terms of Keras backend conditionals (remember to avoid mixing pure Keras with tf.keras), just as I've posted here. Any further comments are more than welcome!
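The shape of the fix can be sketched as follows. This is a pure-NumPy illustration (not the posted Keras code): every Python `if` becomes a `where`-style selection on tensor values, and a small epsilon guards the division; in Keras the same structure maps onto `K.switch`, `K.equal`, and `K.epsilon()`:

```python
import numpy as np

def safe_loss(y_true, y_pred):
    # Branch-free version of the loss: conditions are evaluated on
    # values, not on symbolic tensor objects, so they actually fire.
    y_pred = np.maximum(y_pred, 0.0)                 # relu
    z = np.maximum(y_true, y_pred)
    w = np.abs(y_true - y_pred)
    sum_z = np.sum(z)
    sum_true = np.sum(y_true)
    ratio = (np.sum(w) / (sum_z + 1e-7)) * 100.0     # epsilon prevents 0/0
    # sum_z == 0                 -> error = 0
    # sum_true == 0, sum_z != 0  -> error = 100
    # otherwise                  -> percentage ratio
    return np.where(sum_z == 0.0, 0.0,
                    np.where(sum_true == 0.0, 100.0, ratio))

print(safe_loss(np.zeros(3), np.ones(3)))  # 100.0: predictions with no targets
```

Note that both arms of a `where`/`K.switch` are evaluated, which is exactly why the division itself must also be made safe with the epsilon rather than relying on the guard alone.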

Notice: the technical posts on this site are licensed under CC BY-SA 4.0; if you repost, please credit this site or the original source.
