
"val_loss" didn't improved from inf, but loss decreases nicely

I'm training a Keras model with a custom loss function, which I had already tested successfully before. Recently I started training it with a new dataset and got a strange result: the model trains fine, but val_loss comes out as nan. Here is the loss:

from keras import backend as k
from keras.activations import relu
from keras.layers import Lambda, add

def Loss(y_true, y_pred):
    y_pred = relu(y_pred)                           # clamp negative predictions to zero
    z = k.maximum(y_true, y_pred)                   # element-wise max of target and prediction
    y_pred_negativo = Lambda(lambda x: -x)(y_pred)
    w = k.abs(add([y_true, y_pred_negativo]))       # |y_true - y_pred|
    if k.sum(z) == 0:
        error = 0
    elif k.sum(y_true) == 0 and k.sum(z) != 0:
        error = 100
    elif k.sum(y_true) == 0 and k.sum(z) == 0:
        error = 0
    else:
        error = (k.sum(w) / k.sum(z)) * 100         # percentage error
    return error

I have tried many things:

  1. Looked at the data for NaNs (see the quick check sketched after this list)
  2. Normalization - on and off
  3. Clipping - on and off
  4. Dropouts - on and off

Someone told me that it could be a problem with CUDA installation, but I'm not sure.

Any idea what the problem is, or how I can diagnose it?

The problem turned out to be division by zero, but the reason it was happening was a little tricky. As you can see, the definition above has some conditionals that were supposed to prevent division by zero. However, they were written to handle NumPy objects, not the tensors that Keras actually passes to a loss function. Therefore they never fired, and division by zero was happening very often.
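To make that concrete: in the graph-mode Keras setting used here, comparing a symbolic tensor with == does not evaluate its value, so a Python if can never take the guarded branches. A minimal illustration, assuming the classic graph-mode backend where the comparison falls back to plain object comparison:

from keras import backend as k

z = k.placeholder(shape=(None,))   # a symbolic tensor, like the ones a loss receives
print(k.sum(z) == 0)               # False: compares the tensor object with 0, not its value
# So only the final `else` branch ever makes it into the graph, and
# k.sum(w) / k.sum(z) divides by zero whenever a batch has k.sum(z) == 0.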

In order to fix it, I had to rewrite the loss in terms of Keras conditionals (remember to avoid mixing pure Keras with tf.keras), just as I've posted here. Any further comment is more than welcome!
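For reference, here is a minimal sketch of what "Keras conditionals" could look like using K.switch. It mirrors the branching of the original loss; the K.epsilon() guard on the division is my own addition rather than part of the original code:

from keras import backend as k

def loss_keras_conditionals(y_true, y_pred):
    # Same quantities as before, built with backend ops only.
    y_pred = k.relu(y_pred)
    z = k.maximum(y_true, y_pred)
    w = k.abs(y_true - y_pred)
    sum_z = k.sum(z)
    sum_true = k.sum(y_true)
    ratio = (k.sum(w) / (sum_z + k.epsilon())) * 100   # epsilon guards the division itself
    # k.switch selects a branch inside the graph, so the guards actually run per batch.
    error = k.switch(k.equal(sum_z, 0.0),
                     k.zeros_like(sum_z),
                     k.switch(k.equal(sum_true, 0.0),
                              100.0 * k.ones_like(sum_z),
                              ratio))
    return error

Note that k.switch may still evaluate both branches before selecting one, which is why the epsilon guard on the division is worthwhile even though the zero case is handled explicitly.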

