
"val_loss" didn't improved from inf, but loss decreases nicely

I'm training a Keras model with a custom loss function, which I had already tested successfully before. Recently I started training it with a new dataset and got a strange result: the model trains fine, but val_loss comes out as nan. Here is the loss:

from keras import backend as k
from keras.activations import relu
from keras.layers import Lambda, add

def Loss(y_true, y_pred):
    y_pred = relu(y_pred)                           # clamp negative predictions to zero
    z = k.maximum(y_true, y_pred)                   # element-wise max of target and prediction
    y_pred_negativo = Lambda(lambda x: -x)(y_pred)
    w = k.abs(add([y_true, y_pred_negativo]))       # |y_true - y_pred|
    if k.sum(z) == 0:
        error = 0
    elif k.sum(y_true) == 0 and k.sum(z) != 0:
        error = 100
    elif k.sum(y_true) == 0 and k.sum(z) == 0:
        error = 0
    else:
        error = (k.sum(w) / k.sum(z)) * 100         # percentage error
    return error

I have tried many things:

  1. Looked at the data for NaNs (see the quick check sketched after this list)
  2. Normalization - on and off
  3. Clipping - on and off
  4. Dropouts - on and off

Someone told me that it could be a problem with CUDA installation, but I'm not sure.

Any idea what the problem is, or how I can diagnose it?

The problem turned out to be division by zero, but the reason it was happening was a little tricky. As you can see, the definition above has some conditionals that were supposed to prevent division by zero. However, they were written to handle NumPy objects, not the tensors that Keras actually passes to a loss function. Therefore they never fired, and division by zero was happening very often.
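To make that concrete: in the graph-mode Keras setting used here, comparing a symbolic tensor with == does not evaluate its value, so a Python if can never take the guarded branches. A minimal illustration, assuming the classic graph-mode backend where the comparison falls back to plain object comparison:

from keras import backend as k

z = k.placeholder(shape=(None,))   # a symbolic tensor, like the ones a loss receives
print(k.sum(z) == 0)               # False: compares the tensor object with 0, not its value
# So only the final `else` branch ever makes it into the graph, and
# k.sum(w) / k.sum(z) divides by zero whenever a batch has k.sum(z) == 0.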

In order to fix it, I had to rewrite the loss in terms of Keras conditionals (remember to avoid mixing pure Keras with tf.keras), just as I've posted here. Any further comment is more than welcome!
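For reference, here is a minimal sketch of what "Keras conditionals" could look like using K.switch. It mirrors the branching of the original loss; the K.epsilon() guard on the division is my own addition rather than part of the original code:

from keras import backend as k

def loss_keras_conditionals(y_true, y_pred):
    # Same quantities as before, built with backend ops only.
    y_pred = k.relu(y_pred)
    z = k.maximum(y_true, y_pred)
    w = k.abs(y_true - y_pred)
    sum_z = k.sum(z)
    sum_true = k.sum(y_true)
    ratio = (k.sum(w) / (sum_z + k.epsilon())) * 100   # epsilon guards the division itself
    # k.switch selects a branch inside the graph, so the guards actually run per batch.
    error = k.switch(k.equal(sum_z, 0.0),
                     k.zeros_like(sum_z),
                     k.switch(k.equal(sum_true, 0.0),
                              100.0 * k.ones_like(sum_z),
                              ratio))
    return error

Note that k.switch may still evaluate both branches before selecting one, which is why the epsilon guard on the division is worthwhile even though the zero case is handled explicitly.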

