
Design a custom loss function in Keras (on element indexing in Keras tensors)

Original question

I am trying to design a custom loss function in Keras. The target loss function is similar to "mean_squared_error" in Keras and is presented below.

y_true and y_pred have shape [batch_size, system_size], where system_size is an integer, e.g. system_size = 5. The elements of y_true and y_pred lie in the range [-1, 1]. Before calculating the loss, I need to adjust the sign of y_pred for each sample, based on the sign of the element of y_true with the maximum absolute value and the corresponding element of y_pred. For each sample, I first pick the index of the maximum absolute value in y_true (call it i). If y_pred[:, i] has the same sign as y_true[:, i], the loss is the normal "mean_squared_error". If y_pred[:, i] has the opposite sign to y_true[:, i], all elements of that sample in y_pred are multiplied by -1.
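The rule above can be sketched in plain NumPy for one batch (the function name sign_corrected_mse and the toy data are mine, for illustration only, not from the question):

```python
import numpy as np

def sign_corrected_mse(y_true, y_pred):
    # For each sample, find the index of the largest |y_true| entry;
    # if y_pred disagrees in sign there, flip the whole predicted row
    # before computing the mean squared error.
    losses = []
    for t, p in zip(y_true, y_pred):
        i = np.argmax(np.abs(t))
        if t[i] * p[i] < 0:
            p = -p
        losses.append(np.mean((t - p) ** 2))
    return np.array(losses)

y_true = np.array([[0.9, -0.1], [-0.8, 0.2]])
y_pred = np.array([[-0.9, 0.1], [-0.8, 0.2]])
# The first row is flipped (signs disagree at index 0), so both losses are 0.
```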

I tried following function to define the loss. However it does not work.

def normalized_mse(y_true, y_pred):

    y_pred = K.l2_normalize(y_pred, axis = -1) # normalize y_pred

    loss_minus = K.square(y_true - y_pred)
    loss_plus = K.square(y_true + y_pred)

    loss = K.mean(tf.where(tf.greater(
                            tf.div(y_true[:, K.argmax(K.abs(y_true), axis = -1)],
                                   y_pred[:, K.argmax(K.abs(y_true), axis = -1)]), 0),
                        loss_minus, loss_plus), axis = -1)

    return loss

If I replace "K.argmax(K.abs(y_true), axis = -1)" with an integer, the function works well. It seems that this command, which picks the index of the maximum absolute value, is problematic.

Have you ever encountered such problems? Could you please give me some advice and guidance on this problem?

Thank you very much.

Elvin

Solved

Thanks to the guidance from @AnnaKrogager, the problem has been solved. As pointed out below, K.argmax returns a tensor instead of an integer. Following @AnnaKrogager's answer, I revised the loss function to:

import tensorflow as tf
from keras import backend as K

def normalized_mse(y_true, y_pred):

    y_pred = K.l2_normalize(y_pred, axis = -1)
    y_true = K.l2_normalize(y_true, axis = -1)

    loss_minus = K.square(y_pred - y_true)
    loss_plus = K.square(y_pred + y_true)

    # Index of the maximum absolute value in each row of y_true
    index = K.argmax(K.abs(y_true), axis = -1)
    # diag_part(gather(..., axis=1)) picks y[i, index[i]] for each row i
    y_true_slice = tf.diag_part(tf.gather(y_true, index, axis = 1))
    y_pred_slice = tf.diag_part(tf.gather(y_pred, index, axis = 1))

    loss = K.mean(tf.where(tf.greater(tf.div(y_true_slice, y_pred_slice), 0),
                           loss_minus, loss_plus), axis = -1)

    return loss

To verify it, I defined another function with NumPy:

import numpy as np
from numpy import linalg

def normalized_mse_numpy(y_true, y_pred):

    batch_size = y_true.shape[0]
    sample_size = y_true.shape[1]
    loss = np.zeros(batch_size)

    for i in range(batch_size):
        index = np.argmax(abs(y_true[i, :]))
        y_pred[i, :] = y_pred[i, :] / linalg.norm(y_pred[i, :])
        y_true[i, :] = y_true[i, :] / linalg.norm(y_true[i, :])

        sign_flag = y_true[i, index] / y_pred[i, index]
        if sign_flag < 0:
            for j in range(sample_size):
                loss[i] = loss[i] + (y_true[i, j] + y_pred[i, j])**2
        else:
            for j in range(sample_size):
                loss[i] = loss[i] + (y_true[i, j] - y_pred[i, j])**2

        loss[i] = loss[i] / sample_size

    return loss

SystemSize = 5
batch_size = 10
sample_size = 5
y_true = 100 * np.random.rand(batch_size, sample_size)
y_pred = 100 * np.random.rand(batch_size, sample_size)

numpy_result = normalized_mse_numpy(y_true, y_pred)
keras_result = K.eval(normalized_mse(K.variable(y_true), K.variable(y_pred)))

print(numpy_result.sum())
0.9979743490342015

print(keras_result.sum())
0.9979742

numpy_result - keras_result
array([ 4.57889131e-08,  1.27995520e-08,  5.66398740e-09,  1.07868497e-08,
    4.41975839e-09,  7.89889471e-09,  6.68819598e-09,  1.05113101e-08,
   -9.91241045e-09, -1.20345756e-09])

I also benefited from the answer by Yu-Yang in Implementing custom loss function in keras with different sizes for y_true and y_pred.

Note that tf.gather() does not support the axis argument in some early TensorFlow versions, e.g. 1.0.1; it works in 1.11.0. On an older version you may get the error "gather() got an unexpected keyword argument 'axis'".
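For an older TensorFlow without the axis argument, one possible workaround (an assumption on my part, not from the original post) is to transpose first and gather along axis 0, which is always supported. The NumPy snippet below mirrors the equivalence on toy data:

```python
import numpy as np

y = np.array([[1., 2., 3.],
              [4., 5., 6.]])
index = np.array([2, 0])  # column index chosen for each row

# tf.gather(y, index, axis=1) gathers columns; older TF only gathers
# along axis 0, so transpose first and gather rows instead.
with_axis = np.diag(y[:, index])     # mirrors gather(y, index, axis=1)
without_axis = np.diag(y.T[index])   # mirrors gather(transpose(y), index)

# Both select y[0, 2] and y[1, 0]
assert np.array_equal(with_axis, without_axis)
```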

Answer

The problem is that K.argmax(K.abs(y_pred), axis = -1) is a tensor, not an integer, and therefore slicing doesn't work. You can instead use tf.gather to do the slicing:

index = K.argmax(K.abs(y_true), axis = -1)
y_true_slice = tf.diag_part(tf.gather(y_true, index, axis=1))

This achieves the effect of the intended per-sample slice y_true[:, index].
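To see why the gather-plus-diag_part combination works: indexing with a whole vector of per-row indices builds a square matrix whose diagonal holds exactly the wanted per-row entries. A small NumPy sketch (toy data of my own, not from the answer):

```python
import numpy as np

y_true = np.array([[0.1, 0.9],
                   [0.7, 0.3]])
index = np.argmax(np.abs(y_true), axis=-1)  # [1, 0]

# gather + diag_part: y_true[:, index] is a 2x2 matrix; its diagonal
# contains y_true[0, index[0]] and y_true[1, index[1]]
sliced = np.diag(y_true[:, index])

# The same per-row selection, written directly:
direct = np.take_along_axis(y_true, index[:, None], axis=1).ravel()
assert np.array_equal(sliced, direct)  # both pick 0.9 and 0.7
```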
