
Custom Loss Function in Keras with Sample Weights

I am new to TensorFlow and Keras. I would like to use sample weights in a custom loss function.

If I understand correctly, this post ( Custom loss function with weights in Keras ) suggests passing the weights to the network as an additional input, as does this one: Custom weighted loss function in Keras for weighing each element

I am wondering if I am missing something (I would also prefer not to define the weights as a global variable). I am also a bit surprised that there is no way to use sample weights directly, since the Loss class's __call__ method accepts sample_weight as an argument, while (if I understand correctly) the loss function itself may only take y_true and y_pred as arguments.

From the documentation ( https://keras.io/api/losses/#creating-custom-losses ), however:

Creating custom losses

Any callable with the signature loss_fn(y_true, y_pred) that returns an array of losses (one per sample in the input batch) can be passed to compile() as a loss. Note that sample weighting is automatically supported for any such loss.

It sounds like one should be able to apply sample weighting simply by passing sample_weight to model.fit(...).
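
For example, here is a minimal sketch of what I would expect to work, with a made-up per-sample loss (not my actual model):

import numpy as np
import tensorflow as tf

# a trivial custom loss that returns one loss value per sample: shape (batch_size,)
def per_sample_mse(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

inp = tf.keras.Input(shape=(4,))
out = tf.keras.layers.Dense(1)(inp)
model = tf.keras.Model(inp, out)
model.compile(optimizer='adam', loss=per_sample_mse)

x = np.random.normal(size=(32, 4))
y = np.random.normal(size=(32, 1))
w = np.random.uniform(size=(32,))  # one weight per sample

# according to the docs, the weighting should be applied automatically
model.fit(x, y, sample_weight=w, epochs=1, verbose=0)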

In this post ( Should the custom loss function in Keras return a single loss value for the batch or an array of losses for every sample in the training batch? ) there is a lengthy discussion about the size of the output of the loss function.

Lastly, it is also mentioned there that a custom loss function should return an array of losses (one per sample); their reduction is handled by the framework.

It seems to me that if custom_loss(y_true, y_pred) returns a tensor of shape (batch_size,), then one ought to be able to use sample_weight in the fit method. What am I missing?
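
For reference, with a built-in loss the weighted reduction can be checked directly, since the Loss __call__ method accepts sample_weight; it appears to compute the mean of the per-sample losses multiplied by the weights:

import numpy as np
import tensorflow as tf

y_true = np.array([[1.0], [2.0], [3.0]])
y_pred = np.array([[1.5], [2.0], [0.0]])
w = np.array([1.0, 0.0, 2.0])  # per-sample weights

mse = tf.keras.losses.MeanSquaredError()  # default reduction: sum over batch size

# __call__ accepts sample_weight directly and returns a scalar
weighted = mse(y_true, y_pred, sample_weight=w).numpy()

# equivalent manual computation: mean of (per-sample loss * weight)
per_sample = np.mean((y_true - y_pred) ** 2, axis=-1)
manual = np.mean(per_sample * w)
print(weighted, manual)  # the two values should match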

Thanks a lot for any help!

Code snippets:

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.losses import Loss
from tensorflow.keras.optimizers import Adam

tfd = tfp.distributions

# number of parameter groups per component: alpha, mu, sigma
NUM_PARAMS_MG = 3

class NegLogLikMixedGaussian(Loss):
    """
    Negative Log-Likelihood of Mixed Gaussian with:
        num_components: number of components
        mu: means of the Gaussian components
        sg: standard deviations of the Gaussian components
    """

    def __init__(self, num_params=NUM_PARAMS_MG,
                 num_components=2, name='neg_log_lik_mixed_gaussian'):
        super(NegLogLikMixedGaussian, self).__init__(name=name)
        self.num_params = num_params
        self.num_components = num_components

    def call(self, y_true, p_predict):
        """
        Rem: for an MDN, the outputs of the network are _parameters_ of the
        predicted distribution, _not_ point estimates

        Parameters
        ----------
        y_true: (batch_size, 1)
            Observed value of the random variable
        p_predict: (batch_size, num_params * num_components)
            Output parameters of the network given some input

        Returns
        -------
        Negative log likelihood of each sample in the batch, shape (batch_size,)

        """
        # split into alpha, mu, sg, each of shape (batch_size, num_components)
        alpha, mu, sg = tf.split(p_predict,
                                 num_or_size_splits=self.num_params, axis=1)
        gm = tfd.MixtureSameFamily(
            mixture_distribution=tfd.Categorical(probs=alpha),
            components_distribution=tfd.Normal(loc=mu, scale=sg))
        # gm has batch shape (batch_size,); transposing y_true in and out
        # of log_prob yields log-likelihoods of shape (batch_size, 1)
        log_likelihood = tf.transpose(gm.log_prob(tf.transpose(y_true)))
        # return one loss per sample: shape (batch_size,)
        return -tf.reduce_mean(log_likelihood, axis=-1)
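
To double-check that call really returns one loss per sample, I put together a quick synthetic-shapes test (my own sketch, not part of the model):

batch_size = 8
y = np.random.normal(size=(batch_size, 1)).astype('float32')

# concatenated (alpha, mu, sg), each of shape (batch_size, num_components)
alpha = np.random.uniform(size=(batch_size, 2))
alpha = (alpha / alpha.sum(axis=1, keepdims=True)).astype('float32')
mu = np.random.normal(size=(batch_size, 2)).astype('float32')
sg = np.random.uniform(0.5, 1.5, size=(batch_size, 2)).astype('float32')
p = np.concatenate([alpha, mu, sg], axis=1)  # (batch_size, 6)

loss = NegLogLikMixedGaussian(num_components=2, num_params=3)
# calling .call directly bypasses the reduction done by __call__
print(loss.call(y, p).shape)  # expect (batch_size,)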

My hope was then to be able to use:

model.compile(optimizer=Adam(learning_rate=0.005),
              loss=NegLogLikMixedGaussian(num_components=2, num_params=3))

And:


# For testing purposes: uniform weights, which should give the same results as no weighting
sample_weight = np.ones(len(y_train)) / len(y_train)

# Some non-trivial weights
sample_weight = np.zeros(len(y_train))
sample_weight[:5] = 1
# However, this gives me the same results as above

model.fit(x_train, y_train, sample_weight=sample_weight,
          batch_size=128, epochs=10)

Your code is correct, except for a few details, if I understood what you want to do. The sample weights should have shape (num_samples,), while the loss should have shape (batch_size,). The sample weights can then be passed to the fit method, and this seems to work. In your custom loss class, num_components and num_params are both initialized, but only one of the two is actually used in the call method. I am not sure I understood the dimensions of the tensors (alpha, mu, sg): is the prediction of shape (batch_size, 3, num_components)? Below is code adapted from yours, following my understanding of your problem.

import tensorflow as tf
import numpy as np
from tensorflow.keras.losses import Loss
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import Dense, Concatenate

import tensorflow_probability as tfp
tfd = tfp.distributions

# parameters
num_components = 2
num_samples = 1001
num_features = 10

# synthetic data
x_train = np.random.normal(size=(num_samples, num_features))
y_train = np.random.normal(size=(num_samples, 1))  # one scalar observation per sample

print(x_train.shape)
print(y_train.shape)

class NegLogLikMixedGaussian(Loss):
    """
    Negative Log-Likelihood of Mixed Gaussian with:
        num_components: number of components
        mu: means of the Gaussian components
        sg: standard deviations of the Gaussian components
    """

    def __init__(self, num_components=2, name='neg_log_lik_mixed_gaussian'):
        super(NegLogLikMixedGaussian, self).__init__(name=name)
        self.num_components = num_components

    def call(self, y_true, p_predict):
        """
        Rem: for an MDN, the outputs of the network are _parameters_ of the
        predicted distribution, _not_ point estimates

        Parameters
        ----------
        y_true: (batch_size, 1)
            Observed value of the random variable
        p_predict: (batch_size, 3, num_components)
            Output parameters of the network given some input

        Returns
        -------
        Negative log likelihood of each sample in the batch, shape (batch_size,)

        """
        # each of alpha, mu, sg has shape (batch_size, 1, num_components)
        alpha, mu, sg = tf.split(p_predict, num_or_size_splits=3, axis=1)
        gm = tfd.MixtureSameFamily(
            mixture_distribution=tfd.Categorical(probs=alpha),
            components_distribution=tfd.Normal(loc=mu, scale=sg))
        # gm has batch shape (batch_size, 1), so log_prob(y_true) has
        # shape (batch_size, 1); reduce to one loss per sample
        log_likelihood = gm.log_prob(y_true)
        return -tf.reduce_mean(log_likelihood, axis=-1)

# the model (simply predicting (alpha, mu, sigma))
inputs = Input((num_features,))
# mixture weights: ReLU plus a small constant keeps them positive
alpha = tf.expand_dims(Dense(num_components, 'relu')(inputs), axis=1) + 0.0001
# normalization so the weights sum to 1
alpha = alpha / tf.reduce_sum(alpha, axis=2, keepdims=True)
mu = tf.expand_dims(Dense(num_components)(inputs), axis=1)
# sg > 0
sg = tf.expand_dims(Dense(num_components, 'relu')(inputs), axis=1) + 0.0001

# stack the parameters along axis 1 -> (batch_size, 3, num_components)
outputs = Concatenate(axis=1)([alpha, mu, sg])

model = Model(inputs=inputs, outputs=outputs, name='gmm_params')
model.compile(optimizer='adam', loss=NegLogLikMixedGaussian(num_components=num_components), run_eagerly=False)

# uniform weight for the first 500 samples, zero for the rest
sample_weight = np.ones((num_samples,))
sample_weight[500:] = 0.

model.fit(x_train, y_train, batch_size=16, epochs=20, sample_weight=sample_weight)
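
If you want to convince yourself that the weights are actually applied, here is a quick sanity check (my own sketch, not part of the model above): call the Loss object directly with and without sample_weight on a batch that straddles the weight boundary. Loss.__call__ applies the weights and reduces to a scalar.

loss_fn = NegLogLikMixedGaussian(num_components=num_components)

xb = x_train[480:520]                    # batch straddling the weight boundary
yb = y_train[480:520].astype('float32')
wb = sample_weight[480:520]              # half ones, half zeros

p = model.predict(xb)
print(loss_fn(yb, p).numpy())                    # unweighted scalar loss
print(loss_fn(yb, p, sample_weight=wb).numpy())  # zero-weight samples drop out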
