Sparse Categorical Crossentropy Loss Seems Scaled Really High, Despite Very Successful Model

I'm training some CNN networks on proprietary data using Tensorflow. We have boatloads of data, and it seems that these models are capable of learning a great deal of information about classifying data (all binary classifications so far).

Sometimes the train/test accuracy curves can be remarkably good, upwards of 95% in some cases. However, the loss values are suspicious in terms of scale. Visually, the curves look about how I'd expect for a model that's performing well, but they aren't on the order of magnitude I'd expect.

Can anyone tell me how this scaling is usually appropriately done in TF/Keras? I'm confident in these models, as they've been tested on other datasets and generalized very well, but the screwy loss function isn't very nice to report.

The learning rate is on the order of 0.0001. L1 and L2 use the same lambda value; I've had the most success when it's somewhere between 0.01 and 0.03. I'm currently not using any dropout.

I'm including plots from a run with particularly high variance in the accuracy. This isn't always the case, but it does happen sometimes. I suspect that this problem is partly due to outlier data, or possibly the regularization values.

(Figure: train/test accuracy)

(Figure: train/test loss)

Here are relevant code snippets.

        # (snippet from inside a class method; it assumes the layer classes
        # have been imported, e.g.
        # from tensorflow.keras.layers import Conv2D, MaxPooling2D, \
        #     BatchNormalization, Dropout, Flatten, Dense)
        model = tf.keras.models.Sequential()

        if not logistic_regression:
            for i in range(depth):
                # convolutional block i: Conv2D followed by MaxPooling2D
                model.add(Conv2D(
                    15,
                    kernel_size=(10, 3),
                    strides=1,
                    padding='same',
                    activation='relu',
                    data_format='channels_last',
                    kernel_regularizer=tf.keras.regularizers.l1_l2(
                        l1=regularizer_param,
                        l2=regularizer_param)
                    ))

                model.add(MaxPooling2D(
                    pool_size=(3, 3),
                    strides=1,
                    padding='valid',
                    data_format='channels_last'))

            model.add(BatchNormalization())

            if dropout is not None:
                model.add(Dropout(dropout))

        # flatten
        model.add(Flatten(data_format='channels_last'))

        model.add(Dense(
            len(self.groups),
            # use_bias=True if initial_bias is not None else False,
            # bias_initializer=initial_bias
            # if initial_bias is not None
            # else None,
            kernel_regularizer=tf.keras.regularizers.l1_l2(
                l1=regularizer_param,
                l2=regularizer_param)
            ))
        model.compile(
            optimizer=tf.keras.optimizers.Adagrad(
                learning_rate=learning_rate,
                initial_accumulator_value=0.1,
                epsilon=1e-07,
                name='Adagrad'),
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=['accuracy'])
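
As an aside on the regularization values: the kernel_regularizer penalties above are added to the loss that Keras reports during training, so one quick check is to sum model.losses once the model is built and compare that against the reported loss. A minimal sketch, assuming the model defined above (the batch below is a made-up placeholder, not my real input shape):

    import numpy as np
    import tensorflow as tf

    # hypothetical batch just to build the layers; substitute the real input shape
    sample_batch = np.random.rand(8, 64, 32, 1).astype('float32')
    _ = model(sample_batch)

    # sum of all l1_l2 penalties currently being added to the training loss
    reg_penalty = tf.add_n(model.losses)
    print('regularization contribution to the reported loss:', float(reg_penalty))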

You should not worry about the scale of your loss function values. Remember, the loss is simply a measure of how far your network's predictions are from the targets, and you can scale it any way you like. What does matter is the trend in the loss over epochs: you want a smooth decrease, which is what your second figure shows.

A loss value is just that: an arbitrary number that's only meaningful in a relative sense, for the same network on the same dataset. It has no other meaning. In fact, losses do not correspond well with metrics either: see Huang et al., 2019.
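
As a toy illustration of that loose coupling (made-up numbers, not from your run): two sets of predictions with identical accuracy can have very different crossentropy values:

    import numpy as np
    import tensorflow as tf

    # same accuracy, very different loss
    y_true = np.array([0, 1, 1, 0])
    confident = np.array([[0.99, 0.01], [0.01, 0.99], [0.05, 0.95], [0.95, 0.05]])
    hesitant  = np.array([[0.60, 0.40], [0.45, 0.55], [0.49, 0.51], [0.51, 0.49]])

    ce = tf.keras.losses.SparseCategoricalCrossentropy()  # inputs are probabilities, so from_logits=False
    acc = tf.keras.metrics.sparse_categorical_accuracy

    print(float(ce(y_true, confident)), acc(y_true, confident).numpy().mean())  # ~0.03, 1.0
    print(float(ce(y_true, hesitant)),  acc(y_true, hesitant).numpy().mean())   # ~0.61, 1.0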

as they've been tested on other datasets and generalized very well,

That's what matters.

but the screwy loss function isn't very nice to report.

You could scale these losses by 1,000. They're only meaningful in a relative sense.
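
If you do want a different-looking number, here is a minimal sketch of rescaling the loss by an arbitrary constant, assuming the model from your question (the wrapper name and factor are illustrative, not a built-in Keras option; note that rescaling the loss also rescales its gradients, so the learning rate may need a compensating adjustment):

    import tensorflow as tf

    # purely cosmetic rescaling of the reported loss by an arbitrary constant
    REPORT_SCALE = 1e-3
    base_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    def scaled_sparse_ce(y_true, y_pred):
        return REPORT_SCALE * base_loss(y_true, y_pred)

    model.compile(
        optimizer=tf.keras.optimizers.Adagrad(learning_rate=1e-4),
        loss=scaled_sparse_ce,
        metrics=['accuracy'])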

References:

Huang et al., 2019. Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment. ICML 2019.
