
Why is my loss trending down while my accuracy is going to zero?

I am trying to practice my machine learning skills with TensorFlow/Keras, but I am having trouble fitting the model. Let me explain what I've done and where I'm at.

I am using the dataset from Kaggle's Costa Rican Household Poverty Level Prediction Challenge.

Since I am just trying to get familiar with the TensorFlow workflow, I cleaned the dataset by removing a few columns that had a lot of missing data and then filling the remaining gaps in the other columns with each column's mean. So there are no missing values in my dataset.
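A minimal sketch of that cleaning step in pandas (the dropped column names below are placeholders, not the actual sparse columns):

import pandas as pd

df = pd.read_csv('train.csv')

# Drop the columns with lots of missing data (placeholder names),
# then mean-impute whatever gaps remain in the numeric columns.
df = df.drop(columns=['sparse_col_1', 'sparse_col_2'])
df = df.fillna(df.mean(numeric_only=True))

df.to_csv('clean_train.csv', index=False)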

Next I loaded the new, cleaned CSV using make_csv_dataset from TensorFlow.

import tensorflow as tf

batch_size = 32

# column_names and label_name are defined earlier in the notebook
# from the cleaned CSV's header.
train_dataset = tf.data.experimental.make_csv_dataset(
    'clean_train.csv',
    batch_size,
    column_names=column_names,
    label_name=label_name,
    num_epochs=1)
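To sanity-check the pipeline, a single batch can be pulled out directly; make_csv_dataset yields (OrderedDict of per-column tensors, label tensor) pairs:

# Peek at one batch: features is an OrderedDict keyed by column name,
# labels is a tensor of shape (batch_size,).
features, labels = next(iter(train_dataset))
print(list(features.keys())[:5])   # first few feature column names
print(labels)                      # batch of raw label IDs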

I set up a function to return my compiled model like so:

import tensorflow_addons as tfa

f1_macro = tfa.metrics.F1Score(num_classes=4, average='macro')

def get_compiled_model():
    model = tf.keras.Sequential([
      tf.keras.layers.Dense(512, activation=tf.nn.relu, input_shape=(137,)),  # input shape required
      tf.keras.layers.Dense(256, activation=tf.nn.relu),
      tf.keras.layers.Dense(4, activation=tf.nn.softmax)
    ])

    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=[f1_macro, 'accuracy'])
    return model

model = get_compiled_model()
model.fit(train_dataset, epochs=15)

Below is the result of that:

[screenshot of my training output: the loss trends down each epoch while the accuracy drops toward zero]

A link to my notebook is Here

I should mention that I strongly based my implementation on TensorFlow's iris data walkthrough.

Thank you!

After digging for a while, I was able to find the issues with your code. They are listed in order of importance (the first is the most important).

  1. You are doing multi-class classification (not binary classification), so your loss should be categorical_crossentropy (a combined sketch of this fix follows the list).

  2. You are not one-hot encoding your labels. Using binary_crossentropy with labels given as numerical IDs is definitely not the way forward. Instead, you should one-hot encode your labels and treat this as a multi-class classification problem. Here's how you do that.

def pack_features_vector(features, labels):
    """Pack the features into a single array and one-hot encode the labels."""
    features = tf.stack(list(features.values()), axis=1)
    # Labels in the CSV are 1-4, so shift them to 0-3 before one-hot encoding.
    return features, tf.one_hot(tf.cast(labels - 1, tf.int32), depth=4)
  3. Normalize your data. If you look at your training data, it is not normalized and the values are all over the place, so you should consider normalizing it with something like the snippet below. This is just for demonstration purposes; read about the scalers in scikit-learn and choose what's best for you.
from sklearn import preprocessing
import pandas as pd

x = train_df[feature_names].values        # returns a numpy array
scaler = preprocessing.StandardScaler()   # zero mean, unit variance per feature
x_scaled = scaler.fit_transform(x)
train_df = pd.DataFrame(x_scaled, columns=feature_names)
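Putting points 1 and 2 together, a rough sketch of the corrected training setup could look like this (it reuses your architecture, plus f1_macro and pack_features_vector from above):

train_dataset = train_dataset.map(pack_features_vector)  # stacked features, one-hot labels

def get_compiled_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation=tf.nn.relu, input_shape=(137,)),
        tf.keras.layers.Dense(256, activation=tf.nn.relu),
        tf.keras.layers.Dense(4, activation=tf.nn.softmax)
    ])
    # categorical_crossentropy matches one-hot labels + softmax outputs.
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=[f1_macro, 'accuracy'])
    return model

model = get_compiled_model()
model.fit(train_dataset, epochs=15)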

These issues should set your model straight.

While the other answer gives some best-practice advice that is definitely worth considering, this answer concentrates on your observation that your loss and accuracy are decoupled, which is counter-intuitive at first.

Have a look at metrics.py; there you can find the definitions of all available metrics, including the different types of accuracy.

Which type of accuracy is used is determined based on the objective function; see training.py. The case that selects binary_accuracy is as follows:

 if output_shape[-1] == 1 or self.loss_functions[i] == objectives.binary_crossentropy:
     # case: binary accuracy
     acc_fn = metrics_module.binary_accuracy

And binary_accuracy is defined as follows in metrics.py:

def binary_accuracy(y_true, y_pred):
    '''Calculates the mean accuracy rate across all predictions for binary
    classification problems.
    '''
    return K.mean(K.equal(y_true, K.round(y_pred)))

In the objective function, binary_crossentropy, it looks like this:

def binary_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0):
    y_pred = K.constant(y_pred) if not K.is_tensor(y_pred) else y_pred
    y_true = K.cast(y_true, y_pred.dtype)
    if label_smoothing is not 0:
        smoothing = K.cast_to_floatx(label_smoothing)
        y_true = K.switch(K.greater(smoothing, 0),
                          lambda: y_true * (1.0 - smoothing) + 0.5 * smoothing,
                          lambda: y_true)
    return K.mean(K.binary_crossentropy(y_true, y_pred, from_logits=from_logits), axis=-1)

So to wrap it up:

  1. The loss and the accuracy are computed by slightly different functions, which might explain why they can move in different directions.
  2. As you mention, you have 4 classes, and binary_crossentropy is not equipped to handle that (see the sketch below).
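To make that second point concrete, here is a minimal sketch (assuming the labels are the raw class IDs 1-4 from the CSV and the predictions are softmax probabilities) of what binary_accuracy ends up computing in this situation:

import tensorflow as tf

# Hypothetical batch: integer class IDs as they come out of the CSV ...
y_true = tf.constant([[1.], [2.], [3.], [4.]])
# ... and fairly confident softmax outputs from the model.
y_pred = tf.constant([[0.90, 0.05, 0.03, 0.02],
                      [0.05, 0.90, 0.03, 0.02],
                      [0.03, 0.02, 0.90, 0.05],
                      [0.02, 0.03, 0.05, 0.90]])

# binary_accuracy is mean(equal(y_true, round(y_pred))).
# round(y_pred) only contains 0s and 1s, so it can never equal the
# labels 2, 3 or 4, and matches the label 1 only by coincidence.
acc = tf.reduce_mean(tf.cast(tf.equal(y_true, tf.round(y_pred)), tf.float32))
print(acc.numpy())  # 0.0625 for this batch - essentially noise, not accuracy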
