I am trying to practice my machine learning skills with Tensorflow/Keras, but I am having trouble fitting the model. Let me explain what I've done and where I'm at.
I am using the dataset from Kaggle's Costa Rican Household Poverty Level Prediction Challenge.
Since I am just trying to get familiar with the Tensorflow workflow, I cleaned the dataset by removing a few columns that had a lot of missing data and then filled in the other columns with their mean. So there are no missing values in my dataset.
Next, I loaded the new, cleaned CSV using make_csv_dataset from TF.
batch_size = 32
train_dataset = tf.data.experimental.make_csv_dataset(
    'clean_train.csv',
    batch_size,
    column_names=column_names,
    label_name=label_name,
    num_epochs=1)
I set up a function to return my compiled model like so:
f1_macro = tfa.metrics.F1Score(num_classes=4, average='macro')
def get_compiled_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation=tf.nn.relu, input_shape=(137,)),  # input shape required
        tf.keras.layers.Dense(256, activation=tf.nn.relu),
        tf.keras.layers.Dense(4, activation=tf.nn.softmax)
    ])
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=[f1_macro, 'accuracy'])
    return model
model = get_compiled_model()
model.fit(train_dataset, epochs=15)
Below is the result of that
A link to my notebook is Here
I should mention that I strongly based my implementation on Tensorflow's iris data walkthrough
Thank you!
After a while, I was able to find the issues with your code. They are listed in order of importance (the first is the most important).
You are doing multi-class classification (not binary classification). Therefore your loss should be categorical_crossentropy.
You are not one-hot encoding your labels. Using binary_crossentropy with labels given as numerical IDs is definitely not the way forward. Instead, you should one-hot encode your labels and treat this as a multi-class classification problem. Here's how you do that.
def pack_features_vector(features, labels):
    """Pack the features into a single array."""
    features = tf.stack(list(features.values()), axis=1)
    return features, tf.one_hot(tf.cast(labels-1, tf.int32), depth=4)
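To make the label shift concrete, here is a minimal plain-Python sketch of what the one-hot step does to labels numbered 1 to 4. The `one_hot` helper and the sample labels are made up for illustration and are not part of TensorFlow. Unlike `tf.one_hot`, which returns an all-zero row for an out-of-range index, this sketch raises an error.

```python
def one_hot(label, depth=4):
    """Return a one-hot list for a 1-based class label (1..depth)."""
    index = int(label) - 1  # shift 1..4 down to 0..3, like labels-1 above
    if not 0 <= index < depth:
        raise ValueError(f"label {label} is outside 1..{depth}")
    return [1.0 if i == index else 0.0 for i in range(depth)]


labels = [4, 1, 3]  # made-up label values, for illustration only
encoded = [one_hot(label) for label in labels]

for label, row in zip(labels, encoded):
    print(label, row)
```

Each row has exactly one 1.0, in the position of the (shifted) class, which is the target format categorical_crossentropy expects.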
import pandas as pd
from sklearn import preprocessing

# Standardize the features to zero mean and unit variance.
x = train_df[feature_names].values  # returns a numpy array
scaler = preprocessing.StandardScaler()
x_scaled = scaler.fit_transform(x)
train_df = pd.DataFrame(x_scaled)
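If it helps to see what that scaling step computes, the following is a rough plain-Python sketch of per-column standardization (subtract the column mean, divide by the population standard deviation). The `standardize_columns` helper and the toy rows are hypothetical. This is not scikit-learn's implementation, just the same idea on a tiny example.

```python
import statistics


def standardize_columns(rows):
    """Scale each column to zero mean and unit variance (population std)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # pstdev is the population standard deviation; a constant column would
    # give 0, so fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two made-up features on very different scales.
rows = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
scaled = standardize_columns(rows)

for row in scaled:
    print(row)
```

After scaling, both columns live on the same scale, so no single feature dominates the first Dense layer just because its raw values are large.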
These issues should set your model straight.
While the other answer gives some best-practice advice that is definitely worth considering, this answer concentrates on your observation that your loss and accuracy are decoupled, which is counterintuitive at first.
Have a look at metrics.py, where you can find the definitions of all available metrics, including the different types of accuracy.
When the metric is given as 'accuracy', the concrete accuracy function is chosen based on the objective function; see training.py. binary_accuracy is selected in the following case:
if output_shape[-1] == 1 or self.loss_functions[i] == objectives.binary_crossentropy:
    # case: binary accuracy
    acc_fn = metrics_module.binary_accuracy
And binary_accuracy is defined as follows in metrics.py:
def binary_accuracy(y_true, y_pred):
    '''Calculates the mean accuracy rate across all predictions for binary
    classification problems.
    '''
    return K.mean(K.equal(y_true, K.round(y_pred)))
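To see why this element-wise definition can be misleading for a 4-class softmax output, here is a small plain-Python sketch. It assumes one-hot labels and uses made-up predictions. The two helpers only mimic the formulas (round-and-compare versus argmax-and-compare) and are not the Keras functions themselves.

```python
def binary_accuracy(y_true, y_pred):
    """Mean of element-wise matches between y_true and rounded y_pred."""
    matches = [
        float(t == round(p))
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    ]
    return sum(matches) / len(matches)


def categorical_accuracy(y_true, y_pred):
    """Fraction of rows whose predicted argmax equals the label's argmax."""
    hits = [
        float(true_row.index(max(true_row)) == pred_row.index(max(pred_row)))
        for true_row, pred_row in zip(y_true, y_pred)
    ]
    return sum(hits) / len(hits)


# Made-up example: both samples are assigned to the wrong class.
y_true = [[1, 0, 0, 0], [0, 0, 1, 0]]
y_pred = [[0.1, 0.6, 0.2, 0.1], [0.7, 0.1, 0.1, 0.1]]

print("binary accuracy:     ", binary_accuracy(y_true, y_pred))
print("categorical accuracy:", categorical_accuracy(y_true, y_pred))
```

Both predictions pick the wrong class, yet the element-wise metric still credits every position where a 0 label meets a probability that rounds to 0, so it reports a much higher number than the per-sample metric.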
The binary_crossentropy objective function, in turn, is defined as follows:
def binary_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0):
    y_pred = K.constant(y_pred) if not K.is_tensor(y_pred) else y_pred
    y_true = K.cast(y_true, y_pred.dtype)
    if label_smoothing is not 0:
        smoothing = K.cast_to_floatx(label_smoothing)
        y_true = K.switch(K.greater(smoothing, 0),
                          lambda: y_true * (1.0 - smoothing) + 0.5 * smoothing,
                          lambda: y_true)
    return K.mean(K.binary_crossentropy(y_true, y_pred, from_logits=from_logits), axis=-1)
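As a rough illustration of how the loss behaves on a 4-class output, the sketch below computes both a binary and a categorical cross-entropy for a single made-up sample in plain Python. It assumes one-hot labels and probabilities strictly between 0 and 1, and it skips the epsilon clipping and label smoothing that the real Keras code applies.

```python
import math


def binary_crossentropy(y_true, y_pred):
    """Mean over the last axis of the per-element binary cross-entropy."""
    terms = [
        -(t * math.log(p) + (1 - t) * math.log(1 - p))
        for t, p in zip(y_true, y_pred)
    ]
    return sum(terms) / len(terms)


def categorical_crossentropy(y_true, y_pred):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))


# Made-up sample: the true class is 0, but the model favours class 1.
y_true = [1, 0, 0, 0]
y_pred = [0.1, 0.6, 0.2, 0.1]

print("binary cross-entropy:     ", binary_crossentropy(y_true, y_pred))
print("categorical cross-entropy:", categorical_crossentropy(y_true, y_pred))
```

The binary version treats each of the four outputs as an independent yes/no decision and averages them, so the three "correctly low" outputs dilute the penalty for missing the true class. The categorical version looks only at the probability given to the true class.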
So to wrap it up: