
How does a Keras neural network decide which class to predict, even when the output layer can have any size?

I have a data sample with a binary class (true or false). A neural network gives each class a score, and the maximum determines the predicted class. But why does Keras work even when the output layer does not have the proper number of neurons (= the number of classes, 2 in my case: 0 or 1)?

import keras
from sklearn.model_selection import train_test_split  # needed for train_test_split below
from model import *  # provides df_features and df_labels

X_train, X_test, y_train, y_test = train_test_split(df_features, df_labels, test_size=0.25, random_state=10)

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(len(X_test.columns),)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(128, activation='softmax') # Shouldn't this be 2?
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# len(y_train.columns) == 1
history = model.fit(X_train, y_train, epochs=100, validation_split=0.25)

scores = model.evaluate(X_test, y_test, verbose=0)

print(model.metrics_names)
print('scores=', scores)

Hypothesis: does it add an implicit layer at the end, does it ignore some neurons, or is it something else entirely?

Edit: data added.

>>> print(y_train)
[0 0 0 ... 0 1 0]

>>> print(y_test)
      Class
1424      0
3150      1
2149      0
1700      0
4330      0
4200      0
# etc, ~1000 entries
>>> print('len(y_train)=', len(y_train))
len(y_train)= 2678
>>> print('len(y_test)=', len(y_test))
len(y_test)= 893

I believe the issue is with how your loss, sparse_categorical_crossentropy, works. This loss (as opposed to categorical_crossentropy) assumes that y_actual is provided in label-encoded format instead of one-hot encoded format. Meaning, if you are predicting 5 classes, the y_actual array is provided as [0,2,4,1,2,2,3,3,1,...], where each value in the 1-D array represents a class number out of the 5 possible classes.

Let's check an example directly from the TF2 documentation on standalone usage of this loss:

y_true = [1, 2] #class number from 0 to 2
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]] #3 class classification output
loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
assert loss.shape == (2,)
loss.numpy()
[0.0513, 2.3026]
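To make the arithmetic concrete, the same values can be reproduced with plain NumPy. This is a minimal sketch of what the loss computes (the negative log-probability of the one true class index per sample), not the actual TF implementation:

```python
import numpy as np

y_true = np.array([1, 2])                   # label-encoded class indices
y_pred = np.array([[0.05, 0.95, 0.00],
                   [0.10, 0.80, 0.10]])     # per-class probabilities

# Sparse categorical cross-entropy: for each sample, take the predicted
# probability at the true class index and negate its log.
eps = 1e-7                                  # clip to avoid log(0)
picked = np.clip(y_pred[np.arange(len(y_true)), y_true], eps, 1.0)
loss = -np.log(picked)
print(np.round(loss, 4))                    # [0.0513 2.3026]
```

Note that the loss only ever looks up one entry per row, which is exactly why a single integer label suffices.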

What this means in your case is that when your model returns a 128-dimensional output, Keras assumes there are 128 classes in the classification problem. However, since the loss is sparse_categorical_crossentropy, it expects to receive a single number between 0 and 127, which it then uses to calculate the error.

Since you always give it a 0 or a 1, it assumes that each sample belongs to class 0 or class 1 out of the 128 classes, and never to any of the others. Therefore the code runs, but it is faulty: instead of reading the single digit it gets from y_train (or y_test) as a binary class, it treats it as one class among 128.

print(y_train)
[0 0 0 ... 0 1 0]

#The first 0 here is considered one class out of the 128 classes.
#The code would still work if you changed it to, say, 105 instead of 0.
#Similarly for all the other 0s and 1s.
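A sketch of the fix, assuming you keep sparse_categorical_crossentropy: the final layer should have one unit per class, i.e. 2. (Alternatively, a single sigmoid unit with binary_crossentropy also works for two classes.) The feature count here is a placeholder; with your data it would be len(X_train.columns):

```python
from tensorflow import keras

n_features = 4  # placeholder: use len(X_train.columns) with your data
model = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(2, activation='softmax'),  # one output unit per class
])
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',  # labels stay as integers 0 or 1
    metrics=['accuracy'],
)
print(model.output_shape)  # (None, 2)
```

With this shape, the loss reads each integer label as an index into exactly two output probabilities, which matches the binary problem.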

Hopefully, that makes sense.
