I am trying to train a simple model on the MNIST dataset, with a single hidden layer of 36 neurons.
from tensorflow.keras import (activations, layers, losses, models,
                              optimizers, regularizers)

NUM_CLASSES = 10
BATCH_SIZE = 128
EPOCHS = 100

model = models.Sequential([
    layers.Input(shape=x_train.shape[1:]),
    layers.Dense(units=36, activation=activations.sigmoid,
                 kernel_regularizer=regularizers.l2(0.0001)),
    layers.Dropout(0.5),
    layers.Dense(units=NUM_CLASSES, activation=activations.softmax),
])
model.summary()

model.compile(loss=losses.CategoricalCrossentropy(),
              optimizer=optimizers.RMSprop(),
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=BATCH_SIZE,
                    epochs=EPOCHS,
                    verbose=2,
                    validation_data=(x_val, y_val))
Without the l2 regularizer everything works, but as soon as I add it, training goes sideways and the accuracy stays stuck at roughly 10%:
Epoch 1/300
391/391 - 1s - loss: 2.4411 - accuracy: 0.0990 - val_loss: 2.3027 - val_accuracy: 0.1064
Epoch 2/300
391/391 - 0s - loss: 2.3374 - accuracy: 0.1007 - val_loss: 2.3031 - val_accuracy: 0.1064
Epoch 3/300
391/391 - 0s - loss: 2.3178 - accuracy: 0.1016 - val_loss: 2.3041 - val_accuracy: 0.1064
Epoch 4/300
391/391 - 0s - loss: 2.3089 - accuracy: 0.1045 - val_loss: 2.3026 - val_accuracy: 0.1064
Epoch 5/300
391/391 - 0s - loss: 2.3051 - accuracy: 0.1060 - val_loss: 2.3030 - val_accuracy: 0.1064
This happens both when I pass regularizers.l2 explicitly as the argument and when I pass the string "l2".
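For reference, a minimal sketch of the two forms described above. Note that the two are not equivalent: the string alias resolves to Keras's default L2 factor (0.01), a much stronger penalty than the 0.0001 used explicitly, which may explain why both variants misbehave differently than expected:

```python
from tensorflow.keras import layers, regularizers

# Explicit regularizer object with a custom factor:
dense_a = layers.Dense(36, kernel_regularizer=regularizers.l2(0.0001))

# String alias; this resolves to an L2 regularizer with the
# default factor of 0.01, not 0.0001:
dense_b = layers.Dense(36, kernel_regularizer="l2")
```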
Why exactly does this happen and what am I doing wrong?
I suspect that with a high dropout rate of 0.5, adding regularization prevents the network from learning. Both dropout and L2 regularization are means of preventing overfitting, so stacking them can over-constrain a small network like this one. Try using a lower dropout rate together with the regularizer and see if the network trains properly. My experience to date is that dropout is more effective at controlling overfitting than weight regularizers.
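A minimal sketch of that suggestion, keeping the architecture from the question but lowering the dropout rate to 0.2 (the input shape of 784 flattened MNIST features is an assumption; substitute x_train.shape[1:] as in the original code):

```python
from tensorflow.keras import (activations, layers, losses, models,
                              optimizers, regularizers)

model = models.Sequential([
    layers.Input(shape=(784,)),           # flattened 28x28 MNIST images (assumed)
    layers.Dense(units=36, activation=activations.sigmoid,
                 kernel_regularizer=regularizers.l2(0.0001)),
    layers.Dropout(0.2),                  # lowered from 0.5
    layers.Dense(units=10, activation=activations.softmax),
])
model.compile(loss=losses.CategoricalCrossentropy(),
              optimizer=optimizers.RMSprop(),
              metrics=['accuracy'])
```

If this trains properly, you can tune the dropout rate and the L2 factor against the validation set rather than keeping both at their original values.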