
Neural Network flatlines after one epoch

I'm using Keras to create a convolutional neural network to classify images into two distinct classes, and for some reason the accuracy never changes after the first epoch.

After using Keras's to_categorical() my labels look like:

[[0.  1.]
[1.  0.]
[1.  0.]
[0.  1.]]

and the code for my model is:

model = Sequential()
model.add(Conv2D(filters=32, kernel_size=[5, 5], strides=1, padding='same', activation='relu', input_shape=(imageSize, imageSize, 3)))
model.add(MaxPooling2D())
model.add(Conv2D(filters=64, kernel_size=[5, 5], strides=1, padding='same', activation='relu'))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(2))
sgd = SGD()  # Use stochastic gradient descent for now
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

model.summary()

counter = 0
# Train one cycle at a time so we can shuffle data in between
for x in range(trainingEpochs):

    counter += 1
    print()  # New line
    print('Epoch ' + str(counter))

    trainingImages, trainingLabels = shuffle(trainingImages, trainingLabels, random_state=0)  # Shuffle both sets in unison

    model.fit(x=trainingImages, y=trainingLabels, batch_size=32, epochs=1, verbose=2)

This code results in the output:

Epoch 1
36s - loss: 5.0770 - acc: 0.3554

Epoch 2
36s - loss: 4.9421 - acc: 0.3066

Epoch 3
36s - loss: 4.9421 - acc: 0.3066

Epoch 4
36s - loss: 4.9421 - acc: 0.3066

So far I've tried changing the batch size, using binary_crossentropy, changing the shuffling method, changing the convolution parameters, using black and white photos instead of RGB, using different size pictures, using Adam instead of SGD, and using a lower learning rate for SGD, but none of those have fixed the problem. I'm at a loss; does anyone have any ideas?

Edit: trainingImages has a shape of (287, 256, 256, 3) if that matters at all.

The symptom is that the training loss stops improving relatively early. Assuming your problem is learnable at all, there are many possible reasons for this behavior. These are the ones at the top of my head:

  1. Improper preprocessing of input:

Neural networks prefer input with zero mean. For example, if the input is all positive, the weight updates are restricted to the same direction, which may not be desirable ( https://youtu.be/gYpoJMlgyXA ).

Therefore, you may want to subtract the mean from all the images (e.g., subtract 127.5 from each of the 3 channels). Scaling each channel to unit standard deviation may also help.
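
A minimal sketch of that preprocessing, assuming trainingImages is a NumPy array of raw pixel values in [0, 255] (the variable name is taken from the question):

import numpy as np

# Assumption: trainingImages has shape (N, imageSize, imageSize, 3) with values in [0, 255]
trainingImages = trainingImages.astype('float32')
trainingImages -= 127.5                                       # roughly zero-center every channel
channelStd = trainingImages.std(axis=(0, 1, 2), keepdims=True)
trainingImages /= channelStd + 1e-7                           # scale each channel to ~unit std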

  2. Insufficient capacity of the network:

The network is not complex or deep enough for the task.

This is very easy to check. Train the network on just a few images (say, 3 to 10): the network should be able to overfit them and drive the loss to almost 0, as sketched below. If that is not the case, you may have to add more layers, for example more than one Dense layer.

Another good idea is to use pre-trained weights (see Applications in the Keras documentation) and adjust the Dense layers at the top to fit your problem.
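
Here is a sketch of that overfitting sanity check, reusing the question's variable names and an arbitrary subset size of 5:

# Sanity check: the model should be able to overfit a handful of samples.
# If the loss does not approach 0 here, the architecture (or the labels) is the problem.
tinyImages = trainingImages[:5]
tinyLabels = trainingLabels[:5]
model.fit(x=tinyImages, y=tinyLabels, batch_size=5, epochs=200, verbose=2)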

  3. Improper weight initialization:

Improper weight initialization can prevent the network from converging ( https://youtu.be/gYpoJMlgyXA , the same video as before).

For the ReLU activation, you may want to use He initialization instead of the default Glorot initialization. I find that this is sometimes necessary, but not always.
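
For example, the question's first convolutional layer with He initialization would look like the sketch below; only kernel_initializer changes, the rest is copied from the question:

from keras.layers import Conv2D

# Same layer as in the question, but initialized with he_normal for the ReLU activation
model.add(Conv2D(filters=32, kernel_size=[5, 5], strides=1, padding='same',
                 activation='relu', kernel_initializer='he_normal',
                 input_shape=(imageSize, imageSize, 3)))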

Lastly, you can use debugging tools for Keras such as keras-vis, keplr-io, and deep-viz-keras. They are very useful for opening up the black box of convolutional networks.

You need to add a sigmoid non-linearity to the last layer of your network and change the number of outputs to 1. This non-linearity maps the input to [0, 1].

# ...
model.add(Flatten())
model.add(Dense(1))
# add nonlinearity
model.add(Activation('sigmoid'))

Also, change your model's loss to binary_crossentropy:

model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])

Last, make sure your labels are boolean (or ints equal to 0 or 1) and of shape (n_observations, 1). That is, change them from one-hot pairs like [[0, 1], [1, 0], [1, 0]] to a single column like [1, 0, 0].
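
A minimal sketch of that conversion, assuming the one-hot labels produced by to_categorical() are stored in the question's trainingLabels array:

import numpy as np

# Collapse the one-hot pairs back to a single 0/1 column of shape (n_observations, 1)
trainingLabels = np.argmax(trainingLabels, axis=1).reshape(-1, 1)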

Look at the Keras blog post describing how to build a binary image classifier if you still run into problems.

After going through the blog post @rafaelvalle linked, I managed to determine that my problem resulted from the encoding of my labels. Originally I had them as one-hot encodings which looked like [[0, 1], [1, 0], [1, 0]], while in the blog post they were in the format [0 1 0 0 1]. Changing my labels to this format and using binary cross-entropy has gotten my model to work properly. Thanks to Ngoc Anh Huynh and rafaelvalle!
