CNN predicting the same class for all input data

I am trying to recreate a CNN in Keras to classify point cloud data. The CNN is described in this paper.

[Image: network design from the paper]

This is my current implementation:

import keras
from keras.layers import Input, Conv1D, BatchNormalization, MaxPooling1D, Dense
from keras.models import Model

inputs = Input(shape=(None, 3))

x = Conv1D(filters=64, kernel_size=1, activation='relu')(inputs)
x = BatchNormalization()(x)
x = Conv1D(filters=64, kernel_size=1, activation='relu')(x)
x = BatchNormalization()(x)

y = Conv1D(filters=64, kernel_size=1, activation='relu')(x)
y = BatchNormalization()(y)
y = Conv1D(filters=128, kernel_size=1, activation='relu')(y)
y = BatchNormalization()(y)
y = Conv1D(filters=2048, kernel_size=1, activation='relu')(y)
y = MaxPooling1D(1)(y)

z = keras.layers.concatenate([x, y], axis=2)
z = Conv1D(filters=512, kernel_size=1, activation='relu')(z)
z = BatchNormalization()(z)
z = Conv1D(filters=512, kernel_size=1, activation='relu')(z)
z = BatchNormalization()(z)
z = Conv1D(filters=512, kernel_size=1, activation='relu')(z)
z = BatchNormalization()(z)
z = Dense(9, activation='softmax')(z)

model = Model(inputs=inputs, outputs=z)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

The problem is that the network predicts the same class for all input data. This may be caused by a mistake in my implementation of the network, overfitting or insufficient training data. Can someone spot a mistake in my implementation?

Yousefhussien, M., Kelbe, D. J., Ientilucci, E. J., & Salvaggio, C. (2017). A Fully Convolutional Network for Semantic Labeling of 3D Point Clouds. arXiv preprint arXiv:1710.01408.

Predicting the same class for every input typically indicates a network that has just been initialized, i.e. the trained weights were not loaded. Did the predictions collapse to a single class during training as well? Bad pre-processing is another possible cause. One more thing I noticed: the paper describes a "1D fully convolutional network", yet your final layer is a Dense layer, whereas in the paper it is a convolutional layer.
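On the Dense-vs-convolution point: a `kernel_size=1` 1D convolution applies one shared weight matrix to every point independently, which is the same computation as a Dense layer applied per point. A minimal NumPy sketch of that equivalence (toy sizes, not the paper's):

```python
import numpy as np

# A kernel_size=1 Conv1D multiplies every point's feature vector by the
# same weight matrix W and adds a bias b -- exactly what a Dense layer
# would do to each point individually. Sizes below are illustrative.
rng = np.random.default_rng(0)
points = rng.standard_normal((5, 3))   # 5 points, 3 features each
W = rng.standard_normal((3, 4))        # 3 input channels -> 4 filters
b = rng.standard_normal(4)

conv_1x1 = points @ W + b                                  # "conv" with kernel size 1
dense_per_point = np.stack([p @ W + b for p in points])    # Dense applied point by point

assert np.allclose(conv_1x1, dense_per_point)
```

So the swap mainly matters for matching the paper's "fully convolutional" framing and for keeping the per-point output shape explicit, not for the arithmetic itself.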

I believe the mistake is not in the implementation. Most probably the problem is an insufficient amount of data. Also, when a network predicts the same class for all inputs, it usually means it lacks regularization. Try adding some Dropout layers with a rate between 0.2 and 0.5 and see whether the results improve.
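In Keras this is a one-liner, e.g. `x = Dropout(0.3)(x)` after an activation. Mechanically, training-time ("inverted") dropout zeroes each activation with the given probability and rescales the survivors, as this NumPy sketch shows:

```python
import numpy as np

# Inverted dropout, as applied at training time: zero each activation
# with probability `rate`, then rescale survivors by 1/(1 - rate) so
# the expected activation magnitude is unchanged at inference time.
rng = np.random.default_rng(42)
rate = 0.3
activations = rng.standard_normal((4, 8))

mask = rng.random(activations.shape) >= rate      # keep each unit with prob 1 - rate
dropped = activations * mask / (1.0 - rate)

assert np.all(dropped[~mask] == 0.0)              # masked units are exactly zero
assert np.allclose(dropped[mask], activations[mask] / (1.0 - rate))
```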

Also, I don't think that

x = Conv1D(filters=64, kernel_size=1, activation='relu')(inputs)
x = BatchNormalization()(x)

is the same as

x = Conv1D(filters=64, kernel_size=1)(inputs)
x = BatchNormalization()(x)
x = ReLU()(x)

and I think you need the latter.
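The two orderings really are different functions: in the first, batch norm sees already-clipped (non-negative) values; in the second, it normalizes the raw pre-activations before ReLU. A toy NumPy check, using plain standardization as a stand-in for batch norm (no learned scale/shift):

```python
import numpy as np

# Simplified "batch norm": standardize to zero mean, unit variance.
def standardize(v):
    return (v - v.mean()) / v.std()

relu = lambda v: np.maximum(v, 0.0)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

bn_then_relu = relu(standardize(x))   # normalize first, then clip negatives
relu_then_bn = standardize(relu(x))   # clip first, then normalize the clipped values

# The orderings produce genuinely different outputs.
assert not np.allclose(bn_then_relu, relu_then_bn)
```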

Another thing to try is LeakyReLU, as it usually gives better results than plain ReLU.
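LeakyReLU passes a small fraction (alpha) of negative inputs through instead of zeroing them, so units with negative pre-activations still receive a gradient (the "dying ReLU" problem). A minimal NumPy sketch (0.3 is Keras's default alpha):

```python
import numpy as np

def leaky_relu(x, alpha=0.3):
    # Negative inputs are scaled by alpha rather than clipped to zero.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.0])
out = leaky_relu(x)

assert np.allclose(out, [-0.6, -0.15, 0.0, 1.0])
```

In Keras you would use the `LeakyReLU` layer in place of `ReLU` / `Activation('relu')`.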

The network is fixed and now produces the expected predictions. Thanks for the help!

Based on the answers I changed the following things:

  • Swapped the order of the activation and the batch normalization.
  • Changed the last layer from a dense layer to a convolutional layer.

I also added the training=True parameter to the batch normalization layers.

The code of the corrected implementation:

import keras
from keras.layers import Input, Conv1D, BatchNormalization, Activation, MaxPooling1D
from keras.models import Model

inputs = Input(shape=(None, 3))

x = Conv1D(filters=64, kernel_size=1)(inputs)
x = BatchNormalization()(x, training=True)
x = Activation('relu')(x)
x = Conv1D(filters=64, kernel_size=1, use_bias=False)(x)
x = BatchNormalization()(x, training=True)
x = Activation('relu')(x)

y = Conv1D(filters=64, kernel_size=1)(x)
y = BatchNormalization()(y, training=True)
y = Activation('relu')(y)
y = Conv1D(filters=128, kernel_size=1)(y)
y = BatchNormalization()(y, training=True)
y = Activation('relu')(y)
y = Conv1D(filters=2048, kernel_size=1)(y)
y = BatchNormalization()(y, training=True)
y = Activation('relu')(y)
y = MaxPooling1D(1)(y)

z = keras.layers.concatenate([x, y], axis=2)
z = Conv1D(filters=512, kernel_size=1)(z)
z = BatchNormalization()(z, training=True)
z = Activation('relu')(z)
z = Conv1D(filters=512, kernel_size=1)(z)
z = BatchNormalization()(z, training=True)
z = Activation('relu')(z)
z = Conv1D(filters=512, kernel_size=1)(z)
z = BatchNormalization()(z, training=True)
z = Activation('relu')(z)
z = Conv1D(filters=2, kernel_size=1, activation='softmax')(z)

model = Model(inputs=inputs, outputs=z)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
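Because the last layer is now a `kernel_size=1` convolution with softmax, the model emits an independent class distribution for every point: output shape `(batch, n_points, n_classes)`, with each point's scores summing to 1. A NumPy sketch of that output semantics, using random logits in place of real network output:

```python
import numpy as np

rng = np.random.default_rng(1)
logits = rng.standard_normal((1, 100, 2))   # 1 cloud, 100 points, 2 classes

# Numerically stable softmax over the class (last) axis, per point.
e = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = e / e.sum(axis=-1, keepdims=True)

assert probs.shape == (1, 100, 2)
assert np.allclose(probs.sum(axis=-1), 1.0)   # one distribution per point
```

This is the per-point labeling the paper's fully convolutional design is after, and it is why `categorical_crossentropy` here expects one-hot labels of shape `(batch, n_points, n_classes)`.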
