Keras CNN - always predicting the same class in a balanced dataset, but the accuracy is high - why?

I am facing the following problem, for which I first want to give you the code and then explain it in detail:

# Build and train a 1-D convolutional network on the encoded sequences
# (Keras 1.x API: border_mode, pool_length, nb_epoch)
from keras.models import Sequential
from keras.layers import Convolution1D, MaxPooling1D
from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization
from keras.optimizers import SGD
import numpy

# Load the data: columns 0-199 are the features, column 200 is the binary label
dataset = numpy.loadtxt("example.csv", delimiter=",")
X = dataset[:, 0:200]
Y = dataset[:, 200]

# Reshape X to (samples, steps, channels) as expected by Convolution1D
X = numpy.reshape(X, (X.shape[0], X.shape[1], 1))

model = Sequential()
model.add(Convolution1D(16, 3, border_mode="same", input_shape=(200, 1)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_length=2))
model.add(Convolution1D(32, 3, border_mode="same"))
model.add(Convolution1D(32, 3, border_mode="same"))
model.add(Activation('relu'))
model.add(MaxPooling1D(pool_length=2))
model.add(Convolution1D(32, 3, border_mode="same", activation='tanh'))
model.add(Convolution1D(32, 3, border_mode="same", activation='tanh'))
model.add(Flatten())
model.add(BatchNormalization())

# Fully connected head ending in a single sigmoid unit for binary output
model.add(Dense(100, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(50, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(20, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('sigmoid'))

sgd = SGD(lr=0.1, decay=0.001, momentum=0.9, nesterov=True)
model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=['accuracy'])
model.fit(X, Y, batch_size=64, nb_epoch=1)

# Evaluate on the training data and inspect the raw sigmoid outputs
score = model.evaluate(X, Y, verbose=0)
print(score)
output = model.predict(X, batch_size=20, verbose=0)
print(output[0:100])

What I am doing is the following: as input (X), I feed the network DNA sequences encoded as numbers; the label (Y) is binary (either 0 or 1), and I want to predict Y. When I run the model, it behaves very strangely, at least in a way I cannot understand:

[screenshot of the training output: the reported accuracy is high]

Now to my question about the picture: in the predicted output (the result of the line print(output[0:100])), the model always predicts 0. Yet the accuracy reported above is seemingly very high. Why is that? Note that the dataset is balanced, meaning half of the observations are labeled 1 and half 0, so predicting 0 for everything should give an accuracy of about 0.5.
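One way to see what the reported accuracy actually corresponds to is to threshold the raw sigmoid outputs yourself and compare. This is a minimal sketch, assuming the model, X, Y, and output variables from the code above:

# Cross-check the reported accuracy against hard 0/1 predictions.
# model.predict returns raw sigmoid probabilities; the accuracy metric
# thresholds them at 0.5, so reproduce that step explicitly.
import numpy

labels = (output > 0.5).astype(int).flatten()    # hard 0/1 predictions
print(numpy.mean(labels == Y))                   # accuracy computed by hand
print(numpy.mean(Y))                             # class balance (~0.5 if balanced)
print(numpy.unique(labels, return_counts=True))  # how often each class is predicted

If the manual accuracy disagrees with what model.evaluate printed, the mismatch is in how the outputs are being read, not in the model.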

Edit:

As I was asked for the data, here is a screenshot of it. The last number in each line is the label (a quick check of the parsing follows the screenshot).

[screenshot of the CSV data: rows of numbers, the final value in each row being the 0/1 label]
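To rule out a parsing problem, a small sanity check on the label column (assuming the same example.csv as above) could look like this:

# Verify that the label column is read as expected (assumes "example.csv"
# in the format shown in the screenshot: 200 features, then the label).
import numpy

dataset = numpy.loadtxt("example.csv", delimiter=",")
print(dataset[0:5, 200])            # first few labels, should be 0.0 or 1.0
print(numpy.mean(dataset[:, 200]))  # ~0.5 for a balanced dataset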

Maybe your data is not properly scaled. As a debugging step, you can use a linear activation function on the last layer and look at the result.
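If scaling is indeed the issue, a minimal sketch of standardizing the features before reshaping (assuming the dataset array loaded earlier) would be:

# Standardize each of the 200 feature columns to zero mean and unit
# variance before reshaping for the network (assumes `dataset` as above).
import numpy

X = dataset[:, 0:200]
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)  # per-column standardization
X = numpy.reshape(X, (X.shape[0], X.shape[1], 1))

For the debugging step itself, replacing Activation('sigmoid') with Activation('linear') (and, say, a mean-squared-error loss) lets you inspect the raw, unsquashed outputs and see whether they are stuck near a single value.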
