Keras CNN - always predicting the same class in a balanced dataset, but the accuracy is high - why?

I am facing the following problem, for which I first want to give you the code and then explain it in detail:

# Build and train a 1-D convolutional network on the encoded sequences
# (Keras 1.x API: border_mode, pool_length, nb_epoch)
from keras.models import Sequential
from keras.layers import Convolution1D, MaxPooling1D
from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization
from keras.optimizers import SGD
import numpy

# Load the data: columns 0-199 are the features, column 200 is the binary label
dataset = numpy.loadtxt("example.csv", delimiter=",")
X = dataset[:, 0:200]
Y = dataset[:, 200]

# Reshape X to (samples, steps, channels) as expected by Convolution1D
X = numpy.reshape(X, (X.shape[0], X.shape[1], 1))

model = Sequential()
model.add(Convolution1D(16, 3, border_mode="same", input_shape=(200, 1)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_length=2))
model.add(Convolution1D(32, 3, border_mode="same"))
model.add(Convolution1D(32, 3, border_mode="same"))
model.add(Activation('relu'))
model.add(MaxPooling1D(pool_length=2))
model.add(Convolution1D(32, 3, border_mode="same", activation='tanh'))
model.add(Convolution1D(32, 3, border_mode="same", activation='tanh'))
model.add(Flatten())
model.add(BatchNormalization())

# Fully connected head ending in a single sigmoid unit for binary output
model.add(Dense(100, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(50, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(20, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('sigmoid'))

sgd = SGD(lr=0.1, decay=0.001, momentum=0.9, nesterov=True)
model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=['accuracy'])
model.fit(X, Y, batch_size=64, nb_epoch=1)

# Evaluate on the training data and inspect the raw sigmoid outputs
score = model.evaluate(X, Y, verbose=0)
print(score)
output = model.predict(X, batch_size=20, verbose=0)
print(output[0:100])

What I am doing is the following: as input (X), I feed the network DNA sequences encoded as numbers; the label (Y) is binary (either 0 or 1), and I want to predict Y. When I run the model, it behaves very strangely, at least in a way I cannot understand:

[screenshot of the training output: the reported accuracy is high]

Now to my question about the picture: in the predicted output (the result of the line print(output[0:100])), the model always predicts 0. Yet the accuracy reported above is seemingly very high. Why is that? Note that the dataset is balanced, meaning half of the observations are labeled 1 and half 0, so predicting 0 for everything should give an accuracy of about 0.5.
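One way to see what the reported accuracy actually corresponds to is to threshold the raw sigmoid outputs yourself and compare. This is a minimal sketch, assuming the model, X, Y, and output variables from the code above:

# Cross-check the reported accuracy against hard 0/1 predictions.
# model.predict returns raw sigmoid probabilities; the accuracy metric
# thresholds them at 0.5, so reproduce that step explicitly.
import numpy

labels = (output > 0.5).astype(int).flatten()    # hard 0/1 predictions
print(numpy.mean(labels == Y))                   # accuracy computed by hand
print(numpy.mean(Y))                             # class balance (~0.5 if balanced)
print(numpy.unique(labels, return_counts=True))  # how often each class is predicted

If the manual accuracy disagrees with what model.evaluate printed, the mismatch is in how the outputs are being read, not in the model.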

Edit:

As I was asked for the data, here is a screenshot of it. The last number in each line is the label (a quick check of the parsing follows the screenshot).

[screenshot of the CSV data: rows of numbers, the final value in each row being the 0/1 label]
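To rule out a parsing problem, a small sanity check on the label column (assuming the same example.csv as above) could look like this:

# Verify that the label column is read as expected (assumes "example.csv"
# in the format shown in the screenshot: 200 features, then the label).
import numpy

dataset = numpy.loadtxt("example.csv", delimiter=",")
print(dataset[0:5, 200])            # first few labels, should be 0.0 or 1.0
print(numpy.mean(dataset[:, 200]))  # ~0.5 for a balanced dataset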

Maybe your data is not properly scaled. As a debugging step, you can use a linear activation function on the last layer and look at the result.
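If scaling is indeed the issue, a minimal sketch of standardizing the features before reshaping (assuming the dataset array loaded earlier) would be:

# Standardize each of the 200 feature columns to zero mean and unit
# variance before reshaping for the network (assumes `dataset` as above).
import numpy

X = dataset[:, 0:200]
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)  # per-column standardization
X = numpy.reshape(X, (X.shape[0], X.shape[1], 1))

For the debugging step itself, replacing Activation('sigmoid') with Activation('linear') (and, say, a mean-squared-error loss) lets you inspect the raw, unsquashed outputs and see whether they are stuck near a single value.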
