
CNN accuracy on binary classification of cat/dog images no better than random

I've adapted a simple CNN from a tutorial on Analytics Vidhya.

The problem is that my accuracy on a holdout set is no better than random. I am training on ~8600 images each of cats and dogs, which should be enough data for a decent model, but accuracy on the test set is stuck at 49%. Is there a glaring omission in my code somewhere?

import os
import numpy as np
import keras
from keras.models import Sequential
from sklearn.model_selection import train_test_split
from datetime import datetime
from PIL import Image
from keras.utils.np_utils import to_categorical
from sklearn.utils import shuffle


def main():

    cat=os.listdir("train/cats")
    dog=os.listdir("train/dogs")
    filepath="train/cats/"
    filepath2="train/dogs/"

    print("[INFO] Loading images of cats and dogs each...", datetime.now().time())
    #print("[INFO] Loading {} images of cats and dogs each...".format(num_images), datetime.now().time())
    images=[]
    label = []
    for i in cat:
        image = Image.open(filepath+i)
        image_resized = image.resize((300,300))
        images.append(image_resized)
        label.append(0) #for cat images

    for i in dog:
        image = Image.open(filepath2+i)
        image_resized = image.resize((300,300))
        images.append(image_resized)
        label.append(1) #for dog images

    images_full = np.array([np.array(x) for x in images])

    label = np.array(label)
    label = to_categorical(label)

    images_full, label = shuffle(images_full, label)

    print("[INFO] Splitting into train and test", datetime.now().time())
    (trainX, testX, trainY, testY) = train_test_split(images_full, label, test_size=0.25)


    filters = 10
    filtersize = (5, 5)

    epochs = 5
    batchsize = 32

    input_shape=(300,300,3)
    #input_shape = (30, 30, 3)

    print("[INFO] Designing model architecture...", datetime.now().time())
    model = Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    model.add(keras.layers.convolutional.Conv2D(filters, filtersize, strides=(1, 1), padding='same',
                                                data_format="channels_last", activation='relu'))
    model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(keras.layers.Flatten())

    model.add(keras.layers.Dense(units=2, input_dim=50,activation='softmax'))
    #model.add(keras.layers.Dense(units=2, input_dim=5, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    print("[INFO] Fitting model...", datetime.now().time())
    model.fit(trainX, trainY, epochs=epochs, batch_size=batchsize, validation_split=0.3)

    model.summary()

    print("[INFO] Evaluating on test set...", datetime.now().time())
    eval_res = model.evaluate(testX, testY)
    print(eval_res)

if __name__== "__main__":
    main()

For me the problem comes from the size of your network: you have only one Conv2D layer with just 10 filters. That is way too small to learn a deep representation of your images.

Try increasing this a lot by using blocks from common architectures like VGGNet!
Example of a block:

# Assumes the functional-API imports, e.g.:
# from keras.layers import Input, Conv2D, LeakyReLU, BatchNormalization, MaxPooling2D, Dropout
# and that model_input is a Keras Input tensor, e.g. Input(shape=(300, 300, 3))
x = Conv2D(32, (3, 3), padding='same')(model_input)
x = LeakyReLU(alpha=0.3)(x)
x = BatchNormalization()(x)
x = Conv2D(32, (3, 3), padding='same')(x)
x = LeakyReLU(alpha=0.3)(x)
x = BatchNormalization()(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(0.25)(x)

You need to stack multiple blocks like that, increasing the number of filters as you go deeper, in order to capture higher-level features; see the sketch below.
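For instance, here is a minimal sketch of three such blocks with the filter count doubling at each stage (the conv_block helper and the 32/64/128 counts are illustrative choices of mine, not from the original post):

from keras.layers import Input, Conv2D, LeakyReLU, BatchNormalization, MaxPooling2D, Dropout

def conv_block(x, filters):
    # two convolutions, then downsample and regularize
    x = Conv2D(filters, (3, 3), padding='same')(x)
    x = LeakyReLU(alpha=0.3)(x)
    x = BatchNormalization()(x)
    x = Conv2D(filters, (3, 3), padding='same')(x)
    x = LeakyReLU(alpha=0.3)(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Dropout(0.25)(x)
    return x

model_input = Input(shape=(300, 300, 3))
x = conv_block(model_input, 32)   # low-level features: edges, colors
x = conv_block(x, 64)             # mid-level features: textures, simple shapes
x = conv_block(x, 128)            # high-level features: object parts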

Another thing: you don't need to specify the input_dim of your dense layer; Keras automatically takes care of that!
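To illustrate (a minimal sketch of mine, assuming standalone Keras as in the rest of this post), the Dense layer's input size is inferred from the previous layer's output:

from keras.models import Sequential
from keras.layers import Flatten, Dense

model = Sequential()
model.add(Flatten(input_shape=(300, 300, 3)))
model.add(Dense(2, activation='softmax'))  # no input_dim: inferred as 300*300*3 = 270000
model.summary()                            # the summary shows (None, 270000) feeding the Dense layer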

Last but not least, you need a fully connected network at the end in order to correctly classify your images, not just a single layer.

For example:

x = Flatten()(x)
x = Dense(256)(x)
x = LeakyReLU(alpha=0.3)(x)
x = Dense(128)(x)
x = LeakyReLU(alpha=0.3)(x)
x = Dense(2)(x)
x = Activation('softmax')(x)

Try those changes and keep me posted!

Update after OP's questions

Images are complex; they contain a lot of information, such as shapes, edges, colors, etc.

In order to capture the maximum amount of information, you need to pass the image through multiple convolutions, which will learn its different aspects. Imagine, for example, that the first convolution learns to recognise squares, the second circles, the third edges, and so on.

And for my second point: the final fully connected network acts like a classifier. The conv network outputs a vector that "represents" a dog or a cat, and now you need to learn that this kind of vector belongs to one class or the other.
Directly feeding that vector into the final layer is not enough to learn this representation.

Is that clearer?

Last update, for OP's second comment

Here are the two ways of defining a Keras model; both output the same thing!

# Functional API (assumes: from keras.layers import Input, Dense, Activation
#                 and:     from keras.models import Model)
model_input = Input(shape=(200, 1))
x = Dense(32)(model_input)
x = Dense(16)(x)
x = Activation('relu')(x)
model = Model(inputs=model_input, outputs=x)

And the equivalent Sequential API version:

model = Sequential()
model.add(Dense(32, input_shape=(200, 1)))
model.add(Dense(16, activation = 'relu'))
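
You can sanity-check the equivalence by calling the following on each version (my addition, not from the original post); both end in the same output shape, (None, 200, 16):

model.summary()   # final output shape is (None, 200, 16) in both versions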

Example of architecture:

model = Sequential()
model.add(keras.layers.InputLayer(input_shape=input_shape))
model.add(keras.layers.convolutional.Conv2D(32, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.convolutional.Conv2D(32, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))

model.add(keras.layers.convolutional.Conv2D(64, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.convolutional.Conv2D(64, (3,3), strides=(2, 2), padding='same', activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))

model.add(keras.layers.Flatten())

model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dense(2, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Don't forget to normalize your data before feeding it into your network.

A simple images_full = images_full / 255.0 on your data can boost your accuracy a lot.
Try it with grayscale images too; it's more computationally efficient.
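A minimal sketch of the grayscale variant (the convert('L') call and the single-channel input shape are my assumptions, adapted from the original loading loop):

# In the loading loop, convert each image to 8-bit grayscale
image = Image.open(filepath + i).convert('L')
image_resized = image.resize((300, 300))
images.append(image_resized)

# After stacking into an array, add the channel axis and normalize
images_full = np.array([np.array(x) for x in images])
images_full = images_full[..., np.newaxis]   # shape becomes (N, 300, 300, 1)
images_full = images_full / 255.0            # scale pixel values to [0, 1]

input_shape = (300, 300, 1)                  # one channel instead of three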
