
Keras model gets constant loss and accuracy

I am trying to train a Keras CNN on the Street View House Numbers (SVHN) dataset. You can find the project here. The problem is that neither loss nor accuracy changes during training. I have tried 1-channel (grayscale) and RGB (3-channel) images, larger (50, 50) and smaller (28, 28) images, more and fewer filters in the convolutional layers, larger and smaller patches in the pooling layers, with and without dropout, larger and smaller batches, smaller and larger learning rates for the optimizers, different optimizers, ...

Still, training stays stuck at a constant loss and accuracy.

Here is how I prepared the data:

import os
import numpy as np
from PIL import Image
from PIL import ImageFilter
train_folders = 'sv_train/train'
test_folders = 'test'
extra_folders = 'extra'
SV_IMG_SIZE = 28
SV_CHANNELS = 3
train_imsize = np.ndarray([len(train_data),2])
k = 500
sv_images = []
max_images = 20000#len(train_data)
max_digits = 5
sv_labels = np.ones([max_images, max_digits], dtype=int) * 10 # init to 10 cause it would be no digit
nboxes = [[] for i in range(max_images)]
print ("%d to load" % len(train_data))
def getBBox(i,perc):

    boxes = train_data[i]['boxes'] 
    x_min=9990
    y_min=9990
    x_max=0
    y_max=0
    for bid,b in enumerate(boxes):
        x_min = b['left'] if b['left'] <= x_min else x_min
        y_min = b['top'] if b['top'] <= y_min else y_min
        x_max = b['left']+b['width'] if  b['left']+b['width'] >= x_max else x_max
        y_max = b['top']+b['height'] if b['top']+b['height'] >= y_max else y_max

    dy = y_max-y_min
    dx = x_max-x_min
    dpy = dy*perc
    dpx = dx*perc
    nboxes[i]=[dpx,dpy,dx,dy]
    return x_min-dpx, y_min-dpy, x_max+dpx, y_max+dpy

for i in range(max_images):
    print (" \r%d" % i ,end="")
    filename = train_data[i]['filename']
    fullname = os.path.join(train_folders, filename)
    boxes = train_data[i]['boxes']
    label = [10,10,10,10,10]
    lb = len(boxes)
    if lb <= max_digits:
        im = Image.open(fullname)
        x_min, y_min, x_max, y_max = getBBox(i,0.3)
        im = im.crop([x_min,y_min,x_max,y_max])
        owidth, oheight = im.size
        wr = SV_IMG_SIZE/float(owidth)
        hr = SV_IMG_SIZE/float(oheight)
        for bid,box in  enumerate(boxes):
            sv_labels[i][max_digits-lb+bid] = int(box['label'])

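        box = nboxes[i]  # [dpx, dpy, dx, dy]
        box[0]*=wr  # horizontal values scale with the width ratio
        box[1]*=hr  # vertical values scale with the height ratio
        box[2]*=wr
        box[3]*=hr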
        im = im.resize((SV_IMG_SIZE,SV_IMG_SIZE),Image.ANTIALIAS)
        array = np.asarray(im)
        array =  array.reshape((SV_IMG_SIZE,SV_IMG_SIZE,SV_CHANNELS)).astype(np.float32)
        na = np.zeros([SV_IMG_SIZE,SV_IMG_SIZE,SV_CHANNELS],dtype=int)
        sv_images.append(array.astype(np.float32))

Here is the model:

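from keras.models import Sequential, Model
from keras.layers import Input, Dense, Flatten, Convolution2D, MaxPooling2D
from keras.optimizers import Adam
from keras.utils.np_utils import to_categorical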

adam = Adam(lr=0.5)

model = Sequential()
x = Input((SV_IMG_SIZE, SV_IMG_SIZE,SV_CHANNELS))

y = Convolution2D(16, 3, 3, activation='relu', border_mode='same')(x)
y = Convolution2D(32, 3, 3, activation='relu', border_mode='valid')(y)
y = MaxPooling2D((2, 2))(y)
y = Convolution2D(128, 3, 3, activation='relu', border_mode='valid')(y)
y = MaxPooling2D((2, 2))(y)
y = Flatten()(y)
y = Dense(512, activation='relu')(y)


digit1 = Dense(11, activation="softmax")(y)
digit2 = Dense(11, activation="softmax")(y)
digit3 = Dense(11, activation="softmax")(y)
digit4 = Dense(11, activation="softmax")(y)
digit5 = Dense(11, activation="softmax")(y)
model = Model(input=x, output=[digit1, digit2, digit3,digit4,digit5])


model.compile(optimizer=adam,
          loss='categorical_crossentropy',
          metrics=['accuracy'])


sv_train_labels = [to_categorical(svt_labels[:,0]),
                   to_categorical(svt_labels[:,1]),
                   to_categorical(svt_labels[:,2]),
                   to_categorical(svt_labels[:,3]),
                   to_categorical(svt_labels[:,4])]
sv_validation_labels = [to_categorical(svv_labels[:,0]),
                        to_categorical(svv_labels[:,1]),
                        to_categorical(svv_labels[:,2]),
                        to_categorical(svv_labels[:,3]),
                        to_categorical(svv_labels[:,4])]

model.fit(sv_train, sv_train_labels, nb_epoch=50, batch_size=8,validation_data=(sv_validation, sv_validation_labels))

As in my comment above, I'd suggest not training a single model to predict a 5-digit combination. It would be far more efficient to train the model to predict a single digit. I put together a quick example, based on the Keras example cifar10_cnn.py, on the MNIST-like SVHN format 2 (cropped digits) data:

import numpy as np
import scipy.io as sio
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils.np_utils import to_categorical

# parameters
nb_epoch = 10
batch_size = 32

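# load data (SVHN format 2: 'X' is 32x32x3xN, 'y' holds labels 1..10, with 10 meaning digit 0)
nb_classes = 10
train_data = sio.loadmat('train_32x32.mat')
test_data = sio.loadmat('test_32x32.mat')
# .T reverses the axes to (N, 3, 32, 32); dividing by 255.0 forces float division
X_train = train_data['X'].T / 255.0
X_test = test_data['X'].T / 255.0
# % nb_classes maps label 10 to class 0
y_train = to_categorical(train_data['y'] % nb_classes)
y_test = to_categorical(test_data['y'] % nb_classes)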

# model
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same', input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

# train
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epoch, validation_data=(X_test, y_test), shuffle=True)

Once you have trained this model, train another model to recognize/extract each digit from an image, using a library such as OpenCV.
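For the training images, where per-digit bounding boxes are already available, the extraction step can be sketched without a detector. This is a minimal sketch, assuming boxes in the same form as the question's `train_data[i]['boxes']` (keys `left`, `top`, `width`, `height`); the helper name `crop_digits` is hypothetical, and locating digits in new images (for example with OpenCV) is not shown.

```python
import numpy as np

def crop_digits(image, boxes):
    """Cut one patch per bounding box out of an H x W x C image array."""
    height, width = image.shape[:2]
    patches = []
    for box in boxes:
        # Clamp the box to the image so negative or oversized coordinates
        # do not produce wrapped-around slices.
        left = max(int(box['left']), 0)
        top = max(int(box['top']), 0)
        right = min(int(box['left'] + box['width']), width)
        bottom = min(int(box['top'] + box['height']), height)
        patches.append(image[top:bottom, left:right])
    return patches

# Hypothetical stand-in image and boxes, for illustration only.
image = np.zeros((40, 100, 3), dtype=np.uint8)
boxes = [{'left': 10, 'top': 5, 'width': 20, 'height': 30},
         {'left': 35, 'top': 5, 'width': 20, 'height': 30}]
patches = crop_digits(image, boxes)
print([p.shape for p in patches])  # [(30, 20, 3), (30, 20, 3)]
```

Each patch would then need to be resized to the classifier's input size (32x32 in the example above) before prediction.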

In cases like this, the cause is most often a faulty training set. I would recommend looking at the actual images and labels you feed into the network. Also look at the actual color range of the images, that is, how their pixel values are distributed. This often leads to the solution. In any case, if you are able to map the images to their labels, then so can the computer, given a good learning rate.
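A quick way to do that inspection is to print a few summary statistics before training. The following is only a sketch: `summarize_dataset` is a hypothetical helper, and the random arrays stand in for the real `sv_images` and `sv_labels`.

```python
import numpy as np

def summarize_dataset(images, labels):
    """Return basic statistics for a batch of images and integer labels."""
    images = np.asarray(images, dtype=np.float64)
    labels = np.asarray(labels)
    values, counts = np.unique(labels, return_counts=True)
    return {
        "shape": images.shape,
        "min": float(images.min()),
        "max": float(images.max()),
        "mean": float(images.mean()),
        "std": float(images.std()),
        "label_counts": dict(zip(values.tolist(), counts.tolist())),
    }

# Hypothetical stand-in data: unnormalized 0..255 pixels, labels in 0..10.
rng = np.random.RandomState(0)
fake_images = rng.randint(0, 256, size=(100, 28, 28, 3))
fake_labels = rng.randint(0, 11, size=(100, 5))
stats = summarize_dataset(fake_images, fake_labels)
print(stats["min"], stats["max"], stats["label_counts"])
```

A maximum near 255 would suggest the pixels were never scaled to the 0..1 range, and a label histogram dominated by a single class could explain an accuracy that never moves.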

Why are the labels for 104 [10 10 1 10 4]? I believe it should be [10 10 1 0 4], no?
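The collision arises because SVHN stores the digit 0 as label 10, while the question's code also uses 10 as the "no digit" padding value. A minimal sketch of one way to keep the two apart, assuming raw labels in 1..10 with 10 meaning 0; the names `encode_labels` and `BLANK` are illustrative, not taken from the project:

```python
BLANK = 10  # padding class meaning "no digit at this position"

def encode_labels(raw_labels, max_digits=5):
    """Right-align a house number into max_digits slots, padding with BLANK.

    Assumes raw SVHN labels in 1..10, where 10 stands for the digit 0.
    """
    if len(raw_labels) > max_digits:
        raise ValueError("more digits than available slots")
    digits = [int(label) % 10 for label in raw_labels]  # 10 -> 0, 1..9 unchanged
    return [BLANK] * (max_digits - len(digits)) + digits

print(encode_labels([1, 10, 4]))  # [10, 10, 1, 0, 4]
```

With this encoding each output still has 11 classes (digits 0 to 9 plus the blank), which matches the `Dense(11, ...)` heads in the question's model.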

In my opinion, either you have a problem with the input data (the preparation might be wrong), or your architecture is not suitable for this problem.

It is training: you can see in the notebook that the loss changes between epochs 1 and 2. So it's not a training issue.
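Whether the loss is truly constant can be checked from the per-epoch values rather than by eye. A small sketch, assuming the losses are collected from the `History` object that Keras `fit` returns (`history.history['loss']`); `is_stuck` and its tolerance are hypothetical:

```python
def is_stuck(losses, tol=1e-4):
    """Return True if the per-epoch losses vary by no more than tol."""
    if len(losses) < 2:
        return False  # not enough epochs to judge
    return max(losses) - min(losses) <= tol

# Hypothetical loss values; with Keras they would come from
# history = model.fit(...); losses = history.history['loss']
print(is_stuck([2.31, 2.31, 2.31]))  # True
print(is_stuck([2.31, 2.05, 1.80]))  # False
```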
