
Keras model gets constant loss and accuracy

I am trying to train a Keras CNN on the Street View House Numbers dataset. You can find the project here. The problem is that during training neither the loss nor the accuracy changes over time. I have tried with 1-channel (grayscale) images, with RGB (3-channel) images, with larger (50,50) and smaller (28,28) images, with more or fewer filters in the convolutional layers, with larger and smaller patches in the pooling layers, with and without dropout, with bigger and smaller batches, with smaller and bigger learning rates for the optimizers, with different optimizers, ...

Still, training gets stuck at a constant loss and accuracy.

Here is how I prepared the data

import os
import numpy as np
from PIL import Image
from PIL import ImageFilter
train_folders = 'sv_train/train'
test_folders = 'test'
extra_folders = 'extra'
SV_IMG_SIZE = 28
SV_CHANNELS = 3
train_imsize = np.ndarray([len(train_data),2])
k = 500
sv_images = []
max_images = 20000#len(train_data)
max_digits = 5
sv_labels = np.ones([max_images, max_digits], dtype=int) * 10 # init to 10 cause it would be no digit
nboxes = [[] for i in range(max_images)]
print ("%d to load" % len(train_data))
# bounding box covering all digit boxes of image i, expanded by perc on each side
def getBBox(i,perc):

    boxes = train_data[i]['boxes'] 
    x_min=9990
    y_min=9990
    x_max=0
    y_max=0
    for bid,b in enumerate(boxes):
        x_min = b['left'] if b['left'] <= x_min else x_min
        y_min = b['top'] if b['top'] <= y_min else y_min
        x_max = b['left']+b['width'] if  b['left']+b['width'] >= x_max else x_max
        y_max = b['top']+b['height'] if b['top']+b['height'] >= y_max else y_max

    dy = y_max-y_min
    dx = x_max-x_min
    dpy = dy*perc
    dpx = dx*perc
    nboxes[i]=[dpx,dpy,dx,dy]
    return x_min-dpx, y_min-dpy, x_max+dpx, y_max+dpy

for i in range(max_images):
    print (" \r%d" % i ,end="")
    filename = train_data[i]['filename']
    fullname = os.path.join(train_folders, filename)
    boxes = train_data[i]['boxes']
    label = [10,10,10,10,10]
    lb = len(boxes)
    if lb <= max_digits:
        im = Image.open(fullname)
        x_min, y_min, x_max, y_max = getBBox(i,0.3)
        im = im.crop([x_min,y_min,x_max,y_max])
        owidth, oheight = im.size
        wr = SV_IMG_SIZE/float(owidth)
        hr = SV_IMG_SIZE/float(oheight)
        for bid,box in  enumerate(boxes):
            sv_labels[i][max_digits-lb+bid] = int(box['label'])

        box = nboxes[i]
        box[0]*=wr
        box[1]*=wr
        box[2]*=hr
        box[3]*=hr
        im = im.resize((SV_IMG_SIZE,SV_IMG_SIZE),Image.ANTIALIAS)
        array = np.asarray(im)
        array =  array.reshape((SV_IMG_SIZE,SV_IMG_SIZE,SV_CHANNELS)).astype(np.float32)
        na = np.zeros([SV_IMG_SIZE,SV_IMG_SIZE,SV_CHANNELS],dtype=int)
        sv_images.append(array.astype(np.float32))

Here is the model

from keras.models import Model
from keras.layers import Input, Convolution2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import Adam
from keras.utils.np_utils import to_categorical

adam = Adam(lr=0.5)

x = Input((SV_IMG_SIZE, SV_IMG_SIZE, SV_CHANNELS))

y = Convolution2D(16, 3, 3, activation='relu', border_mode='same')(x)
y = Convolution2D(32, 3, 3, activation='relu', border_mode='valid')(y)
y = MaxPooling2D((2, 2))(y)
y = Convolution2D(128, 3, 3, activation='relu', border_mode='valid')(y)
y = MaxPooling2D((2, 2))(y)
y = Flatten()(y)
y = Dense(512, activation='relu')(y)


digit1 = Dense(11, activation="softmax")(y)
digit2 = Dense(11, activation="softmax")(y)
digit3 = Dense(11, activation="softmax")(y)
digit4 = Dense(11, activation="softmax")(y)
digit5 = Dense(11, activation="softmax")(y)
model = Model(input=x, output=[digit1, digit2, digit3,digit4,digit5])


model.compile(optimizer=adam,
          loss='categorical_crossentropy',
          metrics=['accuracy'])


sv_train_labels = [to_categorical(svt_labels[:,0]),
                   to_categorical(svt_labels[:,1]),
                   to_categorical(svt_labels[:,2]),
                   to_categorical(svt_labels[:,3]),
                   to_categorical(svt_labels[:,4])]
sv_validation_labels = [to_categorical(svv_labels[:,0]),
                        to_categorical(svv_labels[:,1]),
                        to_categorical(svv_labels[:,2]),
                        to_categorical(svv_labels[:,3]),
                        to_categorical(svv_labels[:,4])]

model.fit(sv_train, sv_train_labels, nb_epoch=50, batch_size=8,validation_data=(sv_validation, sv_validation_labels))

As per my comment above, I'd suggest avoiding training a model to predict the 5-digit combination. It would be far more efficient to train the model to predict a single digit. I tried to build a quick example based on the Keras example cifar10_cnn.py, using the MNIST-like SVHN format 2 (cropped digits):

import numpy as np
import scipy.io as sio
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils.np_utils import to_categorical

# parameters
nb_epoch = 10
batch_size = 32

# load data
nb_classes = 10
train_data = sio.loadmat('train_32x32.mat')
test_data = sio.loadmat('test_32x32.mat')
# 'X' has shape (32, 32, 3, N); .T gives (N, 3, 32, 32) Theano-style samples, scaled to [0, 1]
X_train = train_data['X'].T / 255
X_test = test_data['X'].T / 255
# SVHN stores the digit 0 as label 10, so the modulo maps all labels into 0-9
y_train = to_categorical(train_data['y'] % nb_classes)
y_test = to_categorical(test_data['y'] % nb_classes)

# model
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same', input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

# train
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epoch, validation_data=(X_test, y_test), shuffle=True)

Once you have trained the model, train another model to recognize/extract each number from an image, using a library such as OpenCV.
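That extraction step is not shown in the answer. Purely as an illustration, here is a rough sketch that uses plain OpenCV contour detection (rather than a second trained model) to pull digit candidates out of an image and run them through the classifier trained above. It assumes OpenCV 4.x and the (N, 3, 32, 32) input layout used for X_train, and it glosses over BGR/RGB channel order; extract_and_classify is a hypothetical helper, not part of the original post.

import cv2
import numpy as np

def extract_and_classify(image_path, model, input_size=32):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # binarize so the digit strokes become connected components
    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # sort candidate boxes left to right so the digits come out in reading order
    boxes = sorted((cv2.boundingRect(c) for c in contours), key=lambda b: b[0])
    digits = []
    for x, y, w, h in boxes:
        if w * h < 100:  # skip tiny blobs / noise
            continue
        crop = cv2.resize(img[y:y + h, x:x + w], (input_size, input_size))
        crop = crop.astype('float32').T / 255  # match the X_train preprocessing above
        digits.append(int(model.predict(crop[np.newaxis]).argmax()))
    return digits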

In cases like this it is most of the time a wrong training set. I would recommend you take a look at the actual images and labels you feed into the network. Additionally, look at the actual colorbar of the images, i.e. see how their values are distributed. This often leads to the solution. Anyway, if you are able to map them, then so will the computer, given a good learning rate.
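A quick sanity check along those lines might look like the following sketch; it assumes the sv_train array and the sv_labels from the question are available in the notebook.

import numpy as np
import matplotlib.pyplot as plt

# how are the pixel values distributed?
print("dtype:", sv_train.dtype)
print("min / max / mean:", sv_train.min(), sv_train.max(), sv_train.mean())

# show a few random images next to the labels that will be fed to the network
fig, axes = plt.subplots(1, 5, figsize=(12, 3))
for ax, idx in zip(axes, np.random.choice(len(sv_train), 5, replace=False)):
    ax.imshow(sv_train[idx].astype('uint8'))
    ax.set_title(str(sv_labels[idx]))
    ax.axis('off')
plt.show()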

Why are the labels for 104: [10 10 1 10 4]? I believe it should be [10 10 1 0 4], no?
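(For context: SVHN format 1 stores the digit 0 as label 10, so in the preparation loop above a real 0 ends up with the same value 10 that is used for "no digit". A minimal sketch of one possible fix, reusing the variable names from the question's loop:)

sv_labels = np.ones([max_images, max_digits], dtype=int) * 10  # 10 = no digit / blank
for bid, box in enumerate(boxes):
    # map SVHN labels into 0-9: the digit 0 is stored as 10 in the annotations
    sv_labels[i][max_digits - lb + bid] = int(box['label']) % 10
# "104" then becomes [10, 10, 1, 0, 4]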

In my opinion: either you have a problem with the input data (the preparation might be wrong), or you have an architecture that is not suitable for this problem.

It is training; you can see in the notebook that there is a change of loss between epochs 1 and 2. So it's not a training issue.
