分类 model 产生极低的测试准确度，尽管训练和验证准确度对多类分类有好处

Question

我正在尝试对美国手语进行字母分类。 所以这是一个有 26 个类的多类分类任务。 我的 CNN model 提供了 84% 的训练准确率和 91% 的验证准确率，但测试准确率非常低 - 只有 7.7% ！！！

我使用ImageDataGenerator生成训练和验证数据：

datagen = ImageDataGenerator(
        rescale=1./255,
        rotation_range=0.2,
        width_shift_range=0.05,
        height_shift_range=0.05,
        shear_range=0.05,
        horizontal_flip=True,
        fill_mode='nearest',
        validation_split=0.2)

img_height = img_width = 256

batch_size = 16 
source = '/home/hp/asl_detection/train'

train_generator = datagen.flow_from_directory(
    source,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    shuffle=True,
    class_mode='categorical',
    subset='training', # set as training data
    color_mode='grayscale',
    seed=42,
    )

validation_generator = datagen.flow_from_directory(
    source,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    shuffle=True,
    class_mode='categorical',
    subset='validation', # set as validation data
    color_mode='grayscale',
    seed=42,
    )

这是我的 model 代码：

img_rows = 256
img_cols = 256

def get_net():

    inputs = Input((img_rows, img_cols, 1))
    print("inputs shape:",inputs.shape)

    #Convolution layers
    conv1 = Conv2D(24, 3, strides=(2, 2), activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(inputs)
    print("conv1 shape:",conv1.shape)
      
    conv2 = Conv2D(24, 3, strides=(2, 2), activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv1)
    print("conv2 shape:",conv2.shape)
    
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv2)
    print("pool1 shape:",pool1.shape)
    
    drop1 = Dropout(0.25)(pool1)

    conv3 = Conv2D(36, 3, strides=(2, 2), activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(drop1)
    print("conv3 shape:",conv3.shape)

    conv4 = Conv2D(36, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv3)
    print("conv4 shape:",conv4.shape)
    
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv4)
    print("pool2 shape:",pool2.shape)
    
    drop2 = Dropout(0.25)(pool2)

    conv5 = Conv2D(48, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(drop2)
    print("conv5 shape:",conv5.shape)
    
    conv6 = Conv2D(48, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv5)
    print("conv6 shape:",conv6.shape)
    
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv6)
    print("pool3 shape:",pool3.shape)
    
    drop3 = Dropout(0.25)(pool3)

    #Flattening
    flat = Flatten()(drop3)

    #Fully connected layers
    dense1 = Dense(128, activation = 'relu', use_bias=True, kernel_initializer = 'he_normal')(flat)
    print("dense1 shape:",dense1.shape)
    drop4 = Dropout(0.5)(dense1)

    dense2 = Dense(128, activation = 'relu', use_bias=True, kernel_initializer = 'he_normal')(drop4)
    print("dense2 shape:",dense2.shape)
    drop5 = Dropout(0.5)(dense2)

    dense4 = Dense(26, activation = 'softmax', use_bias=True, kernel_initializer = 'he_normal')(drop5)
    print("dense4 shape:",dense4.shape)
            
    model = Model(input = inputs, output = dense4)

    optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=0.00000001, decay=0.0)

    model.compile(optimizer = optimizer, loss = 'categorical_crossentropy', metrics = ['accuracy'])

    return model

这是训练代码：

def train():
    
    model = get_net()
    print("got model")
    model.summary()

    model_checkpoint = ModelCheckpoint('seqnet.hdf5', monitor='loss',verbose=1, save_best_only=True)
    print('Fitting model...')
    
    history = model.fit_generator(
    train_generator,
    steps_per_epoch = train_generator.samples // batch_size,
    validation_data = validation_generator, 
    validation_steps = validation_generator.samples // batch_size,
    epochs = 100)
    
    # list all data in history
    print(history.history.keys())
    # summarize history for accuracy
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.show()
    # summarize history for loss
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.show() 
    
    
    return model

model = train()

这是最近几个时期的训练日志：

Epoch 95/100
72/72 [==============================] - 74s 1s/step - loss: 0.4326 - acc: 0.8523 - val_loss: 0.2198 - val_acc: 0.9118
Epoch 96/100
72/72 [==============================] - 89s 1s/step - loss: 0.4591 - acc: 0.8418 - val_loss: 0.1944 - val_acc: 0.9412
Epoch 97/100
72/72 [==============================] - 90s 1s/step - loss: 0.4387 - acc: 0.8533 - val_loss: 0.2802 - val_acc: 0.8971
Epoch 98/100
72/72 [==============================] - 106s 1s/step - loss: 0.4680 - acc: 0.8349 - val_loss: 0.2206 - val_acc: 0.9228
Epoch 99/100
72/72 [==============================] - 85s 1s/step - loss: 0.4459 - acc: 0.8427 - val_loss: 0.2861 - val_acc: 0.9081
Epoch 100/100
72/72 [==============================] - 74s 1s/step - loss: 0.4639 - acc: 0.8472 - val_loss: 0.2866 - val_acc: 0.9191
dict_keys(['val_loss', 'loss', 'acc', 'val_acc'])

这些是 model 精度和损耗的曲线：

与训练和验证数据不同，我没有使用ImageDataGenerator来准备测试数据。 对于测试数据，我使用OpenCV将图像转换为灰度，进一步进行了标准化。 在同一个循环中，我生成了图像的相应 label 以防止任何顺序不匹配。 我将图像文件名和标签保存在 csv 文件中。 这是代码：

source = '/home/hp/asl_detection/test/unknown'
files = os.listdir(source)
test_data = []
rows = []
for file in files:
    
    row = []
    row.append(file)
    row.append(file[6])
    print(file)
    row.append(ord(file[6]) - 97)  
    rows.append(row) 
    
    img = cv2.imread(os.path.join(source, file))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(img,(256, 256))
    test_data.append(img)
    
test_data = np.array(test_data, dtype="float") / 255.0
print(test_data)
print(test_data.shape)

with open("/home/hp/asl_detection/test/alpha_class.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

以下是 csv 的几个元组：

此外，我重塑了测试图像阵列以提供通道信息：

test_data = test_data.reshape((test_data.shape[0], img_rows, img_cols, 1))

最后通过从 csv 中获取标签来预测类别并计算测试数据的准确性：

y_proba = model.predict(test_data)
y_classes = y_proba.argmax(axis=-1)
data = pd.read_csv('/home/hp/asl_detection/test/alpha_class.csv', header=None)
original_classes = data.iloc[:, 2]
original_classes = original_classes.tolist()
y_classes = y_classes.tolist()
acc = accuracy_score(original_classes, y_classes) * 100

您能找出测试准确率如此低的原因吗？ 如果需要进一步的信息，请告诉我。

Answer 1

我认为你正面临一个过度拟合的问题，验证集误导了你。 为了使验证不被误导，它必须具有相同的测试集分布，因此尝试从相同的分布生成测试集和验证集，也不要对验证数据集进行数据扩充。

分类 model 产生极低的测试准确度，尽管训练和验证准确度对多类分类有好处

问题描述

1 个解决方案

解决方案1
0 2020-07-08 22:53:57

分类 model 产生极低的测试准确度，尽管训练和验证准确度对多类分类有好处

问题描述

1 个解决方案

解决方案1 0 2020-07-08 22:53:57

解决方案1
0 2020-07-08 22:53:57