
Classification model produces extremely low test accuracy, although training and validation accuracies are good for multiclass classification

I'm trying to do alphabet classification for American Sign Language, so it's a multiclass classification task with 26 classes. My CNN model reached 84% training accuracy and 91% validation accuracy, yet the test accuracy is hilariously low - only 7.7%!

I used ImageDataGenerator to produce training and validation data:

from keras.preprocessing.image import ImageDataGenerator  # standalone Keras 2.x

datagen = ImageDataGenerator(
        rescale=1./255,
        rotation_range=0.2,
        width_shift_range=0.05,
        height_shift_range=0.05,
        shear_range=0.05,
        horizontal_flip=True,
        fill_mode='nearest',
        validation_split=0.2)

img_height = img_width = 256

batch_size = 16 
source = '/home/hp/asl_detection/train'

train_generator = datagen.flow_from_directory(
    source,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    shuffle=True,
    class_mode='categorical',
    subset='training', # set as training data
    color_mode='grayscale',
    seed=42,
    )

validation_generator = datagen.flow_from_directory(
    source,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    shuffle=True,
    class_mode='categorical',
    subset='validation', # set as validation data
    color_mode='grayscale',
    seed=42,
    ) 

This is my model code:

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.optimizers import Adam

img_rows = 256
img_cols = 256

def get_net():

    inputs = Input((img_rows, img_cols, 1))
    print("inputs shape:",inputs.shape)

    #Convolution layers
    conv1 = Conv2D(24, 3, strides=(2, 2), activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(inputs)
    print("conv1 shape:",conv1.shape)
      
    conv2 = Conv2D(24, 3, strides=(2, 2), activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv1)
    print("conv2 shape:",conv2.shape)
    
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv2)
    print("pool1 shape:",pool1.shape)
    
    drop1 = Dropout(0.25)(pool1)

    conv3 = Conv2D(36, 3, strides=(2, 2), activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(drop1)
    print("conv3 shape:",conv3.shape)

    conv4 = Conv2D(36, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv3)
    print("conv4 shape:",conv4.shape)
    
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv4)
    print("pool2 shape:",pool2.shape)
    
    drop2 = Dropout(0.25)(pool2)

    conv5 = Conv2D(48, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(drop2)
    print("conv5 shape:",conv5.shape)
    
    conv6 = Conv2D(48, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv5)
    print("conv6 shape:",conv6.shape)
    
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv6)
    print("pool3 shape:",pool3.shape)
    
    drop3 = Dropout(0.25)(pool3)

    #Flattening
    flat = Flatten()(drop3)

    #Fully connected layers
    dense1 = Dense(128, activation = 'relu', use_bias=True, kernel_initializer = 'he_normal')(flat)
    print("dense1 shape:",dense1.shape)
    drop4 = Dropout(0.5)(dense1)

    dense2 = Dense(128, activation = 'relu', use_bias=True, kernel_initializer = 'he_normal')(drop4)
    print("dense2 shape:",dense2.shape)
    drop5 = Dropout(0.5)(dense2)

    dense4 = Dense(26, activation = 'softmax', use_bias=True, kernel_initializer = 'he_normal')(drop5)
    print("dense4 shape:",dense4.shape)
            
    model = Model(inputs=inputs, outputs=dense4)

    optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=0.00000001, decay=0.0)

    model.compile(optimizer = optimizer, loss = 'categorical_crossentropy', metrics = ['accuracy'])

    return model

This is the training code:

from keras.callbacks import ModelCheckpoint
import matplotlib.pyplot as plt

def train():
    
    model = get_net()
    print("got model")
    model.summary()

    model_checkpoint = ModelCheckpoint('seqnet.hdf5', monitor='loss',verbose=1, save_best_only=True)
    print('Fitting model...')
    
    history = model.fit_generator(
        train_generator,
        steps_per_epoch = train_generator.samples // batch_size,
        validation_data = validation_generator,
        validation_steps = validation_generator.samples // batch_size,
        epochs = 100)
    
    # list all data in history
    print(history.history.keys())
    # summarize history for accuracy
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.show()
    # summarize history for loss
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.show() 
    
    
    return model

model = train()

This is the training log for the last few epochs:

Epoch 95/100
72/72 [==============================] - 74s 1s/step - loss: 0.4326 - acc: 0.8523 - val_loss: 0.2198 - val_acc: 0.9118
Epoch 96/100
72/72 [==============================] - 89s 1s/step - loss: 0.4591 - acc: 0.8418 - val_loss: 0.1944 - val_acc: 0.9412
Epoch 97/100
72/72 [==============================] - 90s 1s/step - loss: 0.4387 - acc: 0.8533 - val_loss: 0.2802 - val_acc: 0.8971
Epoch 98/100
72/72 [==============================] - 106s 1s/step - loss: 0.4680 - acc: 0.8349 - val_loss: 0.2206 - val_acc: 0.9228
Epoch 99/100
72/72 [==============================] - 85s 1s/step - loss: 0.4459 - acc: 0.8427 - val_loss: 0.2861 - val_acc: 0.9081
Epoch 100/100
72/72 [==============================] - 74s 1s/step - loss: 0.4639 - acc: 0.8472 - val_loss: 0.2866 - val_acc: 0.9191
dict_keys(['val_loss', 'loss', 'acc', 'val_acc'])

These are the curves for the model's accuracy and loss:

[image: training/validation accuracy and loss curves]

Unlike the training and validation data, I didn't use ImageDataGenerator to prepare the test data. For the test data I used OpenCV to convert the images to grayscale and then normalized them. In the same loop I generated the corresponding label for each image to prevent any order mismatch, and I saved the image file names and labels in a CSV file. Here's the code:

import os
import csv
import cv2
import numpy as np

source = '/home/hp/asl_detection/test/unknown'
files = os.listdir(source)
test_data = []
rows = []
for file in files:
    
    row = []
    row.append(file)
    row.append(file[6])
    print(file)
    row.append(ord(file[6]) - 97)  
    rows.append(row) 
    
    img = cv2.imread(os.path.join(source, file))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(img,(256, 256))
    test_data.append(img)
    
test_data = np.array(test_data, dtype="float") / 255.0
print(test_data)
print(test_data.shape)

with open("/home/hp/asl_detection/test/alpha_class.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

Here are a few rows of the CSV:

[image: sample CSV rows]

Then I reshaped the test image array to add the channel dimension:

test_data = test_data.reshape((test_data.shape[0], img_rows, img_cols, 1))

Finally, I predicted the classes and computed the test accuracy by fetching the labels from the CSV:

import pandas as pd
from sklearn.metrics import accuracy_score

y_proba = model.predict(test_data)
y_classes = y_proba.argmax(axis=-1)
data = pd.read_csv('/home/hp/asl_detection/test/alpha_class.csv', header=None)
original_classes = data.iloc[:, 2]
original_classes = original_classes.tolist()
y_classes = y_classes.tolist()
acc = accuracy_score(original_classes, y_classes) * 100

Can you find the reason behind such a low test accuracy? If any further information is needed, please let me know.

I think you are facing an overfitting problem and the validation set is misleading you. For the validation not to be misleading, it has to have the same distribution as the test set, so try to generate the test and validation sets from the same distribution; also, don't apply data augmentation to the validation set.
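A minimal sketch of what that could look like with the setup above (standalone Keras, same directory layout). This is not the asker's code: `plain_datagen`, `test_datagen` and `test_generator` are illustrative names, and the test folder being re-arranged into per-class subdirectories is an assumption:

# Sketch only: keep augmentation for the training subset, feed validation and
# test through rescale-only generators so all three splits see the same
# preprocessing.
from keras.preprocessing.image import ImageDataGenerator

# Rescale-only generator; using the same validation_split keeps the
# train/validation file split identical to the augmented generator's split
# (the split is deterministic, not random).
plain_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

# Training keeps the augmented `datagen` from the question (subset='training').
# Validation now comes from the un-augmented generator:
validation_generator = plain_datagen.flow_from_directory(
    source,
    target_size=(256, 256),
    batch_size=16,
    shuffle=True,
    class_mode='categorical',
    subset='validation',
    color_mode='grayscale',
    seed=42)

# If the test images can be arranged into per-class subfolders (one folder per
# letter, like the training data), evaluating them through the identical
# pipeline removes any preprocessing or label-ordering mismatch:
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
    '/home/hp/asl_detection/test',   # hypothetical layout: test/a, test/b, ...
    target_size=(256, 256),
    batch_size=16,
    shuffle=False,
    class_mode='categorical',
    color_mode='grayscale')

loss, acc = model.evaluate_generator(test_generator, steps=len(test_generator))
print('test accuracy:', acc)

If the training and test images still come from very different sources (different hands, backgrounds, lighting), the validation score will keep overestimating test performance no matter how the generators are configured, which is the core of the distribution-mismatch point above.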
