High training and validation scores, but very poor test accuracy

Question

I am working on multi-label image classification, using Inception Net as my base architecture. After training completes, I get training accuracy > 90% and validation accuracy > 85%, but only 17% accuracy on the test data.

Model training:

model = Model(pre_trained_model.input, x)
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=0.0001),#'adam'
              metrics=['acc'])
history = model.fit_generator(
    train_generator,
    steps_per_epoch=600,  # total data / batch size
    epochs=100,
    validation_data=validation_generator,
    validation_steps=20,
    verbose=1,
    callbacks=callbacks)

Testing on the trained model:

test_generator = test_datagen.flow_from_directory(
    test_dir,target_size=(128, 128),batch_size=1,class_mode='categorical')

filenames = test_generator.filenames
nb_samples = len(filenames)

prediction = test_model.predict_generator(test_generator,steps=nb_samples,verbose=1)

Saving the results to a pandas DataFrame:

predicted_class_indices = np.argmax(prediction,axis=1)
labels = (train_generator.class_indices)  # getting class names from the folder structure
labels = dict((v,k) for k,v in labels.items())
predictions = [labels[k] for k in predicted_class_indices]  # map class indices back to class names

results=pd.DataFrame({"image_name":filenames,
                      "label":predictions})
results['image_name'] = [each.split("\\")[-1] for each in results['image_name']]

Everything looks fine, but I am still getting very poor predictions. Kindly help me figure out where I am making a mistake.

Answer 1

It may be that the images in your dataset are ordered in such a way that the test images are unlike anything the model saw during training, so the accuracy drops significantly.

What I recommend is to try K-fold cross validation, or even Stratified K-fold cross validation. The benefit is that your dataset will be split into, let's say, 10 'batches' (folds). In every iteration (out of 10), one batch is the test batch and all the others are training batches. In the next iteration, the test batch from the previous step becomes a training batch and a different batch becomes the test batch. It's important to note that every batch is the test batch exactly once. Another benefit of Stratified K-fold is that it takes the class labels into account and tries to split the data so that every batch has approximately the same distribution of classes.
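
To make the fold rotation concrete, here is a minimal, dependency-free sketch of stratified K-fold splitting. It is only an illustration, not the answerer's code: the function name `stratified_k_fold` and the toy labels are hypothetical, and it assumes each image has a single class label. In practice, scikit-learn's `StratifiedKFold` offers this functionality.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx); every sample lands in a test fold exactly once."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's shuffled indices round-robin across the folds,
    # so every fold receives roughly the same class proportions.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Rotate: fold i is the test set, all remaining folds form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical toy labels: 60 "cat" images followed by 40 "dog" images.
labels = ["cat"] * 60 + ["dog"] * 40
splits = list(stratified_k_fold(labels, k=10))
```

Each `(train_idx, test_idx)` pair would be used to train a fresh model and evaluate it; averaging the per-fold test accuracy gives a more trustworthy estimate than a single fixed split.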

Another way to get better results is simply to shuffle the images first and then pick the training and test sets from the shuffled data.
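
A shuffle-then-split step could look roughly like the sketch below. The helper name `shuffled_split`, the 20% test fraction, the seed, and the file names are all assumptions made for illustration; the point is only that the split is taken after shuffling, so the test set is not a contiguous, possibly unrepresentative slice of the data.

```python
import random


def shuffled_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of items, then split it into (train, test) lists."""
    items = list(items)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_test = int(round(len(items) * test_fraction))
    return items[n_test:], items[:n_test]


# Hypothetical file names standing in for the real image paths.
filenames = [f"img_{i:03d}.jpg" for i in range(100)]
train_files, test_files = shuffled_split(filenames)
```

The resulting lists can then be copied into the separate train and test directories that the generators read from.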
