Keras: get the numpy array of corresponding labels from a DataFrameIterator

Question

I have an issue with Keras. My test_set is defined as follows:

My test set

test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_dataframe(dataframe=X_test,
                                            x_col='image_path',
                                            y_col='category_id',
                                            #imagepath_test,
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'categorical')

In short, I use ImageDataGenerator from Keras and assign the resulting data generator to my test_set.

My dataframe X_test has 2 columns:

x_col='image_path' ## The path to my image files
y_col='category_id' ## My categorical features - Labels

I need to extract the values of y_col from my test_set, because:

The labels in test_set are neither in the same order nor the same shape as X_test['category_id']:
X_test['category_id'].shape
(315,)

When I look at the type of test_set, I get:

type(test_set)
keras_preprocessing.image.dataframe_iterator.DataFrameIterator

The reason:

When we make predictions, we need to predict on this test_set:

#_________________________________
# Making Predictions
y_preds = classifier.predict(test_set)

So when I want to display my classification report, I can't use test_set because it is in the wrong format, and I can't use X_test['category_id'] because the true values are not in the same order as in test_set.

Below is an example of classification_report called with test_set, and the result:

print(classification_report(test_set, predicted, 
  target_names=df_data['product_cat1'].unique()))

As a result I get an error:

ValueError: Found input variables with inconsistent numbers of samples: [10, 315]
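
The two numbers in this error can be reproduced with a little arithmetic. This is a minimal sketch, assuming that the length of a Keras batch iterator is its number of batches rather than its number of samples; the variable names are illustrative only.

```python
import math

n_samples = 315   # X_test['category_id'].shape[0], from the question
batch_size = 32   # batch_size passed to flow_from_dataframe

# A batch iterator reports its length in batches, not in samples,
# so scikit-learn sees 10 "samples" on one side and 315 on the other.
n_batches = math.ceil(n_samples / batch_size)
print(n_batches)  # 10
```

So passing test_set itself to classification_report compares 10 batches against 315 predictions, which is why the sample counts are reported as inconsistent.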

Remember, the shape of my X_test labels is:

X_test['category_id'].shape
(315,)

Nothing I tried to convert this test_set into an array or a dataframe worked.

If I use X_test['category_id'] in my classification_report, it runs, but the scores are meaningless.

Otherwise, multiclass classification with Keras is fun but useless: if we can't get the precision, recall and F1 score for each class, all we have is a global accuracy for the model, which is only good for competitions.

Any ideas or workarounds are welcome.

Answer 1

Since this is the test set and you do not train on it, you do not need to shuffle it, so pass shuffle=False to preserve the order. That way the order of the labels (ground truth) is the same as in X_test['category_id'], and you can use that column for the classification_report.

Fix

test_set = test_datagen.flow_from_dataframe(dataframe=X_test,
                                            x_col='image_path',
                                            y_col='category_id',
                                            shuffle=False,        ### Do not shuffle
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'categorical')

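y_preds = classifier.predict(test_set)
# predict returns class probabilities; map each argmax index back to its category_id label
index_to_label = {index: label for label, index in test_set.class_indices.items()}
y_pred_labels = [index_to_label[i] for i in y_preds.argmax(axis=1)]
print(classification_report(X_test['category_id'], y_pred_labels))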
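
As a possible alternative to reading the labels from X_test, the iterator itself appears to carry the integer labels it will emit (the classes attribute) and the label-to-index mapping (class_indices). The sketch below is not part of the original answer: the helper labels_and_predictions is hypothetical, and a SimpleNamespace stands in for the real DataFrameIterator so the example runs without Keras.

```python
import numpy as np
from types import SimpleNamespace

def labels_and_predictions(iterator, probabilities):
    """Return (y_true, y_pred, target_names) as aligned class indices.

    Hypothetical helper: `iterator` only needs `.classes` and
    `.class_indices`, and is assumed to have been built with shuffle=False.
    """
    y_true = np.asarray(iterator.classes)
    y_pred = np.argmax(probabilities, axis=1)
    # Order the label names by their class index.
    names = sorted(iterator.class_indices, key=iterator.class_indices.get)
    return y_true, y_pred, names

# Stand-in data (assumption): four images, three classes.
fake_test_set = SimpleNamespace(
    classes=[0, 2, 1, 2],
    class_indices={"cat": 0, "dog": 1, "fox": 2},
)
fake_probabilities = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.2, 0.6],
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
])

y_true, y_pred, names = labels_and_predictions(fake_test_set, fake_probabilities)
print(y_true, y_pred, names)
```

With the real objects this would presumably be called as labels_and_predictions(test_set, classifier.predict(test_set)), and the three results passed to classification_report(y_true, y_pred, target_names=names).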

If you do want to shuffle the test_set, you can seed it with a value and make the prediction. Then use the same seed, iterate over the data generator and collect the labels (ground truth). With the same seed value you will get the same order.
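
Collecting labels by iterating over the generator could look like the sketch below. It is a variation on the answer rather than its exact method: labels and predictions are taken from the same batch in a single pass, so they stay aligned whatever the shuffle order. The helper predict_with_labels, the fake generator and the identity "model" are all stand-ins, assumed only for illustration.

```python
import numpy as np

def predict_with_labels(batches, n_batches, predict_fn):
    """Collect true and predicted class indices batch by batch.

    `batches` yields (x_batch, one_hot_y_batch) pairs and may loop forever,
    so exactly `n_batches` batches are consumed.
    """
    y_true, y_pred = [], []
    for _, (x_batch, y_batch) in zip(range(n_batches), batches):
        y_true.append(np.argmax(y_batch, axis=1))
        y_pred.append(np.argmax(predict_fn(x_batch), axis=1))
    return np.concatenate(y_true), np.concatenate(y_pred)

def fake_batches():
    """Stand-in generator: three samples in batches of two, repeated forever."""
    x = np.eye(3)  # three "images", each a 3-feature vector
    y = np.eye(3)  # one-hot labels for classes 0, 1, 2
    while True:
        yield x[:2], y[:2]
        yield x[2:], y[2:]

# The identity function plays the role of a perfect classifier here.
y_true, y_pred = predict_with_labels(fake_batches(), 2, lambda x: x)
print(y_true, y_pred)
```

With Keras objects the call would presumably be predict_with_labels(test_set, len(test_set), classifier.predict_on_batch), assuming class_mode='categorical' so that each label batch is one-hot encoded.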

Answer 1
1 accepted, 2020-07-29 18:15:47
