Keras, get numpy array of corresponding labels from DataFrameIterator

I have an issue with Keras. My test_set is defined as follows:

My test set

from keras.preprocessing.image import ImageDataGenerator

test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_dataframe(dataframe=X_test,
                                            x_col='image_path',
                                            y_col='category_id',
                                            #imagepath_test,
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'categorical')

To summarize briefly, I create an ImageDataGenerator from Keras and then build my test_set from it with flow_from_dataframe.

My dataframe X_test has 2 columns:

x_col='image_path' ## The path to my image files
y_col='category_id' ## My categorical features - Labels 

I need to extract the values of y_col from my test_set, because:

The labels in test_set have neither the same order nor the same shape as X_test['category_id']:
X_test['category_id'].shape
(315,)

When I look at the type of test_set, I get:

type(test_set)
keras_preprocessing.image.dataframe_iterator.DataFrameIterator

The reason:

When we make predictions, we need to predict on this "test_set":

#_________________________________
# Making Predictions
y_preds = classifier.predict(test_set)

So when I want to display my classification report, I can't use "test_set" because it is in the wrong format, and I can't use X_test['category_id'] because the true values are not in the same order as in test_set.

Below is an example of classification_report with test_set and the result:

print(classification_report(test_set, predicted, 
  target_names=df_data['product_cat1'].unique()))

As a result I get an error:

ValueError: Found input variables with inconsistent numbers of samples: [10, 315]

Remember, the shape of my 'X_test' labels is:

 X_test['category_id'].shape
    (315,)

Nothing I have tried to convert this 'test_set' into an array or a dataframe has worked.

If I use X_test['category_id'] in my classification_report, it runs, but the scores are meaningless.

Otherwise, multiclass classification with Keras is fun but useless: if we can't get the precision, recall, and F1 score for each class, all we have is a global accuracy figure, which is only good for competitions.

Any ideas or workarounds are welcome.

Since this is the test set and you do not train on it, there is no need to shuffle it. Pass shuffle=False so the order is preserved (flow_from_dataframe shuffles by default). That way the ground-truth labels in X_test['category_id'] are in the same order as the predictions, and you can use them for the classification_report.

Fix
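To illustrate why the order matters, here is a minimal sketch that uses only NumPy, with no Keras involved. The 5 classes, the random labels, and the permutation standing in for a shuffling generator are all assumptions made for the demonstration. It shows how even a perfect model scores poorly when its predictions are compared against labels in a different row order.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the sketch is repeatable
n_samples, n_classes = 315, 5            # 315 matches the question; 5 classes is made up

# Stand-in for the ground-truth column X_test['category_id'] (as class indices).
y_true = rng.integers(0, n_classes, size=n_samples)

# Stand-in for a shuffling generator: predictions come back in a permuted row order.
order = rng.permutation(n_samples)
y_pred_shuffled = y_true[order]          # a "perfect" model, but the rows are reordered

# Comparing against the original order mixes up rows; reordering the labels fixes it.
acc_misaligned = float(np.mean(y_pred_shuffled == y_true))
acc_aligned = float(np.mean(y_pred_shuffled == y_true[order]))

print(acc_aligned)      # 1.0
print(acc_misaligned)   # far below 1.0
```

This is the situation behind the "meaningless scores" in the question: the report runs, but each prediction is scored against some other sample's label.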

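from sklearn.metrics import classification_report

test_set = test_datagen.flow_from_dataframe(dataframe=X_test,
                                            x_col='image_path',
                                            y_col='category_id',
                                            shuffle=False,        # keep the row order of X_test
                                            target_size=(64, 64),
                                            batch_size=32,
                                            class_mode='categorical')

y_preds = classifier.predict(test_set)        # class probabilities, shape (n_samples, n_classes)

# Map each argmax index back to its label name so it is comparable with X_test['category_id']
idx_to_label = {idx: label for label, idx in test_set.class_indices.items()}
y_pred_labels = [idx_to_label[i] for i in y_preds.argmax(axis=1)]

print(classification_report(X_test['category_id'], y_pred_labels))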
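As for the ValueError in the question, the `[10, 315]` most likely comes from `len(test_set)` counting batches rather than samples: 315 images at a batch size of 32 give 10 batches. The arithmetic can be checked in plain Python. The numbers are taken from the question and nothing here calls Keras.

```python
import math

n_samples = 315      # rows in X_test, from the question
batch_size = 32      # batch_size passed to flow_from_dataframe

# The length of a batch iterator is the number of batches per epoch,
# so scikit-learn would see 10 "samples" on one side and 315 on the other.
n_batches = math.ceil(n_samples / batch_size)
last_batch = n_samples - (n_batches - 1) * batch_size

print(n_batches)   # 10
print(last_batch)  # 27
```

This is why the iterator itself cannot be passed to classification_report. The report needs one true label and one predicted label per sample.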

If you do want to shuffle the test_set, you can pass a fixed seed and make the prediction. Then use the same seed, iterate over the data generator, and collect the labels (ground truth). With the same seed value you will get the same order.
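The seeded approach can be sketched with a toy iterator. `ToyIterator` is a hypothetical stand-in written for this illustration, not the Keras class, and its shuffling is not meant to mirror Keras internals. It only shows the principle: two iterators built with the same seed yield samples in the same order, so labels collected from one line up with predictions made on the other.

```python
import numpy as np

class ToyIterator:
    """Hypothetical stand-in for a shuffled batch iterator (not the Keras class)."""

    def __init__(self, labels, batch_size, seed):
        self.labels = np.asarray(labels)
        self.batch_size = batch_size
        # Same seed -> same permutation -> same sample order.
        self.order = np.random.default_rng(seed).permutation(len(self.labels))

    def __len__(self):
        # Number of batches (ceiling division), not number of samples.
        return -(-len(self.labels) // self.batch_size)

    def __getitem__(self, i):
        idx = self.order[i * self.batch_size:(i + 1) * self.batch_size]
        return self.labels[idx]

labels = np.arange(315) % 5                      # made-up class indices
gen_for_predict = ToyIterator(labels, 32, seed=42)
gen_for_labels = ToyIterator(labels, 32, seed=42)

# Walk exactly len(gen) batches and stitch the label batches together.
order_seen_by_model = np.concatenate(
    [gen_for_predict[i] for i in range(len(gen_for_predict))])
y_true = np.concatenate(
    [gen_for_labels[i] for i in range(len(gen_for_labels))])

print(y_true.shape)                                   # (315,)
print(np.array_equal(order_seen_by_model, y_true))    # True
```

With a real Keras iterator and class_mode='categorical', each label batch should be one-hot encoded, so the collected array would need an argmax over axis 1 before it goes into classification_report. Bounding the loop by len(generator) also matters, since these iterators are designed to keep yielding batches across epochs.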
