Labels size different from target_names: Tensorflow Multi-Input Regression converting to Classification

I am trying to convert a multi-input, mixed-data (text and image) Keras model from a regression output (house price) to a classification output (number of bedrooms). In particular, I am adapting this tutorial

https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/

to be a classifier. I have a couple of technical questions about the number of categories, and I also get an error that I don't fully understand.

I have altered the last layer of the network to be a softmax:

x = Dense(11, activation="softmax")(x)

However, I only have 10 categories (the dataset covers houses with 1-10 bedrooms), and with Dense(10, ...) I get the following error:

InvalidArgumentError: Received a label value of 10 which is outside the valid range of [0, 10). Label values: 3 2 5 2 10 3 2 5

I understand the error and how to avoid it, but why isn't the range [0, 10) sufficient, given that I don't have houses with 0 bedrooms?

When I try to get a classification report, I get two warnings:

UserWarning: labels size, 6, does not match size of target_names, 10
UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.

I think these might be because my classification report only contains houses with 1-6 bedrooms, but I am not sure. Any insight you can give would be appreciated.

My code and the dataset can be cloned from here: https://github.com/davidrtfraser/blog-keras-multi-input

Generally in machine learning, labels for N classes are encoded as integers in the range 0 to N - 1, because these values map directly to the indices of the model's output vector, so you can use argmax to recover the label from the model output. With Dense(10) the valid labels are therefore 0 through 9, and a label of 10 is out of range whether or not the label 0 ever appears in your data.

So you need to encode your labels in the same way. The easiest way is to map your labels from [1, 10] to [0, 9] by subtracting one from each label. To get the number of bedrooms from the model output, add one to the predicted label.
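
The shift described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names `encode_bedrooms` and `decode_prediction` are hypothetical, the label values are the ones shown in the error message, and `fake_output` is a made-up softmax vector rather than real model output. With NumPy arrays the same idea would usually be written as `labels - 1` and `np.argmax(probabilities, axis=1) + 1`.

```python
# Hypothetical helpers for illustration; none of these names come from the
# original code or the tutorial.
NUM_CLASSES = 10  # bedrooms 1..10, so the final layer would be Dense(10, ...)


def encode_bedrooms(bedrooms):
    """Shift bedroom counts 1..10 down to class indices 0..9."""
    return [b - 1 for b in bedrooms]


def decode_prediction(probabilities):
    """Map one softmax output vector back to a bedroom count."""
    # argmax: index of the largest probability
    class_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_index + 1


bedrooms = [3, 2, 5, 2, 10, 3, 2, 5]  # label values from the error message
labels = encode_bedrooms(bedrooms)

# Every encoded label now falls inside the valid range [0, 10).
assert all(0 <= label < NUM_CLASSES for label in labels)

# A made-up softmax output whose largest entry sits at index 9.
fake_output = [0.01] * 9 + [0.91]
predicted_bedrooms = decode_prediction(fake_output)
```

The encoded labels are what you would pass as training targets; the decoding step is applied only when reporting predictions.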
