
Dealing with multiple categorical inputs and variable-sized groups as inputs to a neural network

I'm working with data that consists of numerical and categorical features, where each input is a variable-sized group of feature sets. For example: predict the price of a house using features of each room in the house, where each house can have a different number of rooms. The features could be size in meters, type (e.g. living room/bathroom/bedroom), color, floor... Some of the categorical features have high cardinality, and I may be using many features. I'd want to use the features from n rooms to predict the price for each house. How would I structure my inputs/NN model to receive variable-sized groups of inputs?

I thought of using one-hot encoding, but then I'd end up with large input vectors and I'd lose the connections between the features for each room. I also thought of using embeddings, but I'm not sure what the best way is to combine the features/samples so that all the data is fed in without losing information about which features come from which samples.
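To make the embedding idea concrete, here is a minimal sketch of one possible way to turn a variable number of rooms into a fixed-length house vector: embed the room type, append the numeric size, and average over rooms. The embedding values, the `size_m2` field, and the helper names are all hypothetical, and mean pooling is only one option (it discards the room count, for instance); this is an illustration under those assumptions, not a recommended architecture.

```python
# Hypothetical 2-dimensional embeddings for the "room type" feature.
# In a real model these would be learned; the values here are made up.
ROOM_TYPE_EMBEDDING = {
    "living room": [0.5, -1.0],
    "bathroom": [1.0, 0.0],
    "bedroom": [-0.5, 2.0],
}


def encode_room(room):
    """Concatenate the room-type embedding with the numeric size feature,
    so the features of one room stay together in one vector."""
    return ROOM_TYPE_EMBEDDING[room["type"]] + [room["size_m2"]]


def encode_house(rooms):
    """Average the per-room vectors into one fixed-length house vector,
    regardless of how many rooms the house has."""
    vectors = [encode_room(room) for room in rooms]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


small_house = [
    {"type": "bedroom", "size_m2": 12.0},
    {"type": "bathroom", "size_m2": 4.0},
]
large_house = small_house + [{"type": "living room", "size_m2": 20.0}]

print(encode_house(small_house))
print(len(encode_house(large_house)))
```

Both houses end up with a vector of the same length, which could then be fed to an ordinary dense network.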

As the article linked below suggests, you have three routes to choose from.

  • Ordinal encoding, which I don't think is the right fit for your example.
  • One-hot encoding, which you've already ruled out.
  • Difference encoding, which I think is somewhat suited, as there are master bedrooms, minor ones, guest ones and children's ones. So, try that angle.
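As a rough illustration of the three options, the sketch below encodes a single bedroom-type feature in plain Python. It assumes that "difference encoding" refers to backward difference coding, and the ordering of the levels is made up for the example; neither is confirmed by the article.

```python
# Hypothetical ordered levels of a "bedroom type" feature.
# The ordering is an assumption made purely for illustration.
LEVELS = ["children", "guest", "minor", "master"]


def ordinal_encode(value, levels=LEVELS):
    """Map a category to its integer rank in `levels`."""
    return levels.index(value)


def one_hot_encode(value, levels=LEVELS):
    """Return a vector with a single 1 at the category's position."""
    return [1 if level == value else 0 for level in levels]


def backward_difference_encode(value, levels=LEVELS):
    """Return the row of the backward difference contrast matrix for `value`.

    With k levels there are k - 1 columns; column j contrasts
    level j + 1 with level j (0-indexed).
    """
    k = len(levels)
    i = levels.index(value)
    return [-(k - j - 1) / k if i <= j else (j + 1) / k for j in range(k - 1)]


for level in LEVELS:
    print(level, ordinal_encode(level), one_hot_encode(level),
          backward_difference_encode(level))
```

Note that one-hot encoding needs as many columns as there are levels, while backward difference coding needs one fewer, so neither removes the high-cardinality concern on its own.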

Link to the beautiful article

Happy coding :)


 