Dealing with multiple categorical inputs and variable-sized groups as inputs to a neural network
I'm working with data that consists of numerical and categorical features, where each input consists of a variable-sized group of the features. For example: predict the price of a house using features of each room in the house, where each house can have a different number of rooms. The features could be size in meters, type (e.g. living room/bathroom/bedroom), color, floor... Some of the categorical features have high cardinality, and I may be using many features. I want to use the features from n rooms to predict the price of each house. How would I structure my inputs/NN model to receive variable-sized groups of inputs?
I thought of using one-hot encoding, but then I'd end up with large input vectors and I'd lose the connections between the features for each room. I also thought of using embeddings, but I'm not sure what the best way is to combine the features/samples so that all the data is fed in properly without losing any information about which features come from which samples, etc.
As the article linked below suggests, you have three routes to choose from.
Link to the beautiful article
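The linked article is not reproduced here, so the following is not necessarily one of its three routes. It is only a minimal sketch of one common way to handle the setup in the question: embed each categorical feature, concatenate the embeddings with the numeric features per room, then mean-pool over the rooms so every house ends up as a fixed-length vector. The vocabularies, the field names (`size_m`, `floor`, `type`, `color`), the embedding size, and the random embedding values are all assumptions made for illustration; in a real model the embeddings would be learned by the framework you use.

```python
import random

# Hypothetical vocabularies for the categorical room features (assumed for illustration).
ROOM_TYPES = ["living room", "bathroom", "bedroom"]
COLORS = ["white", "blue", "green", "grey"]
EMBED_DIM = 3  # arbitrary embedding size; usually much smaller than the vocabulary


def make_embedding_table(vocab, dim, rng):
    """Map each category to a small dense vector (random here, learned in a real model)."""
    return {token: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for token in vocab}


def encode_room(room, type_table, color_table):
    """Concatenate the numeric features with the embeddings of the categorical ones."""
    numeric = [float(room["size_m"]), float(room["floor"])]
    return numeric + type_table[room["type"]] + color_table[room["color"]]


def encode_house(rooms, type_table, color_table):
    """Mean-pool the per-room vectors; assumes the house has at least one room."""
    vectors = [encode_room(r, type_table, color_table) for r in rooms]
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


rng = random.Random(0)
type_table = make_embedding_table(ROOM_TYPES, EMBED_DIM, rng)
color_table = make_embedding_table(COLORS, EMBED_DIM, rng)

# Two houses with a different number of rooms.
house_a = [
    {"size_m": 20.0, "floor": 0, "type": "living room", "color": "white"},
    {"size_m": 6.0, "floor": 0, "type": "bathroom", "color": "blue"},
]
house_b = [
    {"size_m": 25.0, "floor": 0, "type": "living room", "color": "grey"},
    {"size_m": 12.0, "floor": 1, "type": "bedroom", "color": "green"},
    {"size_m": 5.0, "floor": 1, "type": "bathroom", "color": "white"},
]

vec_a = encode_house(house_a, type_table, color_table)
vec_b = encode_house(house_b, type_table, color_table)

# Both houses become vectors of length 2 + EMBED_DIM + EMBED_DIM.
print(len(vec_a), len(vec_b))  # 8 8
```

Each room keeps its own features together until the pooling step, and the pooled vector can then be fed to an ordinary dense network that predicts the price. Mean pooling ignores the order of the rooms; if that loses too much, sum or max pooling, or padding the rooms to a fixed maximum and masking the padding before a sequence layer, are alternatives worth trying.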
Happy coding :)