How can sklearn select categorical features based on feature selection?
I want to run feature selection on data with several categorical variables. I have used get_dummies in pandas to generate the sparse (one-hot) matrices for these categorical variables. My question is: how does sklearn know that one specific sparse matrix actually belongs to a single feature, so that it selects or drops all of its columns together? For example, I have a variable called city.
That variable has three levels (New York, Chicago and Boston), so the sparse matrix looks like:
[1,0,0] [0,1,0] [0,0,1]
How can I inform sklearn that these three "columns" actually belong to one feature, city, so that it does not end up choosing New York and deleting Chicago and Boston?
Thank you so much!
You can't. The feature selection routines in scikit-learn consider the dummy variables independently of each other. This means they can "trim" the domain of a categorical variable down to the values that matter for prediction, so keeping New York while dropping Chicago and Boston is a possible outcome.
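To illustrate the point, here is a minimal sketch on made-up toy data (the values, the target y, and the choice of SelectKBest with chi2 are assumptions for illustration, not something from the question). Only New York is associated with the target, and the selector scores each dummy column on its own:

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Toy data, invented for this example: only "New York" rows have y == 1.
df = pd.DataFrame({
    "city": ["New York", "Chicago", "Boston"] * 4,
    "y": [1, 0, 0] * 4,
})

# One dummy column per level: city_Boston, city_Chicago, city_New York.
X = pd.get_dummies(df["city"], prefix="city", dtype=int)
y = df["y"]

# The selector scores every column separately; it has no notion that
# the three columns came from the same original variable.
selector = SelectKBest(chi2, k=1).fit(X, y)
kept = list(X.columns[selector.get_support()])
print(kept)
```

With this data the only column kept is city_New York; the Chicago and Boston columns are dropped individually, which is exactly the "trimming" behaviour described above.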