
How can sklearn select whole categorical features during feature selection?

I want to run feature selection on data with several categorical variables. I used get_dummies in pandas to generate the dummy (one-hot) columns for these variables. How does sklearn know that a particular set of dummy columns actually belongs to one feature, so that it selects or drops them all together? For example, I have a variable called city with three levels: New York, Chicago, and Boston. The encoded rows look like:

[1,0,0]
[0,1,0]
[0,0,1]

How can I tell sklearn that these three columns actually belong to one feature, city, so that it does not end up selecting New York and dropping Chicago and Boston?
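For reference, the encoding described in the question can be reproduced with a minimal sketch like the one below. The toy DataFrame is an assumption made for illustration, not the asker's data. Note that get_dummies orders the new columns by sorted level name, so the column order differs from the rows shown above.

```python
import pandas as pd

# Hypothetical toy data: one categorical variable with three levels.
df = pd.DataFrame({"city": ["New York", "Chicago", "Boston"]})

# One dummy column per level, named "<variable>_<level>".
dummies = pd.get_dummies(df, columns=["city"], dtype=int)

print(list(dummies.columns))
# ['city_Boston', 'city_Chicago', 'city_New York']
print(dummies.values.tolist())
# [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

Nothing in the resulting frame records that the three columns came from the same variable except the shared name prefix, which is why downstream estimators see three separate features.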

Thank you so much!

You can't. The feature selection routines in scikit-learn consider the dummy variables independently of each other. This means they can "trim" the domains of categorical variables down to the values that matter for prediction.
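A small sketch can make this concrete. The data, the extra color variable, and the choice of SelectKBest with chi2 are all assumptions made for illustration; any univariate selector would treat the columns the same way. The last few lines show one possible workaround that is not a scikit-learn feature: map each dummy column back to its source variable by name prefix and keep the whole group if any member was selected.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy data: the target is 1 only for New York rows,
# and "color" is unrelated to the target.
df = pd.DataFrame({
    "city": ["New York"] * 4 + ["Chicago"] * 4 + ["Boston"] * 4,
    "color": ["red", "blue"] * 6,
})
y = [1] * 4 + [0] * 8

X = pd.get_dummies(df, dtype=int)

# The selector scores every dummy column on its own.
selector = SelectKBest(chi2, k=1).fit(X, y)
selected = X.columns[selector.get_support()]
print(list(selected))
# ['city_New York']  (Chicago and Boston are dropped)

# Workaround sketch (an assumption, not built into scikit-learn):
# recover the source variable from the "<variable>_<level>" prefix
# and keep every dummy column of any variable that had a column selected.
group_of = {col: col.split("_", 1)[0] for col in X.columns}
kept_groups = {group_of[col] for col in selected}
X_grouped = X[[col for col in X.columns if group_of[col] in kept_groups]]
print(list(X_grouped.columns))
# ['city_Boston', 'city_Chicago', 'city_New York']
```

The prefix split only works when variable names themselves contain no underscore; passing a distinctive prefix_sep to get_dummies would be a safer choice in real data. Whether keeping the full group is desirable depends on the model, since the answer's point is that dropping uninformative levels is often acceptable.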

