
How can sklearn select categorical features based on feature selection?

Original post 2014-07-29 16:35:20 8 1 python/ scikit-learn/ feature-selection

I want to run feature selection on data with several categorical variables. I used get_dummies in pandas to generate dummy (one-hot) columns for these categorical variables. How does sklearn know that a specific group of dummy columns actually belongs to one feature, so that it selects or drops them all together? For example, I have a variable called city. It has three levels (New York, Chicago and Boston), so the encoded columns look like:

[1,0,0]
[0,1,0]
[0,0,1]

How can I tell sklearn that these three "columns" actually belong to one feature, city, so that it won't end up choosing New York and deleting Chicago and Boston?

Thank you so much!

1 answer

You can't. The feature selection routines in scikit-learn consider the dummy variables independently of each other. This means they can "trim" the domain of a categorical variable down to the values that matter for prediction.
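As a rough illustration of that behaviour, here is a minimal sketch. The toy data, the `city_` column prefix, and the choice of `SelectKBest` with `chi2` are assumptions made for the example, not something the question specified. Each dummy column gets its own score, so the selector can keep one city and drop the others.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy data: only "New York" is associated with the target.
df = pd.DataFrame({
    "city": ["New York", "Chicago", "Boston"] * 4,
    "y": [1, 0, 0] * 4,
})

# One dummy column per city level; sklearn only sees three separate columns.
X = pd.get_dummies(df["city"], prefix="city").astype(int)
y = df["y"]

# Score each dummy column on its own and keep the single best one.
selector = SelectKBest(chi2, k=1).fit(X, y)
selected = list(X.columns[selector.get_support()])

print(dict(zip(X.columns, selector.scores_)))
print(selected)
```

With this data the selector keeps only the New York column, which is the "trimming" the answer describes: the city variable is reduced to the one level that carries signal rather than being kept or dropped as a whole.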

How to encode categorical features in sklearn?

How to interpret importance of features from _coeffs outputs for multi-class in sklearn.feature_selection?

How to get name of selected features when there are several feature selection methods in sklearn pipeline?

How can I reshape my test data in Sklearn? (feature selection)

Feature selection using scikit-learn on categorical features

Can sklearn random forest directly handle categorical features?

How to implement feature selection for categorical variables?

How to pass mixed (categorical and numeric) features to Decision Tree Regressor in sklearn?

How to perform feature selection with gridsearchcv in sklearn in python

Feature selection and categorical variables


