
Ensemble feature selection from feature sets

I have a question about ensemble feature selection.

My data set consists of 1000 samples with about 30000 features, and each sample is labeled either A or B. What I want to do is pick a subset of features that classifies the labels efficiently.

I used three types of methods: a univariate method (Pearson's correlation coefficient), lasso regression, and SVM-RFE (recursive feature elimination), so I got three feature sets from them. I used Python scikit-learn for feature selection.
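For reference, a setup like the one described might look roughly like the sketch below. It is not the original code: the synthetic data from `make_classification`, the cutoff `k`, the lasso `alpha`, and the RFE `step` are all assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption: far smaller than 1000 x 30000).
X, y = make_classification(n_samples=200, n_features=300,
                           n_informative=10, random_state=0)
X = StandardScaler().fit_transform(X)
k = 20  # assumed number of features to keep per method

# 1) Univariate: rank features by absolute Pearson correlation with the label.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
r = (Xc * yc[:, None]).sum(axis=0) / (
    np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
pearson_set = set(np.argsort(-np.abs(r))[:k].tolist())

# 2) Lasso regression on the 0/1 labels: keep features with non-zero coefficients.
lasso = Lasso(alpha=0.02).fit(X, y)
lasso_set = set(np.flatnonzero(lasso.coef_).tolist())

# 3) SVM-RFE: recursively drop the lowest-weight features of a linear SVM.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=k, step=0.1).fit(X, y)
svm_rfe_set = set(np.flatnonzero(rfe.support_).tolist())

print(len(pearson_set), len(lasso_set), len(svm_rfe_set))
```

Each method ends up with a set of column indices, which is the form the combination step below the question works with.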

Now I am considering an ensemble feature selection approach, because the number of features is so large. In this case, how should I build an integrated subset from the 3 feature sets?

What I can think of is taking the union of the sets and applying lasso regression or SVM-RFE again, or just taking the intersection of the sets.
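The two options mentioned here can be written with plain Python sets. The sketch below uses made-up index values (the three sets are hypothetical, not real results) and also shows a majority vote, which is an extra middle-ground option not mentioned in the question.

```python
from collections import Counter

# Hypothetical feature-index sets from the three selectors (illustrative values only).
pearson_set = {0, 1, 2, 3, 5, 8}
lasso_set = {1, 2, 3, 13, 21}
svm_rfe_set = {2, 3, 5, 21, 34}
feature_sets = [pearson_set, lasso_set, svm_rfe_set]

# Union: every feature picked by at least one method (largest candidate pool).
union = set().union(*feature_sets)

# Intersection: only features picked by all methods (smallest, strictest set).
intersection = set.intersection(*feature_sets)

# Majority vote (assumption, not from the question): picked by at least 2 of 3 methods.
votes = Counter(idx for s in feature_sets for idx in s)
majority = {idx for idx, n in votes.items() if n >= 2}

print(sorted(union))         # [0, 1, 2, 3, 5, 8, 13, 21, 34]
print(sorted(intersection))  # [2, 3]
print(sorted(majority))      # [1, 2, 3, 5, 21]
```

The union is the natural starting pool if a second selection pass is planned, while the intersection can easily come out very small or empty when the methods disagree.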

Can anyone give an idea?

I guess what you do depends on how you want to use these features afterwards. If your goal is to "classify the label efficiently", one thing you can do is use your model (e.g. SVC, Lasso, etc.) as a wrapper and do Recursive Feature Elimination (RFE) with cross-validation.

You can start from the union of the features from the three methods you already used, or from scratch with the type of model you want to fit, since the number of examples is small. In any case, I believe the best way to select features in your case is to pick the ones that optimize your goal, which seems to be classification accuracy; hence the cross-validation suggestion.
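A minimal sketch of that suggestion, using scikit-learn's `RFECV` with a linear SVM as the wrapped model. The synthetic data, the `candidate_idx` array standing in for the union of the three feature sets, and the `step` and `min_features_to_select` values are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in data; shuffle=False puts the informative columns first.
X, y = make_classification(n_samples=200, n_features=300, n_informative=10,
                           shuffle=False, random_state=0)

# Hypothetical union of the three feature sets: here simply the first 60 columns.
candidate_idx = np.arange(60)

# Cross-validated RFE: drop 5 features per round, score each size by CV accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
selector = RFECV(SVC(kernel="linear"), step=5, cv=cv,
                 scoring="accuracy", min_features_to_select=5)
selector.fit(X[:, candidate_idx], y)

# Map the boolean mask back to the original column indices.
selected_idx = candidate_idx[selector.support_]
print(selector.n_features_, selected_idx)
```

Note that if the candidate pool was itself chosen using the same labels, the cross-validated accuracy reported here will tend to be optimistic; a held-out test set gives a fairer estimate.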
