
Ensemble feature selection from feature sets

I have a question about ensemble feature selection.

My data set consists of 1000 samples with about 30,000 features, each labeled either A or B. What I want to do is pick a subset of features that can classify the labels efficiently.

I used three types of methods: a univariate method (Pearson's coefficient), lasso regression, and SVM-RFE (recursive feature elimination), which gave me three feature sets. I used Python's scikit-learn for the feature selection.
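For readers who want to reproduce the three starting sets, here is a minimal sketch of how they might be computed with scikit-learn. It assumes a small synthetic data set from `make_classification` in place of the real 1000 x 30,000 matrix, and the values `k = 10` and `alpha = 0.02` are hypothetical choices, not the asker's settings. Pearson's r is computed directly with NumPy (with a 0/1 label this is the point-biserial correlation), lasso keeps features with non-zero coefficients, and SVM-RFE uses `RFE` wrapped around a linear SVM.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real data (assumption: far smaller than 1000 x 30000)
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)  # lasso and linear SVMs are scale-sensitive
k = 10  # hypothetical number of features to keep for the ranking methods

# 1) Univariate: |Pearson r| between each feature and the 0/1 label, keep the top k
Xc = X - X.mean(axis=0)
yc = y - y.mean()
r = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
pearson_set = {int(i) for i in np.argsort(-np.abs(r))[:k]}

# 2) Lasso regression on the 0/1 label: keep features with non-zero coefficients
lasso = Lasso(alpha=0.02).fit(X, y)  # alpha is a hypothetical choice
lasso_set = {int(i) for i in np.flatnonzero(lasso.coef_)}

# 3) SVM-RFE: repeatedly refit a linear SVM and drop the feature with the smallest |coef_|
rfe = RFE(LinearSVC(dual=False), n_features_to_select=k, step=1).fit(X, y)
svm_rfe_set = {int(i) for i in np.flatnonzero(rfe.support_)}

for name, s in [("Pearson", pearson_set), ("Lasso", lasso_set), ("SVM-RFE", svm_rfe_set)]:
    print(name, sorted(s))
```

Lasso does not take a target size, so its set can be larger or smaller than `k` depending on `alpha`.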

I am now considering an ensemble feature selection approach, because the number of features is so large. In this case, how can I build one integrated subset from the three feature sets?

The options I can think of are taking the union of the sets and running lasso regression or SVM-RFE on it again, or simply taking the intersection of the sets.
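The union and intersection options can be expressed as one vote count: a feature is kept if at least `min_votes` of the methods selected it. The sketch below is illustrative only; the three index sets are hypothetical placeholders rather than real results, and the "at least 2 of 3" majority rule is a common middle ground added here for comparison, not something the question proposes.

```python
from collections import Counter


def combine_feature_sets(feature_sets, min_votes):
    """Keep feature indices selected by at least `min_votes` of the given sets.

    min_votes=1 gives the union; min_votes=len(feature_sets) gives the intersection.
    """
    votes = Counter(i for s in feature_sets for i in set(s))
    return sorted(i for i, n in votes.items() if n >= min_votes)


# Hypothetical feature indices standing in for the three sets from the question
pearson_set = {0, 3, 7, 12, 20}
lasso_set = {3, 7, 9, 12}
svm_rfe_set = {1, 3, 7, 15, 20}

sets = [pearson_set, lasso_set, svm_rfe_set]
union = combine_feature_sets(sets, min_votes=1)
intersection = combine_feature_sets(sets, min_votes=len(sets))
majority = combine_feature_sets(sets, min_votes=2)
print(union)         # [0, 1, 3, 7, 9, 12, 15, 20]
print(intersection)  # [3, 7]
print(majority)      # [3, 7, 12, 20]
```

When the methods disagree, the intersection can shrink to a handful of features or even be empty, while the union keeps everything any method liked and usually needs a second selection pass.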

Does anyone have an idea?

I guess what you do depends on how you want to use these features afterwards. If your goal is to "classify the label efficiently", one thing you can do is use your classification algorithm (e.g. SVC, Lasso, etc.) as a wrapper and run Recursive Feature Elimination (RFE) with cross-validation.
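As a rough sketch of the wrapper idea, scikit-learn's `RFECV` refits the classifier, removes the weakest features by `|coef_|`, and uses cross-validated accuracy to choose how many features to keep. The synthetic data, the `LinearSVC` settings, and the 5-fold stratified split are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

# Synthetic stand-in: 200 samples, 30 features, 4 of them informative (assumption)
X, y = make_classification(n_samples=200, n_features=30, n_informative=4,
                           n_redundant=0, random_state=0)

# The classifier is the "wrapper": RFE repeatedly refits it and drops the feature
# with the smallest |coef_|; RFECV picks the number to keep by mean CV accuracy.
selector = RFECV(
    estimator=LinearSVC(dual=False),
    step=1,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
    min_features_to_select=1,
)
selector.fit(X, y)

print("optimal number of features:", selector.n_features_)
print("selected feature indices:", np.flatnonzero(selector.support_))
```

With about 30,000 features, `step=1` means thousands of refits per fold; a fractional step such as `step=0.1` removes 10% of the remaining features per iteration and is much cheaper.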

You can start from the union of the features from the three methods you already used, or start from scratch with the type of model you want to fit, since the number of examples is small. In any case, I believe the best way to select features here is to pick the ones that optimize your goal, which seems to be classification accuracy; hence the suggestion to use cross-validation.
