Machine Learning Feature Ranking/Scoring for Regression in Java

Is there any feature scoring method available in Java for regression datasets, where the target values are continuous numbers rather than binary labels?

The ML-Lib feature scoring seems to work only for classification datasets.

This largely depends on your regression algorithm. Features that work well for kernel-based regression algorithms might be pretty bad for linear models (https://en.wikipedia.org/wiki/Feature_selection). You seem to be aiming at the "filter approach". What works well in many regression settings is the Pearson correlation between each feature and the target. This is also available in ML-Lib.
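To make the filter approach concrete, here is a minimal sketch in plain Java that scores each feature by the absolute Pearson correlation with the continuous target and sorts the features by that score. It deliberately uses no library, so it makes no claim about the ML-Lib API; the class name `PearsonFeatureRanker`, its method names, and the column-wise data layout (`columns[j]` holds all values of feature j) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper class, not part of any library.
final class PearsonFeatureRanker {

    /** Pearson correlation coefficient between two equally long arrays. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        double meanX = Arrays.stream(x).average().orElse(0.0);
        double meanY = Arrays.stream(y).average().orElse(0.0);
        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0.0 || varY == 0.0) {
            return 0.0; // constant column: treated as uncorrelated (a choice of this sketch)
        }
        return cov / Math.sqrt(varX * varY);
    }

    /** Feature indices sorted by |correlation with the target|, strongest first. */
    static Integer[] rank(double[][] columns, double[] target) {
        double[] score = new double[columns.length];
        Integer[] order = new Integer[columns.length];
        for (int j = 0; j < columns.length; j++) {
            score[j] = Math.abs(pearson(columns[j], target));
            order[j] = j;
        }
        // Negate the score so that the largest absolute correlation comes first.
        Arrays.sort(order, Comparator.comparingDouble((Integer j) -> -score[j]));
        return order;
    }
}
```

Calling `PearsonFeatureRanker.rank(columns, target)` returns the feature indices from most to least correlated. The absolute value is used because a strongly negative correlation is as informative for regression as a strongly positive one.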

However, rather than simply adding the K most correlated features, you should consider the following:

  1. Avoid selecting pairs of highly correlated features. To do so, you have to build the correlation matrix between all pairs of features.
  2. Select the top feature, build a regression model, measure the model's error, and then measure the correlation between that error (the residual) and each remaining feature. Add the feature most correlated with the residual and repeat. This greedily selects the best features.
  3. Once you have selected your features, consider doing a sensitivity analysis. That is, build a regression model on all selected features and, for each feature, another model on the set with that feature removed. If removing a feature has no significant impact on the error, you can drop it.
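The first two steps can be sketched in plain Java as follows. This is a simplified illustration, not a definitive implementation: instead of refitting a full multivariate regression after each pick, it fits a one-variable least-squares line to the current residual (a forward-stagewise simplification). The class name `GreedyResidualSelector`, the parameter `maxPairCorr` (the correlation threshold above which two features count as redundant), and the column-wise data layout are assumptions made for this example. The sensitivity analysis of step 3 is not covered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
final class GreedyResidualSelector {

    static double mean(double[] v) {
        double sum = 0.0;
        for (double d : v) sum += d;
        return sum / v.length;
    }

    /** Pearson correlation; returns 0 if either input is constant. */
    static double pearson(double[] x, double[] y) {
        double mx = mean(x), my = mean(y);
        double cov = 0.0, vx = 0.0, vy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            cov += dx * dy;
            vx += dx * dx;
            vy += dy * dy;
        }
        return (vx == 0.0 || vy == 0.0) ? 0.0 : cov / Math.sqrt(vx * vy);
    }

    /** Step 1: is feature j too strongly correlated with an already selected feature? */
    static boolean redundant(double[][] columns, int j, List<Integer> selected, double maxPairCorr) {
        for (int s : selected) {
            if (Math.abs(pearson(columns[j], columns[s])) > maxPairCorr) return true;
        }
        return false;
    }

    /** Step 2: greedily pick up to k features by correlation with the current residual. */
    static List<Integer> select(double[][] columns, double[] target, int k, double maxPairCorr) {
        List<Integer> selected = new ArrayList<>();
        double targetMean = mean(target);
        double[] residual = new double[target.length];
        for (int i = 0; i < target.length; i++) residual[i] = target[i] - targetMean;

        while (selected.size() < k) {
            int best = -1;
            double bestScore = 0.0;
            for (int j = 0; j < columns.length; j++) {
                if (selected.contains(j) || redundant(columns, j, selected, maxPairCorr)) continue;
                double score = Math.abs(pearson(columns[j], residual));
                if (score > bestScore) { bestScore = score; best = j; }
            }
            if (best < 0) break; // no usable candidate left
            selected.add(best);

            // Simplification: one-variable least-squares fit of the residual on the chosen feature.
            double[] x = columns[best];
            double mx = mean(x), sxy = 0.0, sxx = 0.0;
            for (int i = 0; i < x.length; i++) {
                double dx = x[i] - mx;
                sxy += dx * residual[i];
                sxx += dx * dx;
            }
            double slope = sxy / sxx; // sxx > 0 because bestScore > 0
            for (int i = 0; i < x.length; i++) residual[i] -= slope * (x[i] - mx);
        }
        return selected;
    }
}
```

A call such as `GreedyResidualSelector.select(columns, target, 3, 0.9)` returns at most three feature indices and skips any candidate whose absolute correlation with an already selected feature exceeds 0.9. In practice you would probably also stop once the residual correlation falls below some threshold, since correlations with a near-zero residual are mostly numerical noise.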

