
Feature selection by machine learning

The aim of my current study is to explore machine learning methods for selecting the outcomes most strongly associated with treatment, as one approach to dealing with multiple testing.

My question is: what kinds of machine learning feature selection methods can I use to find the features most strongly associated with the response variable?

Response variable: group (=1 "treatment group", = "control group")

Features: Emergency Department visit, Hospital visit, Oncology visit, Other visit, ED Cost, Hosp Cost, Onco Cost, Other Cost.

Thanks,

Lyon

A decision tree or a boosted tree model would be a good choice. You can look at which splits produce the largest reduction in entropy (the highest information gain) and conclude that the corresponding features are strongly associated with the label.
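To make the entropy idea concrete, here is a minimal sketch in plain Python that computes the best single-split information gain for each feature. The feature names and values are made up for illustration (they are not the asker's data), and a real tree library would also handle deeper trees and importance aggregation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `values <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - children

def best_split_gain(values, labels):
    """Largest information gain over all candidate thresholds of one feature."""
    thresholds = sorted(set(values))[:-1]  # splitting at the max leaves one side empty
    return max((information_gain(values, labels, t) for t in thresholds), default=0.0)

# Hypothetical toy data: 1 = treatment group, 0 = control group.
group = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "ed_visits": [0, 1, 0, 1, 3, 2, 3, 2],      # separates the groups cleanly
    "other_visits": [1, 2, 1, 2, 1, 2, 1, 2],   # unrelated to the group
}

gains = {name: best_split_gain(vals, group) for name, vals in features.items()}
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: information gain = {gain:.3f}")
```

A feature whose best split has a gain near zero tells the tree almost nothing about the group, which is the sense in which it is weakly associated with the label.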

Feature selection:

It is the process of selecting a subset of relevant features or variables.

There are three main types of methods: wrappers, filters, and embedded methods.

Wrappers:

Wrappers use a predictive model to score feature subsets based on the model's error rate. They are computationally intensive, but they usually produce the best-performing feature set for that particular model. A popular technique is stepwise regression.

Stepwise regression:

It is an algorithm that adds the best feature or removes the worst feature in each iteration.
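As a rough sketch of the wrapper idea, the code below does greedy forward selection: in each iteration it adds the one feature that improves the model's score the most, and stops when nothing helps. Everything here is an assumption made for illustration: the scoring model is a tiny nearest-centroid classifier rather than a regression, the score is training accuracy (in practice you would use a cross-validated score), and the data are invented.

```python
def centroid_accuracy(rows, labels, cols):
    """Training accuracy of a nearest-centroid classifier using only `cols`."""
    centroids = {}
    for cls in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(r[c] for r in members) / len(members) for c in cols]
    correct = 0
    for r, y in zip(rows, labels):
        pred = min(
            centroids,
            key=lambda cls: sum((r[c] - m) ** 2 for c, m in zip(cols, centroids[cls])),
        )
        correct += pred == y
    return correct / len(rows)

def forward_selection(rows, labels, score=centroid_accuracy):
    """Greedily add the feature that raises the score most; stop when none does."""
    n_features = len(rows[0])
    selected, best = [], 0.0
    while len(selected) < n_features:
        candidates = [c for c in range(n_features) if c not in selected]
        scores = {c: score(rows, labels, selected + [c]) for c in candidates}
        top = max(scores, key=scores.get)  # ties go to the lowest column index
        if scores[top] <= best:
            break  # no candidate improves the model
        selected.append(top)
        best = scores[top]
    return selected, best

# Hypothetical toy data: columns are (ed_visits, hosp_visits, other_visits).
names = ["ed_visits", "hosp_visits", "other_visits"]
rows = [
    (0, 0, 5), (0, 2, 6), (2, 0, 5), (1, 1, 7),   # treatment group
    (3, 3, 6), (3, 1, 5), (1, 3, 6), (2, 2, 5),   # control group
]
group = [1, 1, 1, 1, 0, 0, 0, 0]

selected, accuracy = forward_selection(rows, group)
selected_names = [names[c] for c in selected]
print(selected_names, accuracy)
```

Backward elimination works the same way in reverse: start from all features and repeatedly drop the one whose removal hurts the score the least.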

Filters:

Filters use a proxy measure, such as the correlation or mutual information between a feature and the response, instead of a model's error rate. This is less computationally intensive but usually slightly less accurate: the resulting feature set may predict well, but it may not be the best possible one. Because a filter is not tuned to a particular predictive model, the feature set it selects is more general than the one a wrapper would select.

Filters produce a feature set that contains no assumptions from a predictive model, which makes them a useful tool for exposing relationships between features, for example features that are "bad" together and lower the accuracy, or "good" together and raise it.
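A minimal filter can be sketched by scoring each feature on its own against the group label and ranking the results, with no predictive model involved. The proxy measure here is the absolute Pearson correlation with the binary label, which is only one possible choice; the feature names and numbers are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0  # constant input: no association

def rank_features(features, labels):
    """Rank features by absolute correlation with the label, strongest first."""
    scores = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: 1 = treatment group, 0 = control group.
group = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "ed_visits": [0, 1, 0, 1, 3, 2, 3, 2],
    "hosp_cost": [100, 300, 200, 400, 300, 500, 400, 600],
    "other_visits": [1, 2, 1, 2, 1, 2, 1, 2],
}

ranking = rank_features(features, group)
for name, score in ranking:
    print(f"{name}: |r| = {score:.3f}")
```

Since every feature is scored independently, this is cheap, but it cannot see features that only matter in combination with others.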

Embedded:

Embedded algorithms learn which features contribute most to an accurate model during the model-building process itself. The most common type is regularisation, for example an L1 (lasso) penalty, which shrinks the coefficients of unhelpful features to exactly zero.
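To illustrate regularisation doing the selection, here is a small sketch of L1-penalised logistic regression fitted by proximal gradient descent in plain Python. The penalty strength `lam`, the step size `lr`, and the standardised toy features are all assumptions picked for the example; in practice you would use a library implementation and tune the penalty.

```python
from math import exp

def soft_threshold(v, t):
    """Proximal step for the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def l1_logistic(rows, labels, lam=0.05, lr=0.5, steps=500):
    """Fit logistic regression with an L1 penalty on the weights (not the bias)."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        # Prediction error (probability minus label) for every sample.
        errors = []
        for r, y in zip(rows, labels):
            z = b + sum(wj * xj for wj, xj in zip(w, r))
            errors.append(1.0 / (1.0 + exp(-z)) - y)
        b -= lr * sum(errors) / n
        for j in range(d):
            grad = sum(e * r[j] for e, r in zip(errors, rows)) / n
            w[j] = soft_threshold(w[j] - lr * grad, lr * lam)
    return w, b

# Hypothetical standardised features: (ed_visits_z, other_visits_z).
names = ["ed_visits_z", "other_visits_z"]
rows = [
    (1, 1), (1, -1), (1, 1), (1, -1),       # group 1
    (1, 1), (1, -1),                        # group 0
    (-1, 1), (-1, -1), (-1, 1), (-1, -1),   # group 0
    (-1, 1), (-1, -1),                      # group 1
]
group = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1]

weights, bias = l1_logistic(rows, group)
selected = [name for name, wj in zip(names, weights) if wj != 0.0]
print(dict(zip(names, weights)), selected)
```

The features whose weights survive the penalty are the selected ones; raising `lam` makes the model drop more features.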

You can also consider dimensionality reduction methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and kernel PCA, although these build new combined features rather than selecting among the original ones. Or you can use an XGBoost model.
