
Python: selecting variables in multiple linear regression

I have a dependent variable y and 6 independent variables, and I want to fit a linear regression to them. I am using the sklearn library.

The problem is that some of my independent variables have a pairwise correlation above 0.5, so I can't include them in my model at the same time.

I searched the internet but didn't find a way to select the best set of independent variables for the linear regression and output the variables that were selected.

If you see correlation between independent variables, you should consider removing some of them, for example by keeping only one variable from each highly correlated pair.
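A minimal sketch of that idea, assuming the 0.5 cutoff from the question and a simple greedy rule (keep a column only if it is not too correlated with any column already kept). The helper name `select_uncorrelated`, the column names, and the synthetic data are made up for illustration; which column of a pair gets dropped depends on column order, so this is only one of several reasonable rules.

```python
import numpy as np

def select_uncorrelated(X, names, threshold=0.5):
    """Greedily keep columns whose absolute Pearson correlation with
    every previously kept column is at most `threshold`."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # columns are variables
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Synthetic data (hypothetical): x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)
b = rng.normal(size=n)
X = np.column_stack([a, a + 0.1 * rng.normal(size=n), b])
names = ["x1", "x2", "x3"]

kept = select_uncorrelated(X, names)
print(kept)
```

The returned list is the set of variable names you would then pass on to the regression.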

I see you are working with scikit-learn. If you don't want to do the feature selection manually, you could use one of the feature selection methods in scikit-learn's feature_selection module. There are many ways to remove features automatically, and you should cross-validate to determine which one works best for your problem.
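As one possible example from that module, here is a sketch using `RFECV` (recursive feature elimination with cross-validation) around a `LinearRegression`. The synthetic dataset, the column names and the choice of 5 folds are assumptions for illustration; other selectors in `feature_selection` may suit your data better.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for "y and 6 independent variables" (hypothetical data).
X, y = make_regression(n_samples=200, n_features=6, n_informative=3,
                       noise=10.0, random_state=0)
names = [f"x{i + 1}" for i in range(X.shape[1])]

# RFECV repeatedly drops the feature with the smallest |coefficient| and
# uses cross-validation to decide how many features to keep.
# Ranking by coefficient size assumes the features are on comparable scales.
selector = RFECV(LinearRegression(), step=1, cv=5,
                 scoring="neg_mean_squared_error")
selector.fit(X, y)

# support_ is a boolean mask over the input columns.
selected = [name for name, keep in zip(names, selector.support_) if keep]
print(selected)
```

Note that `RFECV` does not look at correlations between features directly, so you may still want to check the selected set against your 0.5 rule.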

You are probably looking for k-fold cross-validation.

The idea is to pick candidate feature sets (randomly, for example) and have a way to validate them against each other.

For each candidate feature set, train your model on (k-1) partitions of your data and validate it against the remaining partition. Repeat this for each partition and take the average of your score (MAE or RMSE, for instance).

That score is an objective figure for comparing your models, i.e. your feature selections.
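A rough sketch of this comparison with scikit-learn's `cross_val_score`. With only 6 variables there are just 63 non-empty subsets, so this example tries all of them instead of sampling randomly; that is a substitution on my part, and the synthetic data, column names and k = 5 are assumptions as well.

```python
from itertools import combinations

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the real data (hypothetical).
X, y = make_regression(n_samples=200, n_features=6, n_informative=3,
                       noise=10.0, random_state=0)
names = [f"x{i + 1}" for i in range(X.shape[1])]

# Same folds for every subset, so the scores are comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

results = {}  # maps a tuple of column indices to its mean RMSE
for size in range(1, X.shape[1] + 1):
    for subset in combinations(range(X.shape[1]), size):
        scores = cross_val_score(LinearRegression(), X[:, list(subset)], y,
                                 cv=cv, scoring="neg_root_mean_squared_error")
        results[subset] = -scores.mean()  # scorer returns negated RMSE

best = min(results, key=results.get)
print([names[j] for j in best], results[best])
```

To respect the correlation constraint from the question, you could skip any subset that contains a pair of variables correlated above 0.5 before scoring it.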
