
Lasso Regression coefficients to find a linear model

I am fitting linear models in R. My predictors include birth rates, death rates, infant mortality rates, life expectancies, and region. region has 7 levels, coded with the numbers 1 to 7:

  1. East Asia & Pacific
  2. South Asia
  3. Europe & Central Asia
  4. North America
  5. Latin America
  6. Middle East & North Africa
  7. Sub-Saharan Africa

I ran a Lasso Regression in R to try to improve the generalized linear model. The Lasso Regression coefficients are as follows:
[image of the coefficients; not included]

I will put the factors selected by Lasso Regression into the lm function in R:

Lasso.lm <- lm(log(GNIpercapita) ~ deathrate + infantdeaths + life.exp.avg + 
                                    life.exp.diff + region, data=econdev) 

However, how do I add individual regions to the linear model in lm? For example, for regionEast Asia & Pacific, I can't just add + regionEast Asia & Pacific to the formula.

You cannot use pieces and parts of the category.

You can eliminate numerical variables, or entire columns of categorical variables, but you cannot pick and choose individual categories because it fragments your dataframe.

You might be better off using the fitted Lasso Regression itself and predicting from it. The regularization does not make it any less of a regression. It is more complex and less straightforward, but also more robust, and not 'worse'.

If that does not work for you, there are two options. You can run lm() with the selected continuous variables and the entire region variable, and accept that the model is imperfect, as all models are. Or you can remove region altogether and settle for what may be a less predictive model.
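As a minimal sketch of the first suggestion, predicting from the lasso fit itself: this assumes the lasso was fitted with the glmnet package, which the question does not actually state. The data frame toy and its column names are simulated stand-ins, not the asker's econdev data.

```r
library(glmnet)

set.seed(1)
n <- 140
# Simulated stand-in data; column names and region labels are hypothetical.
toy <- data.frame(
  deathrate    = rnorm(n, mean = 8, sd = 2),
  infantdeaths = rnorm(n, mean = 25, sd = 10),
  region       = factor(rep(paste0("R", 1:7), times = 20))
)
toy$log_gni <- 10 - 0.05 * toy$infantdeaths +
  0.5 * (toy$region == "R4") + rnorm(n, sd = 0.3)

# glmnet needs a numeric matrix; model.matrix() expands region into dummy columns.
x <- model.matrix(log_gni ~ ., data = toy)[, -1]  # drop the intercept column
y <- toy$log_gni

# alpha = 1 is the lasso penalty; cv.glmnet() picks lambda by cross-validation.
cv_fit <- cv.glmnet(x, y, alpha = 1)

# Coefficients at the chosen lambda; zeros are the terms the lasso dropped.
lasso_coefs <- coef(cv_fit, s = "lambda.min")

# Predict from the penalized fit directly instead of refitting with lm().
pred <- predict(cv_fit, newx = x, s = "lambda.min")
```

Here individual region dummies can be shrunk to zero without any manual editing of the formula, which is what the refit with lm() cannot do directly.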

I agree with the previous comments that it is not recommended to pick and choose parts of a categorical variable. If you would still like to do it, the modeldb package makes it easy to create dummy variables for each level of your categorical variable. Remember that in lm() you have to leave one level of the categorical variable out to avoid perfect collinearity.

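library(dplyr)    # provides %>%
library(modeldb)

# auto_values = TRUE lets the function look up the levels of region itself;
# alternatively, pass them explicitly through the values argument.
df %>% 
  add_dummy_variables(region, auto_values = TRUE)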
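If you would rather not add a package, the same idea can be sketched in base R with model.matrix(). The data below are simulated and the names toy, toy2 and log_gni are placeholders. Note that if region is stored as the numbers 1 to 7, it has to be converted with factor() first, otherwise lm() treats it as a single continuous predictor.

```r
set.seed(42)
# Simulated stand-in data with three of the regions; names are hypothetical.
toy <- data.frame(
  region = factor(rep(c("East Asia & Pacific", "South Asia", "North America"),
                      each = 20)),
  infantdeaths = rnorm(60, mean = 25, sd = 10)
)
toy$log_gni <- 9 - 0.04 * toy$infantdeaths +
  0.8 * (toy$region == "North America") + rnorm(60, sd = 0.2)

# One 0/1 column per level; "- 1" removes the intercept so no level is dropped.
dummies <- model.matrix(~ region - 1, data = toy)

# Names with spaces and "&" are awkward in a formula, so make them syntactic.
colnames(dummies) <- make.names(colnames(dummies))
toy2 <- cbind(toy, dummies)

# Keep only the dummy you want; all other regions together form the baseline.
fit <- lm(log_gni ~ infantdeaths + regionNorth.America, data = toy2)
coef(fit)
```

Because only some of the dummies enter the formula, the omitted levels are pooled into the reference group, so the collinearity problem mentioned above does not arise.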
