
Apply glm() by filtering a column by its value in R

I have a data frame with a dependent variable, several independent variables (indicators), and a filtering variable. My goal is to run a separate regression for each category of the filtering variable. For example, to run the regression for code == "all", I take the data frame, filter on code, and fit the model:

sample_tib %>%
    filter(code == "all") %>%
    glm(love ~ ., data = ., family = "gaussian")

But I am facing two problems:

  1. In the example above, glm() uses every column as a predictor, including code. The formula I actually want is love ~ ind1 + ind2 + ... + ind_n;
  2. Filtering by each value of code in turn and fitting every model by hand is laborious and not really what I want.

Is there a function that filters the data frame, runs a regression, and nests the results in a new data frame or list? I looked into this and came across this question and Dave Gruenewald's elegant solution. But his approach handles only one pattern, x ~ y, with one dependent and one independent variable, which is not what I need.

So, are there any elegant solutions, or specific packages and functions, for this problem?

Data:

sample_tib <- data.frame(
  code = rep(c("all", "Data Science", "Data Engineer"), each = 12),
  love = runif(36),
  ind1 = runif(36),
  ind2 = runif(36),
  ind3 = runif(36),
  ind4 = runif(36),
  ind5 = runif(36),
  ind6 = runif(36),
  ind7 = runif(36)
)

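We can use nest_by() from dplyr:

  1. nest_by() does the grouping. It nests the rows for each code into a data list column and, by default, leaves the grouping column code out of the nested data, so love ~ . picks up only the indicators.
  2. The model is then created inside mutate(), wrapped in list().

Note: no packages other than dplyr are used.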

library(dplyr)
sample_tib %>%
    nest_by(code) %>%
    mutate(model = list(glm(love ~ ., data = data, family = 'gaussian'))) %>%
    ungroup()

Output:

# A tibble: 3 x 3
  code                        data model 
  <chr>         <list<tibble[,8]>> <list>
1 all                     [12 × 8] <glm> 
2 Data Engineer           [12 × 8] <glm> 
3 Data Science            [12 × 8] <glm> 
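
To compare the fits across codes, the coefficients can be pulled out of the model list column with base R. This is only a minimal sketch: it assumes a reduced, two-indicator version of the question's data, and the seed and the names models and coef_table are illustrative rather than part of the original answer.

```r
library(dplyr)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
sample_tib <- data.frame(
  code = rep(c("all", "Data Science", "Data Engineer"), each = 12),
  love = runif(36),
  ind1 = runif(36),
  ind2 = runif(36)
)

# One row per code; nest_by() leaves the grouping column out of `data`,
# so `love ~ .` expands to love ~ ind1 + ind2
models <- sample_tib %>%
  nest_by(code) %>%
  mutate(model = list(glm(love ~ ., data = data, family = "gaussian"))) %>%
  ungroup()

# Stack the coefficient vectors into a matrix: one row per code,
# one column per model term
coef_table <- do.call(
  rbind,
  setNames(lapply(models$model, coef), models$code)
)
coef_table
```

Each row of coef_table holds the intercept and slopes for one code, which makes it easy to scan how the estimates differ between groups.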

We can split the data and apply glm to each code separately.

library(dplyr)
library(purrr)

sample_tib %>%
  group_split(code) %>%
  map(function(x) glm(love~., data = select(x, -code), family = "gaussian"))

select(x, -code) drops the code column from each piece of the data, so the formula love ~ . uses only the indicator columns.
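
group_split() returns an unnamed list, so it is not obvious which model belongs to which code. A base R variation, sketched here under the assumption of a reduced two-indicator data set, uses split() to keep the code values as list names and reformulate() to build the love ~ ind1 + ... formula explicitly instead of dropping the column. The seed and the names predictors, model_formula, and models_by_code are illustrative.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
sample_tib <- data.frame(
  code = rep(c("all", "Data Science", "Data Engineer"), each = 12),
  love = runif(36),
  ind1 = runif(36),
  ind2 = runif(36)
)

# Build love ~ ind1 + ind2 from the column names,
# leaving out the filter column and the response
predictors <- setdiff(names(sample_tib), c("code", "love"))
model_formula <- reformulate(predictors, response = "love")

# split() names each piece after its code value, so the
# resulting list of models is named as well
models_by_code <- lapply(
  split(sample_tib, sample_tib$code),
  function(d) glm(model_formula, data = d, family = "gaussian")
)

names(models_by_code)
coef(models_by_code[["Data Science"]])
```

Because the formula names its predictors, the code column can stay in the data without entering the model, and individual fits can be looked up by name, e.g. models_by_code[["all"]].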


 