Apply glm() by filtering a column by its value in R

Question

I have a dataframe with a dependent variable, several independent variables (indicators), and a filtering variable. My goal is to run a separate regression for each category of the filtering variable. For example, to run the regression for code == "all", I filter the dataframe on code and fit the model:

sample_tib %>%
    filter(code == "all") %>%
    glm(love ~ ., data = ., family = "gaussian")

But there are several problems that I am facing:

In the example above, glm() uses every column as a predictor, including code. The desired formula is love ~ ind1 + ind2 + ... + ind_n;
Filtering on each value of code by hand and fitting each model separately is costly and not really what I want.

Maybe there is a function that filters the dataframe, runs a regression, and nests the results in a new dataframe or list? I tried to figure this out and came across this question and Dave Gruenewald's beautiful solution. But his approach handles only one pattern, x ~ y, with one dependent and one independent variable, which is not what I need.

So, are there any elegant solutions, or specific packages and functions, for this problem?

Data:

sample_tib <- data.frame(
  code = rep(c("all", "Data Science", "Data Engineer"), each = 12),
  love = runif(36),
  ind1 = runif(36),
  ind2 = runif(36),
  ind3 = runif(36),
  ind4 = runif(36),
  ind5 = runif(36),
  ind6 = runif(36),
  ind7 = runif(36)
)

Answer 1

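We can use nest_by() from dplyr:

nest_by() does the grouping and nests each group's rows in a data list column. The grouping column code is dropped from the nested data by default, so love ~ . uses only the indicators.
The model is then created inside list() within mutate().

NOTE: No packages other than dplyr are used.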

library(dplyr)
sample_tib %>%
    nest_by(code) %>%
    mutate(model = list(glm(love ~ ., data = data, family = 'gaussian'))) %>%
    ungroup()

-output

# A tibble: 3 x 3
  code                        data model 
  <chr>         <list<tibble[,8]>> <list>
1 all                     [12 × 8] <glm> 
2 Data Engineer           [12 × 8] <glm> 
3 Data Science            [12 × 8] <glm>
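
Once the models sit in a list column, they can be pulled out by group. The following is a minimal sketch, assuming the sample_tib from the question (rebuilt here with a seed so the block runs on its own). The names models and fits are illustrative only and do not come from the answer.

```r
library(dplyr)

# Rebuild the question's sample data (seed added only for reproducibility)
set.seed(1)
sample_tib <- data.frame(
  code = rep(c("all", "Data Science", "Data Engineer"), each = 12),
  love = runif(36)
)
for (i in 1:7) sample_tib[[paste0("ind", i)]] <- runif(36)

# Fit one gaussian glm per code, as in the answer
models <- sample_tib %>%
  nest_by(code) %>%
  mutate(model = list(glm(love ~ ., data = data, family = "gaussian"))) %>%
  ungroup()

# Name the fitted models by their group so they can be looked up directly
fits <- setNames(models$model, models$code)

# Coefficients for one group: intercept plus ind1..ind7, no code term
coef(fits[["Data Science"]])

# One row of coefficients per group
coef_table <- t(sapply(fits, coef))
coef_table
```

Keeping the models in a named list makes it easy to call summary(), predict(), or coef() on a single group without filtering the tibble again.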

Answer 2

We can split the data and apply glm to each code separately.

library(dplyr)
library(purrr)

sample_tib %>%
  group_split(code) %>%
  map(function(x) glm(love ~ ., data = select(x, -code), family = "gaussian"))

select(x, -code) drops the code column from the data, so you can use love ~ . as the formula.
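
If you would rather spell out the formula as love ~ ind1 + ... + ind7 than rely on the dot, base R's reformulate() can build it from the column names. This is a sketch under the assumption that every indicator column starts with "ind", as in the sample data; the names indicators, model_formula, and fits are made up for illustration. It also uses split() instead of group_split(), so the resulting list is named by code.

```r
# Rebuild the question's sample data (seed added only for reproducibility)
set.seed(1)
sample_tib <- data.frame(
  code = rep(c("all", "Data Science", "Data Engineer"), each = 12),
  love = runif(36)
)
for (i in 1:7) sample_tib[[paste0("ind", i)]] <- runif(36)

# Assumption: indicator columns are exactly those whose names start with "ind"
indicators <- grep("^ind", names(sample_tib), value = TRUE)

# Build love ~ ind1 + ind2 + ... + ind7 from the column names
model_formula <- reformulate(indicators, response = "love")

# split() returns a list named by code, so each model keeps its label;
# code is not in the formula, so it never enters the model
fits <- lapply(
  split(sample_tib, sample_tib$code),
  function(d) glm(model_formula, data = d, family = "gaussian")
)

names(fits)
coef(fits[["all"]])
```

With an explicit formula, extra columns in the data (such as code) are simply ignored, so there is no need to drop them first.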

2 answers

solution1
2 2021-06-17 18:20:27

solution2
1 ACCEPTED 2021-06-17 10:33:34