I have a dataframe with a dependent variable, several independent variables (indicators), and a filtering variable. My goal is to run regressions on subsets defined by the categories of the filtering variable. For example, to run a regression for code == "all", I take my dataframe, filter on code, and run a regression:
sample_tib %>%
  filter(code == "all") %>%
  glm(love ~ ., data = ., family = "gaussian")
But I am facing several problems:
1. glm() will use all columns, including code. The desired input to the regression is love ~ ind1 + ind2 + ... + ind_n.
2. Filtering on code and running a separate model for each category is costly and not really what I want.
Maybe there is a function that filters the dataframe, runs a regression, and nests the results in a new dataframe or list? I tried to figure this out and came across this question and Dave Gruenewald's beautiful solution. But his approach handles only one pattern, x ~ y, with one dependent and one independent variable, which is obviously not what I need.
So, are there any elegant solutions or specific packages and functions for this problem?
Data:
sample_tib <- data.frame(
  # 12 rows per category, in the same order as before
  code = rep(c("all", "Data Science", "Data Engineer"), each = 12),
  love = runif(36),
  ind1 = runif(36),
  ind2 = runif(36),
  ind3 = runif(36),
  ind4 = runif(36),
  ind5 = runif(36),
  ind6 = runif(36),
  ind7 = runif(36)
)
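We can use nest_by from dplyr. nest_by does the grouping by code, and the model is then created inside a list within mutate. Because nest_by(code) moves code out of the nested data, love ~ . only sees love and the indicators.
NOTE: No packages other than dplyr are used.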
library(dplyr)
sample_tib %>%
  nest_by(code) %>%
  mutate(model = list(glm(love ~ ., data = data, family = 'gaussian'))) %>%
  ungroup()
-output
# A tibble: 3 x 3
  code          data               model
  <chr>         <list<tibble[,8]>> <list>
1 all           [12 × 8]           <glm>
2 Data Engineer [12 × 8]           <glm>
3 Data Science  [12 × 8]           <glm>
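To check that each fitted model uses only the indicators, the coefficients can be pulled out of the model column. This is a minimal sketch, assuming the sample_tib from the question; the seed and the names fits and coefs are illustrative additions and not part of the original answer.

```r
library(dplyr)

set.seed(1)  # assumption: a seed only so the random sample data is reproducible
sample_tib <- data.frame(
  code = rep(c("all", "Data Science", "Data Engineer"), each = 12),
  love = runif(36),
  ind1 = runif(36), ind2 = runif(36), ind3 = runif(36), ind4 = runif(36),
  ind5 = runif(36), ind6 = runif(36), ind7 = runif(36)
)

# Same pipeline as above, stored in a (hypothetical) object called fits
fits <- sample_tib %>%
  nest_by(code) %>%
  mutate(model = list(glm(love ~ ., data = data, family = "gaussian"))) %>%
  ungroup()

# One named coefficient vector per category, looked up by code
coefs <- setNames(lapply(fits$model, coef), fits$code)
names(coefs[["all"]])  # intercept plus ind1..ind7, no code term
```

Any other base accessor, such as summary, can be applied to the list column in the same way.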
We can split the data and apply glm to each code separately.
library(dplyr)
library(purrr)
sample_tib %>%
  group_split(code) %>%
  map(function(x) glm(love ~ ., data = select(x, -code), family = "gaussian"))
select(x, -code) drops the code column from each piece of the data, so you can use love ~ . as the formula.
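If the formula should be spelled out as love ~ ind1 + ... + ind_n rather than relying on love ~ ., it can be built from the column names. The sketch below is a base R variant and not the answer's own code: it assumes the sample_tib from the question, and the names predictors, form and models are hypothetical. It also uses split, which returns a list named by code, so each model can be looked up by category.

```r
set.seed(1)  # assumption: a seed only so the random sample data is reproducible
sample_tib <- data.frame(
  code = rep(c("all", "Data Science", "Data Engineer"), each = 12),
  love = runif(36),
  ind1 = runif(36), ind2 = runif(36), ind3 = runif(36), ind4 = runif(36),
  ind5 = runif(36), ind6 = runif(36), ind7 = runif(36)
)

# Every column except the filter variable and the response is a predictor
predictors <- setdiff(names(sample_tib), c("code", "love"))

# Builds love ~ ind1 + ind2 + ... + ind7
form <- reformulate(predictors, response = "love")

# Fit one model per category; the list is named by the values of code
models <- lapply(split(sample_tib, sample_tib$code), function(d) {
  glm(form, data = d, family = "gaussian")
})

coef(models[["Data Science"]])
```

Because the formula names the predictors explicitly, code can stay in the data without entering the model.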