Apply glm() by filtering a column by its value in R
I have a dataframe with a dependent variable, several independent variables (indicators), and a filtering variable. My goal is to run a separate regression for each category of the filtering variable. For example, to run a regression for code == "all", I take the dataframe, filter on code, and fit the model:
sample_tib %>%
  filter(code == "all") %>%
  glm(love ~ ., data = ., family = "gaussian")
But I am facing several problems:
1. glm() takes all columns, including code. The desired regression input is love ~ ind1 + ind2 + ... + ind_n;
2. Filtering by code and running a different model for each category is costly and not really what I want.
Maybe there exists a function that filters the dataframe, runs a regression, and nests the results in a new dataframe or list? I tried to figure this out and came across this question and Dave Gruenewald's beautiful solution. But his approach handles only one pattern, x ~ y, with one dependent and one independent variable, which is obviously not what I need.
So, are there any elegant solutions or specific packages and functions for this problem?
Data:
sample_tib <- data.frame(
code = c(
"all",
"all",
"all",
"all",
"all",
"all",
"all",
"all",
"all",
"all",
"all",
"all",
"Data Science",
"Data Science",
"Data Science",
"Data Science",
"Data Science",
"Data Science",
"Data Science",
"Data Science",
"Data Science",
"Data Science",
"Data Science",
"Data Science",
"Data Engineer",
"Data Engineer",
"Data Engineer",
"Data Engineer",
"Data Engineer",
"Data Engineer",
"Data Engineer",
"Data Engineer",
"Data Engineer",
"Data Engineer",
"Data Engineer",
"Data Engineer"
),
love = runif(36),
ind1 = runif(36),
ind2 = runif(36),
ind3 = runif(36),
ind4 = runif(36),
ind5 = runif(36),
ind6 = runif(36),
ind7 = runif(36)
)
We can use nest_by from dplyr:
nest_by does the grouping by code
the model is created within a list inside mutate
NOTE: No packages other than dplyr are used.
library(dplyr)
sample_tib %>%
  nest_by(code) %>%
  mutate(model = list(glm(love ~ ., data = data, family = 'gaussian'))) %>%
  ungroup()
Output:
# A tibble: 3 x 3
  code                        data model
  <chr>         <list<tibble[,8]>> <list>
1 all                     [12 × 8] <glm>
2 Data Engineer           [12 × 8] <glm>
3 Data Science            [12 × 8] <glm>
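Once the models sit in a list column, they can be pulled out and inspected with base R functions. The following is only a sketch: demo_tib is a hypothetical, smaller stand-in for sample_tib (two indicators instead of seven), and the seed is an assumption added so the sketch is reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only for reproducibility of this sketch

# Hypothetical stand-in for sample_tib with fewer indicator columns
demo_tib <- data.frame(
  code = rep(c("all", "Data Science", "Data Engineer"), each = 12),
  love = runif(36),
  ind1 = runif(36),
  ind2 = runif(36)
)

# Same pattern as above: one row per code, one fitted glm per row
fits <- demo_tib %>%
  nest_by(code) %>%
  mutate(model = list(glm(love ~ ., data = data, family = "gaussian"))) %>%
  ungroup()

# Name each model by its category so it can be looked up directly
models <- setNames(fits$model, fits$code)

coef(models[["Data Science"]])  # coefficients for a single category
t(sapply(models, coef))         # one row of coefficients per category
```

Looking models up by name rather than by position avoids depending on the order in which the groups happen to be sorted.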
We can split the data and apply glm to each code separately.
library(dplyr)
library(purrr)
sample_tib %>%
  group_split(code) %>%
  map(function(x) glm(love ~ ., data = select(x, -code), family = "gaussian"))
select(x, -code) drops the code column from the data, so you can use love ~ . as the formula.
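The same split-and-fit idea can also be sketched in base R, which may be handy when the result should be a list named by category. This is not part of the answer above but an alternative under stated assumptions: demo_tib is a hypothetical, smaller stand-in for sample_tib, the seed is added only for reproducibility, and the formula is built explicitly with reformulate() instead of dropping code before using love ~ . as the formula.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility of this sketch

# Hypothetical stand-in for sample_tib with fewer indicator columns
demo_tib <- data.frame(
  code = rep(c("all", "Data Science", "Data Engineer"), each = 12),
  love = runif(36),
  ind1 = runif(36),
  ind2 = runif(36)
)

# Build love ~ ind1 + ind2 from every column except the response and the filter
predictors <- setdiff(names(demo_tib), c("love", "code"))
model_formula <- reformulate(predictors, response = "love")

# split() returns a list named by category; lapply() fits one model per element
models_by_code <- lapply(
  split(demo_tib, demo_tib$code),
  function(d) glm(model_formula, data = d, family = "gaussian")
)

names(models_by_code)                 # the categories, used as list names
summary(models_by_code[["all"]])      # full summary for one category
```

Because the formula names its predictors explicitly, the code column can stay in the data without entering the regression.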