How can I match the result of a linear regression in R to the output of dplyr's group_by?
I have a dataset in R:
library(dplyr)  # provides %>%, group_by, summarise, and re-exports tibble
vec = c(200,300,400,500,600,100)
char1 = c("a","a","a","b","b","a")
char2 = c("c","c","d","c","d","d")
df2 = tibble(vec,char1,char2);df2
# A tibble: 6 × 3
    vec char1 char2
  <dbl> <chr> <chr>
1   200 a     c
2   300 a     c
3   400 a     d
4   500 b     c
5   600 b     d
6   100 a     d
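If I want the mean of the vec column for each level of char1, I can use either group_by or a linear regression without an intercept: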
df2%>%group_by(char1)%>%
summarise(mean(vec))
lm(df2$vec~df2$char1-1)
And for char2:
df2%>%group_by(char2)%>%
summarise(mean(vec))
lm(df2$vec~df2$char2-1)
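In both cases the group means match the corresponding linear regression coefficients.
But if I want the mean for each combination of char1 and char2, I would run in R: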
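The match can be checked numerically. The sketch below is only an illustration: it rebuilds the data as a base R data.frame and uses tapply instead of dplyr (both are substitutions made here so the snippet runs without extra packages), and the names fit1 and means1 are hypothetical. With the intercept removed, the design matrix holds one indicator column per level, so each least-squares coefficient is the mean of vec within that level.

```r
# Same data as above, rebuilt as a base R data.frame (assumption: no dplyr needed here)
df2 <- data.frame(
  vec   = c(200, 300, 400, 500, 600, 100),
  char1 = c("a", "a", "a", "b", "b", "a"),
  char2 = c("c", "c", "d", "c", "d", "d")
)

# Regression on char1 with the intercept removed: one coefficient per level
fit1 <- lm(vec ~ char1 - 1, data = df2)

# Group means of vec by char1, computed directly
means1 <- tapply(df2$vec, df2$char1, mean)

# Compare the two, ignoring names
all.equal(unname(coef(fit1)), as.vector(means1))
```

The same comparison should work for char2 by swapping the variable name in both calls.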
df2%>%group_by(char1,char2)%>%
summarise(mean(vec))
What is the linear regression equivalent for these two variables together?
Any help is appreciated.
Specify the interaction between char1 and char2 as char1:char2, which gives:
lm(vec ~ char1:char2 + 0, data=df2)
#Call:
#lm(formula = vec ~ char1:char2 + 0, data = df2)
#
#Coefficients:
#char1a:char2c  char1b:char2c  char1a:char2d  char1b:char2d
#          250            500            250            600
This matches the expected result:
df2 %>%
group_by(char1,char2) %>%
summarise(mv = mean(vec))
# A tibble: 4 × 3
# Groups:   char1 [2]
#  char1 char2    mv
#  <chr> <chr> <dbl>
#1 a     c       250
#2 a     d       250
#3 b     c       500
#4 b     d       600
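As a rough check that the interaction coefficients really are the cell means, the sketch below compares them directly. It is an illustration under stated assumptions: the data are rebuilt as a base R data.frame, tapply stands in for group_by/summarise, and fit12 and means12 are hypothetical names. Note that the coefficients come out with char1 varying fastest (a:c, b:c, a:d, b:d), which is a different row order from the group_by output.

```r
# Same data as in the question, as a base R data.frame (assumption: no dplyr needed here)
df2 <- data.frame(
  vec   = c(200, 300, 400, 500, 600, 100),
  char1 = c("a", "a", "a", "b", "b", "a"),
  char2 = c("c", "c", "d", "c", "d", "d")
)

# Interaction-only model without intercept: one coefficient per char1/char2 cell
fit12 <- lm(vec ~ char1:char2 + 0, data = df2)

# Cell means as a 2 x 2 matrix (rows = char1, columns = char2)
means12 <- tapply(df2$vec, list(df2$char1, df2$char2), mean)

# as.vector() reads the matrix column by column, i.e. a:c, b:c, a:d, b:d,
# which is the order of the coefficients shown above
all.equal(unname(coef(fit12)), as.vector(means12))
```

This relies on every char1/char2 combination being present in the data, as it is here; how lm reports an empty cell is not covered by this sketch.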