R - using regression functions within a group
Suppose I have a data frame df with three variables df$x, df$y, df$z, and a grouping variable df$g.
.
Usually, to compute a function within each group, I do the following:
df$new <- unlist(tapply(df$x, df$g, FUN = myfunc))  # assumes myfunc returns one value per row; rows line up only if df is sorted by g
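The `unlist(tapply(...))` idiom returns the results group by group, in the order of the levels of g, so they line up with the rows of df only when df is already sorted by g. A minimal sketch, assuming myfunc returns a vector as long as its input: base R's `ave()` writes each group's results back into the original row positions. The toy data and the `myfunc` body below are hypothetical.

```r
# Hypothetical toy data: rows deliberately NOT sorted by g
df <- data.frame(x = c(1, 10, 2, 20, 3),
                 g = c("a", "b", "a", "b", "a"))

# Hypothetical per-group function: centre x within its group
myfunc <- function(v) v - mean(v)

# tapply + unlist concatenates all of group "a", then all of group "b":
# -1, 0, 1, -5, 5
by_group <- unlist(tapply(df$x, df$g, FUN = myfunc))

# ave() keeps one result per original row of df: -1, -5, 0, 5, 1
df$new <- ave(df$x, df$g, FUN = myfunc)
df
```

With unsorted data, assigning `by_group` to a column would silently attach group "b" values to group "a" rows; `ave()` avoids that.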
Now suppose I want to generate the residuals from a regression of x on y and z within each value of the group g. How do I implement it?
More specifically, without groups, I would have done:
df$new <- resid(lm(df$x ~ df$y + df$z, na.action = na.exclude))
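Passing `na.action = na.exclude` (rather than the default `na.omit`) is what makes the residuals assignable back to a column: with `na.exclude`, `resid()` pads the result with `NA` at incomplete rows, so it has one entry per row of df. A small sketch with hypothetical data containing one missing value:

```r
# Hypothetical data with one missing y value (row 3)
df <- data.frame(x = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
                 y = c(1, 2, NA, 4, 5, 6),
                 z = c(0, 1, 0, 1, 0, 1))

fit_omit    <- lm(x ~ y + z, data = df, na.action = na.omit)
fit_exclude <- lm(x ~ y + z, data = df, na.action = na.exclude)

length(resid(fit_omit))     # 5: the incomplete row is dropped
length(resid(fit_exclude))  # 6: an NA is re-inserted at row 3

# Only the na.exclude residuals match the number of rows in df;
# assigning resid(fit_omit) here would fail with a length mismatch
df$new <- resid(fit_exclude)
```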
One way to carry out this operation within groups is to loop over the unique elements of `df$g`, but it would be great if there were a vectorized solution.
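For reference, here is a sketch of the loop the question describes, next to a base-R `split()` / `lapply()` / `unsplit()` version that avoids the explicit loop. Both still fit one `lm` per group, so the second is more concise rather than truly vectorized. The data frame is hypothetical; the column names x, y, z and g follow the question.

```r
set.seed(1)
# Hypothetical data: two groups with interleaved rows
df <- data.frame(y = rnorm(20), z = rnorm(20),
                 g = rep(c("a", "b"), times = 10))
df$x <- 1 + 2 * df$y - df$z + rnorm(20)

# 1. Explicit loop over the unique groups
df$new_loop <- NA_real_
for (lev in unique(df$g)) {
  rows <- df$g == lev
  fit <- lm(x ~ y + z, data = df[rows, ], na.action = na.exclude)
  df$new_loop[rows] <- resid(fit)
}

# 2. split / lapply / unsplit: fit per group, then restore the row order
pieces <- lapply(split(df, df$g), function(d)
  resid(lm(x ~ y + z, data = d, na.action = na.exclude)))
df$new_split <- unname(unsplit(pieces, df$g))

# The two approaches should give the same residuals
all.equal(df$new_loop, df$new_split)
```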
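With plyr, ddply splits the data by g, applies transform to each piece, and binds the pieces back together:
library(plyr)
ddply(mydata, .(g), transform, new = resid(lm(x ~ y + z, na.action = na.exclude)))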
Test using the mtcars data:
mydata<-mtcars
myres<-ddply(mydata,.(carb),transform, new=resid(lm(mpg ~ disp + hp))) # g=carb, x=mpg,y=disp,z=hp
> head(myres)
mpg cyl disp hp drat wt qsec vs am gear carb new
1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 0.20604566
2 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 2.03023747
3 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 -2.39754247
4 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 1.31212635
5 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 2.60271481
6 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 0.03913515
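As the output above shows, ddply returns the rows grouped by carb and resets the row names, so myres is not in the original mtcars order. A minimal base-R sketch (no plyr dependency) that instead writes the per-group residuals back into the original rows, using the replacement form of `split()`:

```r
mydata <- mtcars
mydata$new <- NA_real_

# Fit mpg ~ disp + hp separately within each carb group (g = carb)
fits <- lapply(split(mydata, mydata$carb), function(d)
  resid(lm(mpg ~ disp + hp, data = d, na.action = na.exclude)))

# Write each group's residuals back into that group's original rows
split(mydata$new, mydata$carb) <- fits

head(mydata[, c("mpg", "disp", "hp", "carb", "new")])
```

Unlike the ddply result, mydata keeps the mtcars row order and car names. Note that the carb = 6 and carb = 8 groups contain a single car each, so their fits are degenerate and their residuals are zero.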
In data.table, you can use by:
library(data.table)
DT <- data.table(df)
DT[, new := resid(lm(x ~ y + z, na.action = na.exclude)), by = g]  # := adds the column to DT by reference