
How to time demean data in R (demean by group)?

I would like to time-demean my data so that I can manually run a fixed effects regression via ordinary least squares (OLS). "Time demeaning" means calculating the mean of a variable within each unit (e.g. person) and subtracting it from that unit's observations. I thought this code might do the job, but it subtracts the grand mean in every row:

library(dplyr)

set.seed(5)
paneldata = data.frame(id=c(1,1,1,1,1,1,1,2,2,2,2,2,2,2), 
                       year=seq(from=2003, to=2009, by=1),  # 7 years, recycled for both ids
                       x = runif(14, min = 0, max = 25),
                       y = runif(14, min = 0, max = 25))    # no trailing comma, which would raise an error

paneldata %>%
  group_by(id) %>%
  mutate(MeanValue = mean(x), 
         tdm_x = x - MeanValue)

# A tibble: 14 x 6
# Groups:   id [2]
      id  year     x     y MeanValue  tdm_x
   <dbl> <dbl> <dbl> <dbl>     <dbl>  <dbl>
 1     1  2003  5.01  6.56      12.4 -7.38 
 2     1  2004 17.1   5.05      12.4  4.74 
 3     1  2005 22.9   9.69      12.4 10.5  
 4     1  2006  7.11 22.2       12.4 -5.28 
 5     1  2007  2.62 13.9       12.4 -9.77 
 6     1  2008 17.5  21.1       12.4  5.14 
 7     1  2009 13.2  22.3       12.4  0.812
 8     2  2003 20.2  18.0       12.4  7.81 
 9     2  2004 23.9   5.28      12.4 11.5  
10     2  2005  2.76  5.64      12.4 -9.63 
11     2  2006  6.83  3.50      12.4 -5.55 
12     2  2007 12.3  12.0       12.4 -0.124
13     2  2008  7.96 10.9       12.4 -4.43 
14     2  2009 14.0  24.1       12.4  1.59 

How can I modify my code so that the mean is taken within each id rather than over all rows? I am also looking for a way to automate this for several variables, without repeating the mutate call for each one. Thank you

I could not reproduce the "grand mean" problem. To de-mean several columns at once without repeating yourself, you can use mutate(across()):

paneldata %>%
  group_by(id) %>%
  mutate(across(c("x", "y"), ~ .x - mean(.x), .names = "tdm_{col}"))

#> # A tibble: 14 x 6
#> # Groups:   id [2]
#>       id  year     x     y  tdm_x  tdm_y
#>    <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
#>  1     1  2003  5.01  6.56 -7.21  -7.82 
#>  2     1  2004 17.1   5.05  4.91  -9.34 
#>  3     1  2005 22.9   9.69 10.7   -4.69 
#>  4     1  2006  7.11 22.2  -5.11   7.81 
#>  5     1  2007  2.62 13.9  -9.60  -0.510
#>  6     1  2008 17.5  21.1   5.31   6.67 
#>  7     1  2009 13.2  22.3   0.983  7.87 
#>  8     2  2003 20.2  18.0   7.64   6.66 
#>  9     2  2004 23.9   5.28 11.4   -6.08 
#> 10     2  2005  2.76  5.64 -9.80  -5.72 
#> 11     2  2006  6.83  3.50 -5.73  -7.86 
#> 12     2  2007 12.3  12.0  -0.295  0.637
#> 13     2  2008  7.96 10.9  -4.60  -0.426
#> 14     2  2009 14.0  24.1   1.42  12.8 
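One possible explanation for the grand mean in the question, which cannot be confirmed from the output shown there, is that another package's mutate() was masking dplyr's. For example, if plyr is attached after dplyr, plyr::mutate() is the one that gets called, and it ignores the grouping set by group_by(). A minimal sketch, assuming that is the cause, which sidesteps the masking by calling dplyr::mutate() explicitly (the name demeaned is only illustrative):

```r
library(dplyr)

set.seed(5)
paneldata <- data.frame(id = rep(1:2, each = 7),
                        year = rep(2003:2009, times = 2),
                        x = runif(14, min = 0, max = 25),
                        y = runif(14, min = 0, max = 25))

# The explicit namespace makes sure the grouped dplyr verb is used,
# whichever packages were attached afterwards.
demeaned <- paneldata %>%
  group_by(id) %>%
  dplyr::mutate(MeanValue = mean(x),
                tdm_x = x - MeanValue) %>%
  ungroup()

# One distinct mean per id indicates that the grouping was respected.
distinct(demeaned, id, MeanValue)
```

Running conflicts() or search() in the session shows whether mutate is masked and in which order the packages are attached.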

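Here is an option with mutate_at (superseded by across() since dplyr 1.0.0, but still available). The new columns are named x_tdm and y_tdm:

 library(dplyr)
 paneldata %>%
   group_by(id) %>%
   mutate_at(vars(c("x", "y")), list(tdm = ~ . - mean(.)))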
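Since the goal in the question is a manual fixed effects regression, it may help to check that the demeaned columns give the same slope as a regression with one dummy per id. The sketch below uses only base R: ave() with its default FUN = mean returns the group mean for each row. The names within_fit and lsdv_fit are illustrative and not from the question.

```r
set.seed(5)
paneldata <- data.frame(id = rep(1:2, each = 7),
                        year = rep(2003:2009, times = 2),
                        x = runif(14, min = 0, max = 25),
                        y = runif(14, min = 0, max = 25))

# Subtract the per-id mean from each observation.
paneldata$tdm_x <- paneldata$x - ave(paneldata$x, paneldata$id)
paneldata$tdm_y <- paneldata$y - ave(paneldata$y, paneldata$id)

# Within estimator: OLS on the demeaned data, without an intercept.
within_fit <- lm(tdm_y ~ tdm_x - 1, data = paneldata)

# Dummy-variable (LSDV) estimator: OLS with one intercept per id.
lsdv_fit <- lm(y ~ x + factor(id), data = paneldata)

# The two slope estimates should agree up to floating-point error.
all.equal(coef(within_fit)[["tdm_x"]], coef(lsdv_fit)[["x"]])
```

The slopes match, but the standard errors reported by summary(within_fit) are somewhat too small, because lm() does not know that one degree of freedom per id was used up when computing the group means.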
