Using plyr / dplyr to calculate mean every two years

I have a massive data set consisting of daily returns of 500 stocks over 34 years. I first ran ddply to create yearly median and return columns:

annual <- ddply(data, c("TICKER", "year"), summarize, 
                median_data = median(RETX),
                return = prod(1 + RET))

The data currently looks like this:

  TICKER year median_data    return
1      A 2000  -0.0081645 0.6717770 
2      A 2001  -0.0036845 0.5207290 
3      A 2002  -0.0069040 0.6299523
4      A 2003   0.0036585 1.6280659  
5      A 2004   0.0000120 0.8242153  
6      A 2005   0.0004025 1.3813425  

Now I would like to create a new column that contains the mean of median_data for each ticker for the past two years:

  TICKER year median_data    return    avg_median
1      A 2000  -0.0081645 0.6717770           NA
2      A 2001  -0.0036845 0.5207290    -0.0036845
3      A 2002  -0.0069040 0.6299523    -0.0105885
4      A 2003   0.0036585 1.6280659           ...
5      A 2004   0.0000120 0.8242153  
6      A 2005   0.0004025 1.3813425  

Any help on this would be greatly appreciated!

dplyr solution:

For completeness and correctness, here is the dplyr way, since the question carries a dplyr tag. Unless I am missing something, dvdkamp's solution only works if you have a single stock.

Sample data: 500 stocks, one row per year from 1980 to 2014

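# two-letter combinations; the first 500 serve as ticker symbols
pairs <- expand.grid(letters, letters)[1:500, ]
df <- expand.grid(
    year = 1980:2014,
    TICKER = paste0(pairs[, 1], pairs[, 2])
)
df$median_data <- rnorm(nrow(df))  # one random value per row (no recycling)
df <- df[, c(2, 1, 3)]             # column order: TICKER, year, median_data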

looks like this:

  TICKER year median_data
1     aa 1980   0.5734215
2     aa 1981   1.2102109
3     aa 1982   0.8643419
4     aa 1983   0.7645975
5     aa 1984   0.4004396
6     aa 1985  -1.0195817

First, group by stock:

library(dplyr)
by_ticker <- df %>% group_by(TICKER)

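Then use lag() to compute the means. lag() works on row order, so the rows must be sorted by year within each ticker (they are here).

Mean of this year and last year (last 2 years, inclusive). Note that lag() defaults to n = 1: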

by_ticker %>% 
         mutate(mean_last2y_incl = ( median_data + lag(median_data) ) / 2 )

Mean of last year and the year before that (last 2 years, excluding the current year):

by_ticker %>% 
         mutate(mean_last2y_excl = ( lag(median_data, n=1) + lag(median_data, n=2) ) / 2 )
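
As a quick check, here is a minimal sketch that applies both variants to the six rows for ticker A shown in the question. The object name result and the explicit arrange() step are my own additions for illustration; the sort only guards against unsorted input.

```r
library(dplyr)

# the six sample rows for ticker A from the question
annual <- data.frame(
  TICKER = "A",
  year = 2000:2005,
  median_data = c(-0.0081645, -0.0036845, -0.0069040,
                  0.0036585, 0.0000120, 0.0004025)
)

result <- annual %>%
  group_by(TICKER) %>%
  arrange(year, .by_group = TRUE) %>%   # lag() depends on row order
  mutate(
    # this year and last year
    mean_last2y_incl = (median_data + lag(median_data)) / 2,
    # last year and the year before that
    mean_last2y_excl = (lag(median_data, n = 1) + lag(median_data, n = 2)) / 2
  ) %>%
  ungroup()

print(result)
```

The first year has no predecessor, so mean_last2y_incl is NA for 2000; for 2001 it is (-0.0081645 + -0.0036845) / 2 = -0.0059245. The exclusive column is NA for the first two years and carries that same value in 2002.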

see: http://cran.rstudio.com/web/packages/dplyr/vignettes/window-functions.html for more.

Try:

window_size <- 2 # number of years to average over

# stats::filter is spelled out because dplyr masks filter() when it is loaded
data$avg_median <- stats::filter(data$median_data,
                                 rep(1, window_size) / window_size, # filter coefficients (1/2, 1/2)
                                 sides = 1) # average over this year and the years before it
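
With more than one ticker, the filter runs straight across ticker boundaries, so the first year of each stock would be averaged with the last year of the previous one. One way around this, sketched here in base R, is to apply the filter within each ticker via ave(). The toy data frame and the helper trailing_mean are made up for illustration, and the sketch assumes every ticker has at least window_size rows (stats::filter raises an error otherwise).

```r
# toy data: two tickers, three years each (illustrative values)
toy <- data.frame(
  TICKER = rep(c("A", "B"), each = 3),
  year = rep(2000:2002, times = 2),
  median_data = c(1, 3, 5, 10, 20, 40)
)

window_size <- 2

# trailing moving average over the last k values, NA where history is too short
trailing_mean <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# sort first: the filter relies on row order within each ticker
toy <- toy[order(toy$TICKER, toy$year), ]

# per-ticker average: ave() applies the function within each TICKER group
toy$avg_median <- ave(toy$median_data, toy$TICKER,
                      FUN = function(x) trailing_mean(x, window_size))

# ungrouped version for comparison: B's first year mixes in A's last year
toy$avg_naive <- trailing_mean(toy$median_data, window_size)

print(toy)
```

In the grouped column the first row of each ticker is NA, whereas the ungrouped column gives (5 + 10) / 2 = 7.5 for B in 2000.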
