简体   繁体   中英

Cumulative Mean with Grouping and Lag

I'm looking to create a cumulative mean that averages over multiple groups with a lag. It is for predictive analysis so I want each row to be the cumulative mean of all the rows before it (not including its own value).

This is a follow on from Grouped moving average in r

I'm sure there is a way to do this with rollapply and ave, I have been achieving this result with various moving windows using the below(just not a cummean):

library(zoo)

roll <- function(x, n) { 
   if (length(x) <= n) NA 
   else rollapply(x, list(-seq(n)), mean, fill = NA)
}
transform(DF, AVG2 = ave(Goals, Player, FUN = function(x) roll(x, 2)),
          AVG3 = ave(Goals, Player, FUN = function(x) roll(x, 3)))

Here is the desired output:

Player  Goals   **AVG**
S       5       
S       2       5
S       7       3.5
O       3       
O       9       3
O       6       6
O       3       6
S       7       4.66
O       1       5.25
S       7       5.25
S       3       5.6
Q       8       
S       3       5.16
O       4       4.4
P       1       
S       9       4.857
S       4       5.375
Z       6   
S       3       5.22
O       8       4.33
S       3       5
O       4       4.857
O       1       4.75
S       9       4.81
S       4       5.16
O       6       4.33
J       6   

and here is the code to recreate the initial table in r

Player <- c('S','S','S','O','O','O','O','S','O','S','S','O','S','O','O','S','S','O','S','O','S','O','O','S','S','O','J')
Goals <- c(5,2,7,3,9,6,3,7,1,7,3,8,3,4,1,9,4,6,3,8,3,4,1,9,4,6,6)
data.frame(Player, Goals)

Any help is much appreciated

1) We can use cumsum in the base of R. No packages are used.

cumroll <- function(x) {
  x <- head(x, -1)
  c(NA, cumsum(x) / seq_along(x))
}
transform(DF, AVG = ave(Goals, Player, FUN = cumroll))

2) This could also replace cumroll . It puts NaN in the postitions that are NA with cumroll :

cumroll2 <- function(x) (cumsum(x) - x) / (seq_along(x) - 1)
transform(DF, AVG = ave(Goals, Player, FUN = cumroll2))

3) If you did want to use rollapply here note that cumsum could be replaced with rollapplyr(x, seq_along(x), sum) in either of the above.

4) We could alternately use rollapply like this which like cumroll2 uses NaNs.

library(zoo)

cumroll3 <- function(x) {
  if (length(x) == 1) NaN
  else rollapply(x, lapply(seq_along(x) - 1, function(x) -seq_len(x)), mean)
}
transform(DF, AVG = ave(Goals, Player, FUN = cumroll3))

One option is to use data.table for the grouping and the cummean function from dplyr :

require(data.table)
require(dplyr)
Player <- c('S','S','S','O','O','O','O','S','O','S','S','O','S','O','O','S','S','O','S','O','S','O','O','S','S','O','J')
Goals <- c(5,2,7,3,9,6,3,7,1,7,3,8,3,4,1,9,4,6,3,8,3,4,1,9,4,6,6)
df<-data.frame(Player, Goals)

dt<-data.table(df)
lcummean<-function(x){
  head(c(NA,cummean(x)),-1)
}
dt[,ave:=lcummean(Goals),by=Player]

> dt
    Player Goals      ave
 1:      S     5       NA
 2:      S     2 5.000000
 3:      S     7 3.500000
 4:      O     3       NA
 5:      O     9 3.000000
 6:      O     6 6.000000
 7:      O     3 6.000000
 8:      S     7 4.666667
 9:      O     1 5.250000
10:      S     7 5.250000
11:      S     3 5.600000
12:      O     8 4.400000
13:      S     3 5.166667
14:      O     4 5.000000
15:      O     1 4.857143
16:      S     9 4.857143
17:      S     4 5.375000
18:      O     6 4.375000
19:      S     3 5.222222
20:      O     8 4.555556
21:      S     3 5.000000
22:      O     4 4.900000
23:      O     1 4.818182
24:      S     9 4.818182
25:      S     4 5.166667
26:      O     6 4.500000
27:      J     6       NA
    Player Goals      ave

If you don't mind warning messages, you can also just do this:

dt[,ave:=c(NA,cummean(Goals)),by=Player]

since the last element will be discarded, but you will get warning messages about it.

Using the cummean function of dplyr :

library(dplyr)
df1 %>% 
  group_by(Player) %>%
  mutate(mean_prev_goals = lag(cummean(Goals), n=1, default=0))

gives:

Source: local data frame [27 x 3]
Groups: Player [3]

   Player Goals mean_prev_goals
   (fctr) (dbl)           (dbl)
1       S     5        0.000000
2       S     2        5.000000
3       S     7        3.500000
4       O     3        0.000000
5       O     9        3.000000
6       O     6        6.000000
7       O     3        6.000000
8       S     7        4.666667
9       O     1        5.250000
10      S     7        5.250000
..    ...   ...             ...

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM