Aggregated rolling average with a conditional statement in R

I have a data frame in the following format.

 match  team1 team2 winningTeam
 1      A     D     A
 2      B     E     E
 3      C     F     C
 4      D     C     C
 5      E     B     B
 6      F     A     A
 7      A     D     D
 8      D     A     A

What I want to do is create variables that calculate the form of both team 1 and team 2 over the last x matches. For example, I would want a variable called team1_form_last3_matches, which for match 8 would be 0.33 (as they won 1 of their last 3 matches), and a variable called team2_form_last3_matches, which would be 0.66 in match 8 (as they won 2 of their last 3 matches). Ideally I would like to specify the number of previous matches to consider and have the corresponding team x _form_last y variables created automatically. I have tried a bunch of approaches using dplyr, zoo rolling mean functions and a load of nested for/if statements. However, I have not quite cracked it, and certainly not in an elegant way. I feel like I am missing a simple solution to this generic problem. Any help would be much appreciated!

Cheers,

Jack

How about something like:

dat <- data.frame(match = c(1:8), team1 = c("A","B","C","D","E","F","A","D"), team2 = c("D","E","F","C","B","A","D","A"), winningTeam = c("A","E","C","C","B","A","D","A"))
  match team1 team2 winningTeam
1     1     A     D           A
2     2     B     E           E
3     3     C     F           C
4     4     D     C           C
5     5     E     B           B
6     6     F     A           A
7     7     A     D           D
8     8     D     A           A

Allteams <- c("A","B","C","D","E","F")

# A vectorized function for you to use to do as you ask:
teamX_form_lastY <- function(teams, games, dat){
  sapply(teams, function(x) {
    games_info <- rowSums(dat[,c("team1","team2")] == x) + (dat[,"winningTeam"] == x)
    lookup <- ifelse(rev(games_info[games_info != 0])==2,1,0)
    games_won <- sum(lookup[1:games])
    if(length(lookup) < games) warning(paste("maximum games for team",x,"should be",length(lookup)))
    games_won/games
  })
}

teamX_form_lastY("A", 4, dat)
A 
0.75 

# Has a warning for the number of games you should be using
teamX_form_lastY("A", 5, dat)
A 
NA 
Warning message:
  In FUN(X[[i]], ...) : maximum games for team A should be 4

# vectorized input
teamX_form_lastY(teams = c("A","B"), games = 2, dat = dat)
A   B 
0.5 0.5 

# so you can do all teams
teamX_form_lastY(teams = Allteams, 2, dat)
A   B   C   D   E   F 
0.5 0.5 1.0 0.5 0.5 0.0 
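
The function above returns NA (with a warning) when a team has played fewer matches than requested. If averaging over whatever history is available is acceptable instead, a variant along these lines might work. This is only a sketch: team_form_capped is a hypothetical name, not part of the answer above, and it assumes the team columns are character vectors (hence stringsAsFactors = FALSE).

```r
# Same example data as above, with character (not factor) columns.
dat <- data.frame(match = 1:8,
                  team1 = c("A","B","C","D","E","F","A","D"),
                  team2 = c("D","E","F","C","B","A","D","A"),
                  winningTeam = c("A","E","C","C","B","A","D","A"),
                  stringsAsFactors = FALSE)

# Hypothetical variant: win rate over the most recent `games` matches,
# capped at the number of matches the team has actually played.
team_form_capped <- function(teams, games, dat) {
  sapply(teams, function(x) {
    played <- dat$team1 == x | dat$team2 == x   # rows in which team x took part
    won    <- dat$winningTeam[played] == x      # TRUE where x won, in match order
    recent <- head(rev(won), games)             # most recent first, at most `games`
    if (length(recent) == 0) NA_real_ else mean(recent)
  })
}

# Team A has only 4 matches, so asking for 5 averages over those 4.
capped_A5 <- team_form_capped("A", 5, dat)
```

Like the original function, this looks at the whole data set, so it describes each team's form after the final match rather than before any particular row.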

This works for t1l3 (team 1's form over the last 3 matches); you will need to replicate it for team 2.

dat <- data.frame(match = c(1:8), team1 = c("A","B","C","D","E","F","A","D"), team2 = c("D","E","F","C","B","A","D","A"), winningTeam = c("A","E","C","C","B","A","D","A"),stringsAsFactors = FALSE)

dat$t1l3 <- c(NA,sapply(2:nrow(dat),function(i) {
  df <- dat[1:(i-1),] #just previous games, i.e. excludes current game
  df <- df[df$team1==dat$team1[i] | df$team2==dat$team1[i],] #just those containing T1
  df <- tail(df,3) #just the last three (or fewer if there aren't three previous games)
  return(sum(df$winningTeam==dat$team1[i])/nrow(df)) #total wins/total games (up to three)
}))
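
To cover both teams and an arbitrary window in one go, as the question asks, the same idea could be wrapped in a pair of helpers. This is a rough sketch rather than a tested solution: form_before and add_form_columns are hypothetical names, and returning NA for a team with no earlier matches is an assumption about the desired behaviour.

```r
dat <- data.frame(match = 1:8,
                  team1 = c("A","B","C","D","E","F","A","D"),
                  team2 = c("D","E","F","C","B","A","D","A"),
                  winningTeam = c("A","E","C","C","B","A","D","A"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: share of wins for `team` in its last `n` matches
# played strictly before row `i`.
form_before <- function(dat, team, i, n) {
  prev <- dat[seq_len(i - 1), , drop = FALSE]            # earlier matches only
  prev <- prev[prev$team1 == team | prev$team2 == team, , drop = FALSE]
  prev <- tail(prev, n)                                  # at most the last n of them
  if (nrow(prev) == 0) return(NA_real_)                  # assumption: NA when no history
  mean(prev$winningTeam == team)
}

# Hypothetical wrapper: adds team1_form_last<n>_matches and
# team2_form_last<n>_matches columns to the data frame.
add_form_columns <- function(dat, n) {
  for (side in c("team1", "team2")) {
    col <- paste0(side, "_form_last", n, "_matches")
    dat[[col]] <- vapply(seq_len(nrow(dat)),
                         function(i) form_before(dat, as.character(dat[[side]][i]), i, n),
                         numeric(1))
  }
  dat
}

out <- add_form_columns(dat, 3)
```

For match 8 this gives 1/3 for team 1 (D) and 2/3 for team 2 (A), which matches the values in the question. Calling add_form_columns(dat, 5) would create the corresponding last-5 columns in the same way.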
