
R: average of the last 3 rows (values in different columns), grouping by two columns

DT:

HomeTeam       AwayTeam         Season      Htpoints  Atpoints
Mattersburg    Salzburg         2015/2016        3         0
Salzburg       Rapid Vienna     2015/2016        0         3
Admira         Mattersburg      2015/2016        3         0
Admira         Salzburg         2015/2016        1         1
Mattersburg    Ried             2015/2016        3         0
Ried           Salzburg         2015/2016        0         3
Altach         Mattersburg      2015/2016        3         0
Austria Vie    Mattersburg      2015/2016        3         0
Salzburg       Altach           2015/2016        3         0
Mattersburg    AC Wolfsberger   2015/2016        3         0
Salzburg       Austria Vienna   2015/2016        1         1
Rapid Vienna   Mattersburg      2015/2016        0         3
Sturm Graz     Salzburg         2015/2016        0         3
Salzburg       Grodig           2015/2016        3         0

To calculate the average points of a team in its last 3 home matches:

library(zoo)

roll <- function(x, n) { 
if (length(x) <= n) NaN 
else rollapply(x, list(-seq(n)), mean, fill = NaN)
}

transform(DT, last3.HT.av.points = ave(Htpoints,Season,HomeTeam, FUN = function(x) roll(x, 3)))

The above is not a problem. On the other hand:

Is it possible to calculate the average points of the last 3 matches regardless of whether a team plays at home or away?

Desired output (only showing information for the Salzburg team):

HomeTeam       AwayTeam         Season      Htpoints  Atpoints   HT.av.last3  AT.av.last3
Mattersburg    Salzburg         2015/2016        3         0                        NA
Salzburg       Rapid Vienna     2015/2016        0         3           NA
Admira         Mattersburg      2015/2016        3         0
Admira         Salzburg         2015/2016        1         1                        NA
Mattersburg    Ried             2015/2016        3         0
Ried           Salzburg         2015/2016        0         3                        0.33
Altach         Mattersburg      2015/2016        3         0
Austria Vie    Mattersburg      2015/2016        3         0
Salzburg       Altach           2015/2016        3         0          1.33
Mattersburg    AC Wolfsberger   2015/2016        3         0
Salzburg       Austria Vienna   2015/2016        1         1          2.33
Rapid Vienna   Mattersburg      2015/2016        0         3
Sturm Graz     Salzburg         2015/2016        0         3                        2.33
Salzburg       Grodig           2015/2016        3         0          2.33

Preference: a data.table solution.

Reproducible dataset (not the same as the one above):

 library(data.table)
 DT <- fread("HomeTeam,AwayTeam,Season,Htpoints,Atpoints
        Grodig,Salzburg,2015/2016,0,3
        Rapid Vienna,Altach,2015/2016,1,1
        Ried,Austria Vienna,2015/2016,3,0
        Sturm Graz,Mattersburg,2015/2016,3,0
        Admira,Rapid Vienna,2015/2016,1,1
        Altach,Ried,2015/2016,0,3
        Austria Vienna,Sturm Graz,2015/2016,1,1
        Mattersburg,Grodig,2015/2016,3,0
        Salzburg,AC Wolfsberger,2015/2016,3,0")

 numTeams <- DT[,uniqueN(c(HomeTeam, AwayTeam))]

 firstHalf <- lapply(seq_len(DT[,.N]),
                function(n) data.table(
                  Matchday=n*2L-1L,
                  HomeTeam=DT[["HomeTeam"]],
                  AwayTeam=c(DT[["AwayTeam"]][-seq_len(n)], DT[["AwayTeam"]][seq_len(n)]),
                  Season=DT[["Season"]],
                  Htpoints=DT[["Htpoints"]],
                  Atpoints=DT[["Atpoints"]]
                ))

 secondHalf <- lapply(seq_len(DT[,.N]),
                 function(n) data.table(
                   Matchday=n*2L,
                   HomeTeam=DT[["AwayTeam"]],
                   AwayTeam=c(DT[["HomeTeam"]][-seq_len(n)], DT[["HomeTeam"]][seq_len(n)]),
                   Season=DT[["Season"]],
                   Htpoints=DT[["Htpoints"]],
                   Atpoints=DT[["Atpoints"]]
                 ))


DT <- rbindlist(c(firstHalf, secondHalf))[
HomeTeam!=AwayTeam][,
            .SD[1L], by=.(HomeTeam, AwayTeam)]
setorder(DT, Matchday, HomeTeam)
DT <- DT[,-c("Matchday")]

library(tidyverse)
library(zoo)

# Add a row id so the rolling means can be joined back to the original rows
DT_prep <- DT %>% 
  as_tibble() %>% 
  mutate(row = row_number()) 

# Reshape to one row per team per match, then take a right-aligned
# 3-game rolling mean per team (the window includes the current match)
DT_rollmeans <- DT_prep %>% 
  gather(teamside, teamname, -Season, -Htpoints, -Atpoints, -row) %>% 
  arrange(row) %>% 
  group_by(teamname) %>% 
  mutate(points = case_when(teamside == 'HomeTeam' ~ Htpoints,
                            teamside == 'AwayTeam' ~ Atpoints),
         roll_mean = zoo::rollapply(points, 3, mean, align = 'right', fill = NA)) %>% 
  ungroup() %>% 
  select(row, teamside, roll_mean) %>%
  spread(teamside, roll_mean) %>% 
  select(row, HT.av.last3 = HomeTeam, AT.av.last3 = AwayTeam)

# Join the per-row rolling means back onto the original matches
DT_prep %>% left_join(DT_rollmeans, by = "row") %>% select(-row)

This yields a tibble that looks as follows:

# A tibble: 90 x 7
   HomeTeam       AwayTeam       Season    Htpoints Atpoints HT.av.last3 AT.av.last3
   <chr>          <chr>          <chr>        <int>    <int>       <dbl>       <dbl>
 1 Admira         Ried           2015/2016        1        1          NA      NA    
 2 Altach         Sturm Graz     2015/2016        0        3          NA      NA    
 3 Austria Vienna Grodig         2015/2016        1        1          NA      NA    
 4 Grodig         Altach         2015/2016        0        3          NA      NA    
 5 Mattersburg    AC Wolfsberger 2015/2016        3        0          NA      NA    
 6 Rapid Vienna   Austria Vienna 2015/2016        1        1          NA      NA    
 7 Ried           Mattersburg    2015/2016        3        0          NA      NA    
 8 Sturm Graz     Rapid Vienna   2015/2016        3        0          NA      NA    
 9 AC Wolfsberger Grodig         2015/2016        3        0          NA       0.333
10 Mattersburg    Admira         2015/2016        3        0           2      NA    
# ... with 80 more rows

For each team, the average is NA for its first 2 games; after that it is the rolling mean of its last 3 games. The first team in the data to reach three games is Grodig, which has a rolling average of 0.333 after scoring 1, 0 and 0 in its first 3 games.

I'm not happy with my solution, but it works. I'm sure someone could make this a lot more compact.
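A small base-R sketch of the window used above may help when comparing it with the desired output in the question. It assumes the three Grodig results quoted above (1, 0, 0) plus one hypothetical later result of 3, added only for illustration. A right-aligned window of width 3 includes the current match, whereas the desired output appears to average only the 3 matches played before the current one; shifting the result by one game gives that variant.

```r
# Points of one team in match order: the three Grodig values quoted above,
# plus one hypothetical later result (3) added for illustration.
pts <- c(1, 0, 0, 3)

# Right-aligned mean over 3 games, current game included:
# with sides = 1, stats::filter averages x[i], x[i-1] and x[i-2].
incl_current <- as.numeric(stats::filter(pts, rep(1 / 3, 3), sides = 1))

# Shift by one game so each row only sees the 3 games played before it.
prev_only <- c(NA, head(incl_current, -1))

print(round(incl_current, 3))  # NA NA 0.333 1.000
print(round(prev_only, 3))     # NA NA NA 0.333
```

Which of the two is wanted depends on whether the current match should count toward its own "form" value.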

Using DT, shown reproducibly in the Note at the end, add a row number column, i, and create a data.table, both, that has two rows for each row in DT: one for the Home team and one for the Away team. Then use rollapply on that and insert the results back into DT. Note that no special code is needed to handle the case where a team has fewer than 3 prior rows, because rollapply handles that automatically.

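library(data.table)
library(zoo)

# Long format: one row per team per match, keeping the original row number i
both <- rbind(
  DT[, list(HomeAway = "Home", Team = HomeTeam, Season, Points = Htpoints, i = .I)],
  DT[, list(HomeAway = "Away", Team = AwayTeam, Season, Points = Atpoints, i = .I)]
)

# Within each Season/Team, average the 3 previous matches (offsets -1, -2, -3)
setkeyv(both, c("Season", "Team", "i"))
both[, Last3 := rollapply(Points, list(-seq(3)), mean, fill = NA_real_, na.rm = TRUE),
  by = "Season,Team"]

# Restore the original match order and copy the results back into DT
setkeyv(both, "i")
DT[, HtLast3 := both[HomeAway == "Home", Last3]][
   , AtLast3 := both[HomeAway == "Away", Last3]]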

giving:

> DT
        HomeTeam       AwayTeam    Season Htpoints Atpoints  HtLast3   AtLast3
 1:  Mattersburg       Salzburg 2015/2016        3        0       NA        NA
 2:     Salzburg   Rapid Vienna 2015/2016        0        3       NA        NA
 3:       Admira    Mattersburg 2015/2016        3        0       NA        NA
 4:       Admira       Salzburg 2015/2016        1        1       NA        NA
 5:  Mattersburg           Ried 2015/2016        3        0       NA        NA
 6:         Ried       Salzburg 2015/2016        0        3       NA 0.3333333
 7:       Altach    Mattersburg 2015/2016        3        0       NA 2.0000000
 8:  Austria Vie    Mattersburg 2015/2016        3        0       NA 1.0000000
 9:     Salzburg         Altach 2015/2016        3        0 1.333333        NA
10:  Mattersburg AC Wolfsberger 2015/2016        3        0 1.000000        NA
11:     Salzburg Austria Vienna 2015/2016        1        1 2.333333        NA
12: Rapid Vienna    Mattersburg 2015/2016        0        3       NA 1.0000000
13:   Sturm Graz       Salzburg 2015/2016        0        3       NA 2.3333333
14:     Salzburg         Grodig 2015/2016        3        0 2.333333        NA
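As a sanity check on the offsets `list(-seq(3))`, the Salzburg values in this output can be reproduced in base R without zoo. This is only a sketch: `prev_mean` is a hypothetical helper written for this illustration, and the points vector is read off the Salzburg rows of DT (rows 1, 2, 4, 6, 9, 11, 13 and 14) in match order.

```r
# Salzburg's points in match order, home or away, read off DT in the Note
salzburg <- c(0, 0, 1, 3, 3, 1, 3, 3)

# Hypothetical helper: mean of the n values before position i,
# NA when fewer than n earlier values exist
prev_mean <- function(x, n = 3) {
  vapply(seq_along(x), function(i) {
    if (i <= n) NA_real_ else mean(x[(i - n):(i - 1)])
  }, numeric(1))
}

last3 <- prev_mean(salzburg)
print(round(last3, 3))  # NA NA NA 0.333 1.333 2.333 2.333 2.333
```

The non-NA values line up with the HtLast3 and AtLast3 entries shown for Salzburg in rows 6, 9, 11, 13 and 14.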

Note

DF <-
structure(list(HomeTeam = c("Mattersburg", "Salzburg", "Admira", 
"Admira", "Mattersburg", "Ried", "Altach", "Austria Vie", "Salzburg", 
"Mattersburg", "Salzburg", "Rapid Vienna", "Sturm Graz", "Salzburg"
), AwayTeam = c("Salzburg", "Rapid Vienna", "Mattersburg", "Salzburg", 
"Ried", "Salzburg", "Mattersburg", "Mattersburg", "Altach", "AC Wolfsberger", 
"Austria Vienna", "Mattersburg", "Salzburg", "Grodig"), Season = c("2015/2016", 
"2015/2016", "2015/2016", "2015/2016", "2015/2016", "2015/2016", 
"2015/2016", "2015/2016", "2015/2016", "2015/2016", "2015/2016", 
"2015/2016", "2015/2016", "2015/2016"), Htpoints = c(3L, 0L, 
3L, 1L, 3L, 0L, 3L, 3L, 3L, 3L, 1L, 0L, 0L, 3L), Atpoints = c(0L, 
3L, 0L, 1L, 0L, 3L, 0L, 0L, 0L, 0L, 1L, 3L, 3L, 0L)), 
class = "data.frame", row.names = c(NA, -14L))

DT <- as.data.table(DF)

I had a hard time with your dataset, so I made my own, which is similar to yours:

Home <- sample(c("A","B","C","D"), 9, replace = TRUE)
Away <- sample(c("A","B","C","D"), 9, replace = TRUE)
Home_Points <- sample(c(0,1,3), 9, replace = TRUE)
Away_Points <- sample(c(0,1,3), 9, replace = TRUE)

dt<-data.frame(HomeTeam=Home,
               AwayTeam=Away, 
               Htpoints=Home_Points,Atpoints=Away_Points,
               stringsAsFactors = FALSE)

and my dataset is:

  HomeTeam AwayTeam Htpoints Atpoints
1        C        C        0        1
2        D        B        1        1
3        D        B        3        0
4        A        B        0        3
5        C        D        1        3
6        C        A        1        3
7        C        D        1        1
8        D        A        1        3
9        D        B        3        3

Solution: combine the home and away teams and their points into one long table

team  <- as.vector(rbind(dt[,1],dt[,2]))
points<- as.vector(rbind(dt[,3],dt[,4]))

newDT<-data.frame( team=team,points=points,stringsAsFactors = FALSE)

and finally sum the points by team, regardless of whether the team played at home or away:

library(tidyverse)

newDT %>%
  group_by(team) %>%
  summarise_all(sum) 

and the result is:

 team  points
  <chr>  <dbl>
1 A          6
2 B          7
3 C          4
4 D         12

Note

If the season may also change, you can add the season to the new dataset as well, and then group and sort by it (arrange).
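A minimal sketch of that idea, in base R rather than tidyverse so it runs without extra packages. It rebuilds the 9-row dataset printed above and adds a Season column whose values are purely hypothetical (the first 5 matches assigned to 2015/2016, the last 4 to 2016/2017); `long` and `totals` are names chosen for this illustration.

```r
# The 9-row dataset printed above
dt <- data.frame(
  HomeTeam = c("C", "D", "D", "A", "C", "C", "C", "D", "D"),
  AwayTeam = c("C", "B", "B", "B", "D", "A", "D", "A", "B"),
  Htpoints = c(0, 1, 3, 0, 1, 1, 1, 1, 3),
  Atpoints = c(1, 1, 0, 3, 3, 3, 1, 3, 3),
  stringsAsFactors = FALSE
)

# Hypothetical Season column, added only for illustration
dt$Season <- rep(c("2015/2016", "2016/2017"), times = c(5, 4))

# Long format: home and away rows interleaved, each carrying its season
long <- data.frame(
  Season = rep(dt$Season, each = 2),
  team   = as.vector(rbind(dt$HomeTeam, dt$AwayTeam)),
  points = as.vector(rbind(dt$Htpoints, dt$Atpoints)),
  stringsAsFactors = FALSE
)

# Sum points per season and team, then sort by season and team
totals <- aggregate(points ~ Season + team, data = long, FUN = sum)
totals <- totals[order(totals$Season, totals$team), ]
print(totals)
```

Summing `totals` over both seasons gives back the per-team totals shown above (A 6, B 7, C 4, D 12).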
