
R: average of the last 3 rows (values in different columns), grouping by two columns

DT:

HomeTeam       AwayTeam        Season     Htpoints  Atpoints
Mattersburg    Salzburg        2015/2016         3         0
Salzburg       Rapid Vienna    2015/2016         0         3
Admira         Mattersburg     2015/2016         3         0
Admira         Salzburg        2015/2016         1         1
Mattersburg    Ried            2015/2016         3         0
Ried           Salzburg        2015/2016         0         3
Altach         Mattersburg     2015/2016         3         0
Austria Vie    Mattersburg     2015/2016         3         0
Salzburg       Altach          2015/2016         3         0
Mattersburg    AC Wolfsberger  2015/2016         3         0
Salzburg       Austria Vienna  2015/2016         1         1
Rapid Vienna   Mattersburg     2015/2016         0         3
Sturm Graz     Salzburg        2015/2016         0         3
Salzburg       Grodig          2015/2016         3         0

To calculate a team's average points over its last three home games:

library(zoo)

roll <- function(x, n) {
  if (length(x) <= n) NaN
  else rollapply(x, list(-seq(n)), mean, fill = NaN)
}

transform(DT, last3.HT.av.points = ave(Htpoints,Season,HomeTeam, FUN = function(x) roll(x, 3)))
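
The `list(-seq(n))` passed as `width` is a list of offsets relative to the current position, so each window covers the `n` values before the current one and leaves the current value out. A minimal sketch on a made-up vector (the vector `x` is illustrative only and is not part of the data above):

```r
library(zoo)

# Made-up points sequence for one team, in match order (illustrative only)
x <- c(3, 0, 1, 3, 3)

# Offsets -1, -2, -3: average the three values *before* each position
prev3 <- rollapply(x, list(-seq(3)), mean, fill = NA)

print(prev3)
# Positions 1 to 3 have fewer than three earlier values, so they are NA;
# position 4 averages x[1:3] and position 5 averages x[2:4].
```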

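That part is not a problem. On the other hand...

Is it possible to calculate the average points over a team's last 3 games regardless of whether the team played at home or away?

Desired output (values shown for team Salzburg only):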

HomeTeam       AwayTeam        Season     Htpoints  Atpoints  HT.av.last3  AT.av.last3
Mattersburg    Salzburg        2015/2016         3         0                        NA
Salzburg       Rapid Vienna    2015/2016         0         3           NA
Admira         Mattersburg     2015/2016         3         0
Admira         Salzburg        2015/2016         1         1                        NA
Mattersburg    Ried            2015/2016         3         0
Ried           Salzburg        2015/2016         0         3                      0.33
Altach         Mattersburg     2015/2016         3         0
Austria Vie    Mattersburg     2015/2016         3         0
Salzburg       Altach          2015/2016         3         0         1.33
Mattersburg    AC Wolfsberger  2015/2016         3         0
Salzburg       Austria Vienna  2015/2016         1         1         2.33
Rapid Vienna   Mattersburg     2015/2016         0         3
Sturm Graz     Salzburg        2015/2016         0         3                      2.33
Salzburg       Grodig          2015/2016         3         0         2.33

Preference: data.table

Reproducible dataset (different from the one above):

 library(data.table)
 DT <- fread("HomeTeam,AwayTeam,Season,Htpoints,Atpoints
        Grodig,Salzburg,2015/2016,0,3
        Rapid Vienna,Altach,2015/2016,1,1
        Ried,Austria Vienna,2015/2016,3,0
        Sturm Graz,Mattersburg,2015/2016,3,0
        Admira,Rapid Vienna,2015/2016,1,1
        Altach,Ried,2015/2016,0,3
        Austria Vienna,Sturm Graz,2015/2016,1,1
        Mattersburg,Grodig,2015/2016,3,0
        Salzburg,AC Wolfsberger,2015/2016,3,0")

 numTeams <- DT[,uniqueN(c(HomeTeam, AwayTeam))]

 firstHalf <- lapply(seq_len(DT[,.N]),
                function(n) data.table(
                  Matchday=n*2L-1L,
                  HomeTeam=DT[["HomeTeam"]],
                  AwayTeam=c(DT[["AwayTeam"]][-seq_len(n)], DT[["AwayTeam"]][seq_len(n)]),
                  Season=DT[["Season"]],
                  Htpoints=DT[["Htpoints"]],
                  Atpoints=DT[["Atpoints"]]
                ))

 secondHalf <- lapply(seq_len(DT[,.N]),
                 function(n) data.table(
                   Matchday=n*2L,
                   HomeTeam=DT[["AwayTeam"]],
                   AwayTeam=c(DT[["HomeTeam"]][-seq_len(n)], DT[["HomeTeam"]][seq_len(n)]),
                   Season=DT[["Season"]],
                   Htpoints=DT[["Htpoints"]],
                   Atpoints=DT[["Atpoints"]]
                 ))


DT <- rbindlist(c(firstHalf, secondHalf))[
HomeTeam!=AwayTeam][,
            .SD[1L], by=.(HomeTeam, AwayTeam)]
setorder(DT, Matchday, HomeTeam)
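DT <- DT[,-c("Matchday")]

library(tidyverse)
library(zoo)

# Add a row number so the rolling means can be joined back later
DT_prep <- DT %>% 
  as_tibble() %>%   # as.tibble() is deprecated in favour of as_tibble()
  mutate(row = row_number()) 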

DT_rollmeans <- DT_prep %>% 
  gather(teamside, teamname, -Season, -Htpoints, -Atpoints, -row) %>% 
  arrange(row) %>% 
  group_by(teamname) %>% 
  mutate(points = case_when(teamside == 'HomeTeam' ~ Htpoints,
                            teamside == 'AwayTeam' ~ Atpoints),
         roll_mean = zoo::rollapply(points, 3, mean, align = 'right', fill = NA)) %>% 
  ungroup() %>% 
  select(row, teamside, roll_mean) %>%
  spread(teamside, roll_mean) %>% 
  select(row, HT.av.last3 = HomeTeam, AT.av.last3 = AwayTeam)



DT_prep %>% left_join(DT_rollmeans, by = "row") %>% select(-row)

This produces a tibble that looks like this:

# A tibble: 90 x 7
   HomeTeam       AwayTeam       Season    Htpoints Atpoints HT.av.last3 AT.av.last3
   <chr>          <chr>          <chr>        <int>    <int>       <dbl>       <dbl>
 1 Admira         Ried           2015/2016        1        1          NA      NA    
 2 Altach         Sturm Graz     2015/2016        0        3          NA      NA    
 3 Austria Vienna Grodig         2015/2016        1        1          NA      NA    
 4 Grodig         Altach         2015/2016        0        3          NA      NA    
 5 Mattersburg    AC Wolfsberger 2015/2016        3        0          NA      NA    
 6 Rapid Vienna   Austria Vienna 2015/2016        1        1          NA      NA    
 7 Ried           Mattersburg    2015/2016        3        0          NA      NA    
 8 Sturm Graz     Rapid Vienna   2015/2016        3        0          NA      NA    
 9 AC Wolfsberger Grodig         2015/2016        3        0          NA       0.333
10 Mattersburg    Admira         2015/2016        3        0           2      NA    
# ... with 80 more rows

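For every team, the average is NA for its first 2 games and, from then on, the mean of its last 3 games. The first team to reach three games is Grodig, which has a rolling average of 0.333 after scoring 1, 0 and 0 points in its first 3 games.

I'm not happy with my solution, but it works, and I'm sure someone can make it more compact.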
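
Note that a right-aligned window of width 3 includes the current match, whereas the desired output in the question averages the three matches before it. One possible adjustment is to lag the rolling mean by one match. Inside the `mutate()` call above this would presumably amount to wrapping the `rollapply()` result in `dplyr::lag()`. The sketch below shows the idea on Salzburg's points from the question's table rather than on the full pipeline:

```r
library(zoo)

# Salzburg's points in match order, taken from the question's table
pts <- c(0, 0, 1, 3, 3, 1, 3, 3)

# Right-aligned window: each value includes the current match
incl_current <- rollapply(pts, 3, mean, align = "right", fill = NA)

# Shift down by one match so each value covers the three matches before it
before_match <- c(NA, head(incl_current, -1))

round(before_match, 2)
# NA NA NA 0.33 1.33 2.33 2.33 2.33
```

The lagged values agree with the Salzburg entries in the desired output of the question.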

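Using DT, shown reproducibly in the Note at the end, add a row number column i and create a data.table, both, that has two rows for each row of DT: one for the home team and one for the away team. Then use rollapply on it and insert the results back into DT. Note that the case where a team has fewer than 3 previous rows needs no special code, because rollapply handles it automatically.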

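library(data.table)
library(zoo)

# Long format: two rows per match (home side and away side), keeping the row number i
both <- rbind(
  DT[, list(HomeAway = "Home", Team = HomeTeam, Season, Points = Htpoints, i = .I)],
  DT[, list(HomeAway = "Away", Team = AwayTeam, Season, Points = Atpoints, i = .I)]
)

# Within each Season/Team, average the 3 previous matches (offsets -1, -2, -3)
setkeyv(both, c("Season", "Team", "i"))
both[, Last3 := rollapply(Points, list(-seq(3)), mean, fill = NA_real_, na.rm = TRUE),
  by = "Season,Team"]

# Restore match order and copy the results back into DT
setkeyv(both, "i")
DT[, HtLast3 := both[HomeAway == "Home", Last3]][
   , AtLast3 := both[HomeAway == "Away", Last3]]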

giving:

> DT
        HomeTeam       AwayTeam    Season Htpoints Atpoints  HtLast3   AtLast3
 1:  Mattersburg       Salzburg 2015/2016        3        0       NA        NA
 2:     Salzburg   Rapid Vienna 2015/2016        0        3       NA        NA
 3:       Admira    Mattersburg 2015/2016        3        0       NA        NA
 4:       Admira       Salzburg 2015/2016        1        1       NA        NA
 5:  Mattersburg           Ried 2015/2016        3        0       NA        NA
 6:         Ried       Salzburg 2015/2016        0        3       NA 0.3333333
 7:       Altach    Mattersburg 2015/2016        3        0       NA 2.0000000
 8:  Austria Vie    Mattersburg 2015/2016        3        0       NA 1.0000000
 9:     Salzburg         Altach 2015/2016        3        0 1.333333        NA
10:  Mattersburg AC Wolfsberger 2015/2016        3        0 1.000000        NA
11:     Salzburg Austria Vienna 2015/2016        1        1 2.333333        NA
12: Rapid Vienna    Mattersburg 2015/2016        0        3       NA 1.0000000
13:   Sturm Graz       Salzburg 2015/2016        0        3       NA 2.3333333
14:     Salzburg         Grodig 2015/2016        3        0 2.333333        NA
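
Since the question prefers data.table, the same idea can probably be written without zoo by using data.table's own `frollmean()` and `shift()`. This is an alternative sketch, not the method of the answer above. The names `games`, `match_id`, `side`, `HT.av.last3` and `AT.av.last3` are chosen here for illustration, and the sketch assumes a data.table version that provides `frollmean()` (1.12.0 or later).

```r
library(data.table)

# Same 14 matches as in the question
games <- data.table(
  HomeTeam = c("Mattersburg", "Salzburg", "Admira", "Admira", "Mattersburg",
               "Ried", "Altach", "Austria Vie", "Salzburg", "Mattersburg",
               "Salzburg", "Rapid Vienna", "Sturm Graz", "Salzburg"),
  AwayTeam = c("Salzburg", "Rapid Vienna", "Mattersburg", "Salzburg", "Ried",
               "Salzburg", "Mattersburg", "Mattersburg", "Altach",
               "AC Wolfsberger", "Austria Vienna", "Mattersburg", "Salzburg",
               "Grodig"),
  Season   = "2015/2016",
  Htpoints = c(3, 0, 3, 1, 3, 0, 3, 3, 3, 3, 1, 0, 0, 3),
  Atpoints = c(0, 3, 0, 1, 0, 3, 0, 0, 0, 0, 1, 3, 3, 0)
)
games[, match_id := .I]  # row number doubles as match order

# Long format: one row per team per match
long <- rbind(
  games[, .(match_id, side = "Home", Team = HomeTeam, Season, Points = Htpoints)],
  games[, .(match_id, side = "Away", Team = AwayTeam, Season, Points = Atpoints)]
)
setorder(long, Season, Team, match_id)

# frollmean() averages the current and the two previous matches;
# shift() moves that value down one row so it describes the 3 matches before
long[, Last3 := shift(frollmean(Points, 3)), by = .(Season, Team)]

# Update joins copy the values back to the wide table
games[long[side == "Home"], HT.av.last3 := i.Last3, on = "match_id"]
games[long[side == "Away"], AT.av.last3 := i.Last3, on = "match_id"]

print(games[, .(HomeTeam, AwayTeam, HT.av.last3, AT.av.last3)])
```

Teams with fewer than three earlier matches are expected to get NA here as well, because both `frollmean()` and `shift()` fill with NA by default.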

Note

DF <-
structure(list(HomeTeam = c("Mattersburg", "Salzburg", "Admira", 
"Admira", "Mattersburg", "Ried", "Altach", "Austria Vie", "Salzburg", 
"Mattersburg", "Salzburg", "Rapid Vienna", "Sturm Graz", "Salzburg"
), AwayTeam = c("Salzburg", "Rapid Vienna", "Mattersburg", "Salzburg", 
"Ried", "Salzburg", "Mattersburg", "Mattersburg", "Altach", "AC Wolfsberger", 
"Austria Vienna", "Mattersburg", "Salzburg", "Grodig"), Season = c("2015/2016", 
"2015/2016", "2015/2016", "2015/2016", "2015/2016", "2015/2016", 
"2015/2016", "2015/2016", "2015/2016", "2015/2016", "2015/2016", 
"2015/2016", "2015/2016", "2015/2016"), Htpoints = c(3L, 0L, 
3L, 1L, 3L, 0L, 3L, 3L, 3L, 3L, 1L, 0L, 0L, 3L), Atpoints = c(0L, 
3L, 0L, 1L, 0L, 3L, 0L, 0L, 0L, 0L, 1L, 3L, 3L, 0L)), 
class = "data.frame", row.names = c(NA, -14L))

DT <- as.data.table(DF)

I had some difficulty working with your dataset, so I made my own, structured like yours:

Home= sample(c("A","B","C","D"),9,replace = TRUE)
Away= sample(c("A","B","C","D"),9,replace = TRUE)
Home_Points= sample(c(0,1,3),9,replace = TRUE)
Away_Points= sample(c(0,1,3),9,replace = TRUE)

dt<-data.frame(HomeTeam=Home,
               AwayTeam=Away, 
               Htpoints=Home_Points,Atpoints=Away_Points,
               stringsAsFactors = FALSE)

My dataset is:

  HomeTeam AwayTeam Htpoints Atpoints
1        C        C        0        1
2        D        B        1        1
3        D        B        3        0
4        A        B        0        3
5        C        D        1        3
6        C        A        1        3
7        C        D        1        1
8        D        A        1        3
9        D        B        3        3

Solution: combine the home and away teams together with their points

team  <- as.vector(rbind(dt[,1],dt[,2]))
points<- as.vector(rbind(dt[,3],dt[,4]))

newDT<-data.frame( team=team,points=points,stringsAsFactors = FALSE)
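
The `as.vector(rbind(...))` idiom works because `rbind()` stacks the two columns as the rows of a matrix and `as.vector()` then reads that matrix column by column, so the home and away entries of each match end up next to each other. A small sketch using the first three rows of the dataset above:

```r
# First three matches from the dataset above
home        <- c("C", "D", "D")
away        <- c("C", "B", "B")
home_points <- c(0, 1, 3)
away_points <- c(1, 1, 0)

# rbind() gives a 2 x 3 matrix; as.vector() flattens it column by column
team   <- as.vector(rbind(home, away))
points <- as.vector(rbind(home_points, away_points))

team    # "C" "C" "D" "B" "D" "B"
points  # 0 1 1 1 3 0
```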

Finally, get the points for each team, regardless of whether it played at home or away:

library(tidyverse)

newDT %>%
  group_by(team) %>%
  summarise_all(sum) 

The result is:

 team  points
  <chr>  <dbl>
1 A          6
2 B          7
3 C          4
4 D         12

If you think the season may change as well, you can add the season to the new dataset too and then sort by it (arrange).
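
The remark about seasons could look roughly like this. The sketch assumes a `Season` column and uses a made-up four-match dataset spanning two seasons. None of these values come from the answer's random data, and the `.groups` argument requires dplyr 1.0 or later.

```r
library(dplyr)

# Made-up matches over two seasons (illustrative only)
dt <- data.frame(
  HomeTeam = c("A", "B", "A", "B"),
  AwayTeam = c("B", "A", "B", "A"),
  Season   = c("2015/2016", "2015/2016", "2016/2017", "2016/2017"),
  Htpoints = c(3, 1, 0, 3),
  Atpoints = c(0, 1, 3, 0),
  stringsAsFactors = FALSE
)

# Stack home and away sides, carrying the season along with each row
newDT <- data.frame(
  team   = c(dt$HomeTeam, dt$AwayTeam),
  season = c(dt$Season, dt$Season),
  points = c(dt$Htpoints, dt$Atpoints),
  stringsAsFactors = FALSE
)

# Total points per team within each season, sorted by season
totals <- newDT %>%
  group_by(season, team) %>%
  summarise(points = sum(points), .groups = "drop") %>%
  arrange(season, team)

print(totals)
```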

