DT:
HomeTeam      AwayTeam        Season     Htpoints  Atpoints
Mattersburg   Salzburg        2015/2016  3         0
Salzburg      Rapid Vienna    2015/2016  0         3
Admira        Mattersburg     2015/2016  3         0
Admira        Salzburg        2015/2016  1         1
Mattersburg   Ried            2015/2016  3         0
Ried          Salzburg        2015/2016  0         3
Altach        Mattersburg     2015/2016  3         0
Austria Vie   Mattersburg     2015/2016  3         0
Salzburg      Altach          2015/2016  3         0
Mattersburg   AC Wolfsberger  2015/2016  3         0
Salzburg      Austria Vienna  2015/2016  1         1
Rapid Vienna  Mattersburg     2015/2016  0         3
Sturm Graz    Salzburg        2015/2016  0         3
Salzburg      Grodig          2015/2016  3         0
To calculate the average points of a team in the last 3 matches at home:
library(zoo)
roll <- function(x, n) {
  if (length(x) <= n) NaN
  else rollapply(x, list(-seq(n)), mean, fill = NaN)
}
transform(DT, last3.HT.av.points = ave(Htpoints, Season, HomeTeam, FUN = function(x) roll(x, 3)))
The above works fine. However:
Is there a way to calculate the average points over the last 3 matches regardless of whether the team played at home or away?
Desired output (only showing values for Salzburg):
HomeTeam      AwayTeam        Season     Htpoints  Atpoints  HT.av.last3  AT.av.last3
Mattersburg   Salzburg        2015/2016  3         0                      NA
Salzburg      Rapid Vienna    2015/2016  0         3         NA
Admira        Mattersburg     2015/2016  3         0
Admira        Salzburg        2015/2016  1         1                      NA
Mattersburg   Ried            2015/2016  3         0
Ried          Salzburg        2015/2016  0         3                      0.33
Altach        Mattersburg     2015/2016  3         0
Austria Vie   Mattersburg     2015/2016  3         0
Salzburg      Altach          2015/2016  3         0         1.33
Mattersburg   AC Wolfsberger  2015/2016  3         0
Salzburg      Austria Vienna  2015/2016  1         1         2.33
Rapid Vienna  Mattersburg     2015/2016  0         3
Sturm Graz    Salzburg        2015/2016  0         3                      2.33
Salzburg      Grodig          2015/2016  3         0         2.33
Preference: a data.table solution.
Reproducible dataset (not the same as the one above):
library(data.table)
DT <- fread("HomeTeam,AwayTeam,Season,Htpoints,Atpoints
Grodig,Salzburg,2015/2016,0,3
Rapid Vienna,Altach,2015/2016,1,1
Ried,Austria Vienna,2015/2016,3,0
Sturm Graz,Mattersburg,2015/2016,3,0
Admira,Rapid Vienna,2015/2016,1,1
Altach,Ried,2015/2016,0,3
Austria Vienna,Sturm Graz,2015/2016,1,1
Mattersburg,Grodig,2015/2016,3,0
Salzburg,AC Wolfsberger,2015/2016,3,0")
numTeams <- DT[, uniqueN(c(HomeTeam, AwayTeam))]
firstHalf <- lapply(seq_len(DT[, .N]),
  function(n) data.table(
    Matchday = n * 2L - 1L,
    HomeTeam = DT[["HomeTeam"]],
    AwayTeam = c(DT[["AwayTeam"]][-seq_len(n)], DT[["AwayTeam"]][seq_len(n)]),
    Season = DT[["Season"]],
    Htpoints = DT[["Htpoints"]],
    Atpoints = DT[["Atpoints"]]
  ))
secondHalf <- lapply(seq_len(DT[, .N]),
  function(n) data.table(
    Matchday = n * 2L,
    HomeTeam = DT[["AwayTeam"]],
    AwayTeam = c(DT[["HomeTeam"]][-seq_len(n)], DT[["HomeTeam"]][seq_len(n)]),
    Season = DT[["Season"]],
    Htpoints = DT[["Htpoints"]],
    Atpoints = DT[["Atpoints"]]
  ))
DT <- rbindlist(c(firstHalf, secondHalf))[
  HomeTeam != AwayTeam][,
  .SD[1L], by = .(HomeTeam, AwayTeam)]
setorder(DT, Matchday, HomeTeam)
DT <- DT[, -c("Matchday")]
library(tidyverse)
library(zoo)
# Add a row id so the results can be joined back to the original rows
DT_prep <- DT %>%
  as_tibble() %>%
  mutate(row = row_number())
# Reshape to one row per team per match, then take a rolling mean per team
DT_rollmeans <- DT_prep %>%
  gather(teamside, teamname, -Season, -Htpoints, -Atpoints, -row) %>%
  arrange(row) %>%
  group_by(teamname) %>%
  mutate(points = case_when(teamside == 'HomeTeam' ~ Htpoints,
                            teamside == 'AwayTeam' ~ Atpoints),
         roll_mean = zoo::rollapply(points, 3, mean, align = 'right', fill = NA)) %>%
  ungroup() %>%
  select(row, teamside, roll_mean) %>%
  spread(teamside, roll_mean) %>%
  select(row, HT.av.last3 = HomeTeam, AT.av.last3 = AwayTeam)
DT_prep %>% left_join(DT_rollmeans, by = "row") %>% select(-row)
This yields a tibble that looks as follows:
# A tibble: 90 x 7
HomeTeam AwayTeam Season Htpoints Atpoints HT.av.last3 AT.av.last3
<chr> <chr> <chr> <int> <int> <dbl> <dbl>
1 Admira Ried 2015/2016 1 1 NA NA
2 Altach Sturm Graz 2015/2016 0 3 NA NA
3 Austria Vienna Grodig 2015/2016 1 1 NA NA
4 Grodig Altach 2015/2016 0 3 NA NA
5 Mattersburg AC Wolfsberger 2015/2016 3 0 NA NA
6 Rapid Vienna Austria Vienna 2015/2016 1 1 NA NA
7 Ried Mattersburg 2015/2016 3 0 NA NA
8 Sturm Graz Rapid Vienna 2015/2016 3 0 NA NA
9 AC Wolfsberger Grodig 2015/2016 3 0 NA 0.333
10 Mattersburg Admira 2015/2016 3 0 2 NA
# ... with 80 more rows
For every team, the average is NA for its first two games; after that it is the rolling mean over three games. Note that with align = 'right' the window ends at, and therefore includes, the current game. The first team to reach three games in this data is Grodig, with a rolling average of 0.333 from scoring 1, 0 and 0 points in its first three games.
I'm not happy with my solution, but it works. I'm sure someone could make this a lot more compact.
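A possibly more compact variant, sketched here with data.table since that was the stated preference. It assumes a data.table version that provides frollmean, and it runs on a small made-up dataset (teams A, B, C), not the data above. The column names id, side, team and av_last3 are illustrative. Unlike the rollapply call above, shift() moves the rolling mean down by one game, so each value covers only the three previous games:

```r
library(data.table)

# Hypothetical toy data: three teams, six matches in chronological order
DT <- data.table(
  HomeTeam = c("A", "B", "A", "C", "B", "A"),
  AwayTeam = c("B", "C", "C", "A", "A", "B"),
  Season   = "2015/2016",
  Htpoints = c(3L, 1L, 0L, 3L, 1L, 3L),
  Atpoints = c(0L, 1L, 3L, 0L, 1L, 0L)
)
DT[, id := .I]

# One row per team per match, regardless of home or away
long <- rbind(
  DT[, .(id, Season, side = "HT", team = HomeTeam, points = Htpoints)],
  DT[, .(id, Season, side = "AT", team = AwayTeam, points = Atpoints)]
)
setorder(long, id)

# frollmean() averages a window ending at the current game;
# shift() lags it by one so only the three previous games count
long[, av_last3 := shift(frollmean(points, 3)), by = .(Season, team)]

# Back to one row per match and join onto the original table
wide <- dcast(long, id ~ side, value.var = "av_last3")
DT[wide, on = "id", `:=`(HT.av.last3 = i.HT, AT.av.last3 = i.AT)]
DT[, id := NULL]
```

In this toy data only the last two matches get non-NA values, because no team has three earlier games before then.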
Using DT shown reproducibly in the Note at the end, add a row number column, i, and create a data.table both having two rows for each row in DT, one for the Home and one for the Away team. Then use rollapply on that and insert the results back into DT. Note that it is not necessary to have special code to handle the case where there are fewer than 3 prior rows for a team, as rollapply will handle that automatically.
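# Stack home and away views of each match; i is the original row number
both <- rbind(
  DT[, list(HomeAway = "Home", Team = HomeTeam, Season, Points = Htpoints, i = .I)],
  DT[, list(HomeAway = "Away", Team = AwayTeam, Season, Points = Atpoints, i = .I)]
)
# Order each team's matches chronologically within a season
setkeyv(both, c("Season", "Team", "i"))
# Offsets -1, -2, -3: mean of the three previous matches, NA if fewer exist
both[, Last3 := rollapply(Points, list(-seq(3)), mean, fill = NA_real_, na.rm = TRUE),
     by = "Season,Team"]
# Restore the original row order before copying the results back
setkeyv(both, "i")
DT[, HtLast3 := both[HomeAway == "Home", Last3]][
   , AtLast3 := both[HomeAway == "Away", Last3]]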
giving:
> DT
HomeTeam AwayTeam Season Htpoints Atpoints HtLast3 AtLast3
1: Mattersburg Salzburg 2015/2016 3 0 NA NA
2: Salzburg Rapid Vienna 2015/2016 0 3 NA NA
3: Admira Mattersburg 2015/2016 3 0 NA NA
4: Admira Salzburg 2015/2016 1 1 NA NA
5: Mattersburg Ried 2015/2016 3 0 NA NA
6: Ried Salzburg 2015/2016 0 3 NA 0.3333333
7: Altach Mattersburg 2015/2016 3 0 NA 2.0000000
8: Austria Vie Mattersburg 2015/2016 3 0 NA 1.0000000
9: Salzburg Altach 2015/2016 3 0 1.333333 NA
10: Mattersburg AC Wolfsberger 2015/2016 3 0 1.000000 NA
11: Salzburg Austria Vienna 2015/2016 1 1 2.333333 NA
12: Rapid Vienna Mattersburg 2015/2016 0 3 NA 1.0000000
13: Sturm Graz Salzburg 2015/2016 0 3 NA 2.3333333
14: Salzburg Grodig 2015/2016 3 0 2.333333 NA
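To illustrate why no special handling is needed, here is a minimal sketch on a single team. The vector pts holds Salzburg's points in chronological order as read off the table in the question. The names pts, prev3 and incl3 are only for illustration. Passing list(-seq(3)) as the width means offsets -1, -2 and -3, so the current match never enters the mean, whereas width = 3 with align = "right" includes it:

```r
library(zoo)

# Salzburg's points per match, home or away, in chronological order
pts <- c(0, 0, 1, 3, 3, 1, 3, 3)

# Offsets -1, -2, -3: mean of the three PREVIOUS matches only;
# positions without three earlier matches are filled with NA
prev3 <- rollapply(pts, list(-seq(3)), mean, fill = NA_real_)

# For comparison: a window of width 3 ending at (and including) the current match
incl3 <- rollapply(pts, 3, mean, align = "right", fill = NA_real_)

print(round(prev3, 2))
print(round(incl3, 2))
```

The non-NA values of prev3 should line up with the Salzburg figures in the question's desired output (0.33, 1.33, 2.33, 2.33, 2.33).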
Note
library(data.table)
library(zoo)
DF <-
structure(list(HomeTeam = c("Mattersburg", "Salzburg", "Admira",
"Admira", "Mattersburg", "Ried", "Altach", "Austria Vie", "Salzburg",
"Mattersburg", "Salzburg", "Rapid Vienna", "Sturm Graz", "Salzburg"
), AwayTeam = c("Salzburg", "Rapid Vienna", "Mattersburg", "Salzburg",
"Ried", "Salzburg", "Mattersburg", "Mattersburg", "Altach", "AC Wolfsberger",
"Austria Vienna", "Mattersburg", "Salzburg", "Grodig"), Season = c("2015/2016",
"2015/2016", "2015/2016", "2015/2016", "2015/2016", "2015/2016",
"2015/2016", "2015/2016", "2015/2016", "2015/2016", "2015/2016",
"2015/2016", "2015/2016", "2015/2016"), Htpoints = c(3L, 0L,
3L, 1L, 3L, 0L, 3L, 3L, 3L, 3L, 1L, 0L, 0L, 3L), Atpoints = c(0L,
3L, 0L, 1L, 0L, 3L, 0L, 0L, 0L, 0L, 1L, 3L, 3L, 0L)),
class = "data.frame", row.names = c(NA, -14L))
DT <- as.data.table(DF)
I had a hard time with your dataset, so I made my own, which is similar to yours:
# No seed is set, so each run produces a different random dataset
Home <- sample(c("A", "B", "C", "D"), 9, replace = TRUE)
Away <- sample(c("A", "B", "C", "D"), 9, replace = TRUE)
Home_Points <- sample(c(0, 1, 3), 9, replace = TRUE)
Away_Points <- sample(c(0, 1, 3), 9, replace = TRUE)
dt <- data.frame(HomeTeam = Home,
                 AwayTeam = Away,
                 Htpoints = Home_Points, Atpoints = Away_Points,
                 stringsAsFactors = FALSE)
and my dataset is:
HomeTeam AwayTeam Htpoints Atpoints
1 C C 0 1
2 D B 1 1
3 D B 3 0
4 A B 0 3
5 C D 1 3
6 C A 1 3
7 C D 1 1
8 D A 1 3
9 D B 3 3
Solution: combine the home and away teams and their points into one long table:
team <- as.vector(rbind(dt[,1],dt[,2]))
points<- as.vector(rbind(dt[,3],dt[,4]))
newDT<-data.frame( team=team,points=points,stringsAsFactors = FALSE)
and finally sum the points by team, regardless of whether they were earned at home or away:
library(tidyverse)
newDT %>%
group_by(team) %>%
summarise_all(sum)
and the result is:
team points
<chr> <dbl>
1 A 6
2 B 7
3 C 4
4 D 12
Point: if you think the season may also change, you can add the season to the new dataset as well, and then sort by it (arrange).
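The answer above stops at season totals. As a rough sketch of how the same interleaving idea could be carried through to the rolling average asked for in the question, the base R code below keeps a season column and computes, per season and team, the mean of the three previous matches. The toy data and the helper prev3_mean are hypothetical and not part of the original answer:

```r
# Hypothetical toy data with a Season column, matches in chronological order
dt <- data.frame(
  Season   = "2015/2016",
  HomeTeam = c("A", "B", "A", "C", "B", "A"),
  AwayTeam = c("B", "C", "C", "A", "A", "B"),
  Htpoints = c(3, 1, 0, 3, 1, 3),
  Atpoints = c(0, 1, 3, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Interleave home and away rows so match order is preserved
newDT <- data.frame(
  season = rep(dt$Season, each = 2),
  team   = as.vector(rbind(dt$HomeTeam, dt$AwayTeam)),
  points = as.vector(rbind(dt$Htpoints, dt$Atpoints)),
  stringsAsFactors = FALSE
)

# Mean of the three previous values; NA while fewer than three exist
prev3_mean <- function(x) {
  vapply(seq_along(x), function(k) {
    if (k <= 3) NA_real_ else mean(x[(k - 3):(k - 1)])
  }, numeric(1))
}

# Apply the helper within each season/team group, keeping row order
newDT$av_last3 <- ave(newDT$points, newDT$season, newDT$team, FUN = prev3_mean)
print(newDT)
```

Because the rows are interleaved, odd rows of newDT belong to the home side and even rows to the away side of each match, so the result can be copied back to dt by position if needed.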