I have a dataset of NHL players that includes the number of goals each player scored in every season they played. I calculate the running total of goals over each player's career in order to identify a "running" Top 10 of players.
toy_data <- data.frame(player=c("gretzky","gretzky","gretzky","gretzky","gretzky","gretzky","gretzky","gretzky","gretzky","gretzky"),
                       goal_total=c(5,10,15,20,25,30,35,40,45,50),
                       goals=c(5,5,5,5,5,5,5,5,5,5),
                       year=c(1990,1991,1992,1993,1994,1995,1996,1997,1998,1999))
player goal_total goals year
1 gretzky 5 5 1990
2 gretzky 10 5 1991
3 gretzky 15 5 1992
4 gretzky 20 5 1993
5 gretzky 25 5 1994
6 gretzky 30 5 1995
7 gretzky 35 5 1996
8 gretzky 40 5 1997
9 gretzky 45 5 1998
10 gretzky 50 5 1999
I want to expand the dataset such that when players end their career, they remain in the dataset. For example, Wayne Gretzky retired in 1999, but I want an entry for Gretzky in the dataset for all subsequent years with his final goal total. The end product would look something like this:
player goal_total goals year
1 gretzky 5 5 1990
2 gretzky 10 5 1991
3 gretzky 15 5 1992
4 gretzky 20 5 1993
5 gretzky 25 5 1994
6 gretzky 30 5 1995
7 gretzky 35 5 1996
8 gretzky 40 5 1997
9 gretzky 45 5 1998
10 gretzky 50 5 1999
11 gretzky 50 0 2000
12 gretzky 50 0 2001
13 gretzky 50 0 2002
...
and so on until 2019. Is there a simple way to do this?
We can achieve this with complete() and fill() from tidyr:
library(dplyr)
library(tidyr)
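toy_data %>%
  group_by(player) %>%
  # add one row per missing year up to 2019; new rows get goals = 0
  complete(year = min(year):2019, fill = list(goals = 0)) %>%
  # carry the last known goal_total down into the new rows
  fill(goal_total)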
# player year goal_total goals
#1 gretzky 1990 5 5
#2 gretzky 1991 10 5
#3 gretzky 1992 15 5
#4 gretzky 1993 20 5
#5 gretzky 1994 25 5
#6 gretzky 1995 30 5
#7 gretzky 1996 35 5
#8 gretzky 1997 40 5
#9 gretzky 1998 45 5
#10 gretzky 1999 50 5
#11 gretzky 2000 50 0
#12 gretzky 2001 50 0
#....
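Because the pipeline groups by player, the same approach should carry over to a dataset with several players whose careers start and end in different years. Below is a minimal sketch under stated assumptions: the two players, their goal counts, and the `last_year` cutoff are made up for illustration and do not come from the original data, and the running total is rebuilt with `cumsum()` rather than supplied up front.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-player data; names and numbers are illustrative only
toy_multi <- data.frame(
  player = c("player_a", "player_a", "player_b", "player_b", "player_b"),
  year   = c(1990, 1991, 1995, 1996, 1997),
  goals  = c(5, 5, 10, 10, 10)
)

last_year <- 2000  # assumed final season to extend every career to

expanded <- toy_multi %>%
  arrange(player, year) %>%                  # cumsum() depends on row order
  group_by(player) %>%
  mutate(goal_total = cumsum(goals)) %>%     # running career total per player
  # per player: add rows from the first season through last_year, goals = 0
  complete(year = min(year):last_year, fill = list(goals = 0)) %>%
  fill(goal_total) %>%                       # carry the final total forward
  ungroup()

print(expanded, n = Inf)
```

Each player's rows start at that player's own first season, since `min(year)` is evaluated within the group, and the added seasons keep the final `goal_total` with `goals` set to 0.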