I have some NBA data that looks like this:
>head(rebs)
game_id a1 a2 a3 a4 a5 h1 h2 h3 h4
1 21800001 Dario Saric Robert Covington Joel Embiid Markelle Fultz Ben Simmons Jayson Tatum Gordon Hayward Al Horford Jaylen Brown
2 21800001 Dario Saric Robert Covington Joel Embiid Markelle Fultz Ben Simmons Jayson Tatum Gordon Hayward Al Horford Jaylen Brown
3 21800001 Dario Saric Robert Covington Joel Embiid Markelle Fultz Ben Simmons Jayson Tatum Gordon Hayward Al Horford Jaylen Brown
4 21800001 Dario Saric Robert Covington Joel Embiid Markelle Fultz Ben Simmons Jayson Tatum Gordon Hayward Al Horford Jaylen Brown
5 21800001 Dario Saric Robert Covington Joel Embiid Markelle Fultz Ben Simmons Jayson Tatum Gordon Hayward Al Horford Jaylen Brown
6 21800001 Dario Saric Robert Covington Joel Embiid Markelle Fultz Ben Simmons Jayson Tatum Gordon Hayward Al Horford Jaylen Brown
h5 player team event_type type reb
1 Kyrie Irving start of period start of period 0
2 Kyrie Irving Al Horford PHI jump ball jump ball 0
3 Kyrie Irving Robert Covington PHI miss Jump Shot 0
4 Kyrie Irving rebound team rebound 0
5 Kyrie Irving Jayson Tatum BOS miss Jump Shot 0
6 Kyrie Irving Dario Saric PHI rebound rebound defensive 1
game_id is the ID of the game being played. There's a full season of data, so there are many different games in this set.
This is NBA data at the play-by-play level. a1:a5 are the away team players currently on the floor; h1:h5 are the home team players currently on the floor.
player is the name of the player who made the play described in that row.
team is the team of the player who made the play described in that row.
reb is a binary flag: 1 means a rebound was made, and 0 means anything else. So the 6th play tracked in this data was a rebound made by Dario Saric (Philadelphia).
I want to find the number of rebounds each player's team made while that player was on the floor, grouped at the game level. One thing that makes this difficult is that players move around within a1:h5 throughout the dataset; e.g., in this first game Dario Saric is later listed under a4 and a5. So the slot a player occupies in the a1:h5 lineup while on the floor is essentially arbitrary (except that the away team is always in a1:a5 and the home team in h1:h5).
Here's what I used to find rebounds by player, grouped by game:
library(dplyr)
rebs %>%
group_by(game_id, player) %>%
summarize(rebs = sum(reb))
I'm unsure how to find the number of rebounds a team had while each player was on the floor, though. E.g., in the 6th play example, I would want that rebound to count toward all 5 of the Philadelphia players currently on the floor, not just Dario Saric.
I'm trying to use dplyr to do this, but I'm not sure it's possible. I tried group_by(game_id, team) followed by an %in% check across a1:h5, but nothing is clicking. Any help is greatly appreciated!
Using tidyverse you could try the following. This may not be the most efficient method.
First, filter for reb == 1 if you are only interested in the rebound data and can ignore the rest of the plays.
Then assign a number to each of the rebound plays.
You can use pivot_longer to put the names of the players on the floor for each play into long format. This also separates "home" from "away" players, so you can give credit only to players on the rebounder's side. Perhaps you could use team for this instead, though it was missing for some plays.
If you group_by game_id, home vs. away, and the play number, you can flag teammate rebounds by checking whether the player making the rebound is %in% the players of that group (those sharing the same home vs. away value).
Then you can group_by game_id and each player, and sum these rebounds.
library(tidyverse)

rebs %>%
  # keep only the rebound plays
  filter(reb == 1) %>%
  # number each rebound play
  mutate(play_number = row_number()) %>%
  # one row per on-floor player; home_away is "a" or "h"
  pivot_longer(a1:h5, names_to = c("home_away", "num"),
               values_to = "team_player", names_pattern = "(a|h)(\\d)") %>%
  group_by(game_id, home_away, play_number) %>%
  # 1 if the rebounder is among this side's players on the floor
  mutate(teammate_reb = ifelse(player %in% team_player, 1, 0)) %>%
  group_by(game_id, team_player) %>%
  summarise(reb_total = sum(teammate_reb))
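To sanity-check the approach, here is a minimal sketch of the same steps on a made-up data frame. Everything in it is hypothetical: the `toy` tibble, the player names, and the shortened two-player lineups (a1:a2 and h1:h2 instead of a1:h5), chosen only so the totals are easy to verify by hand. I also added `.groups = "drop"` so the result comes back ungrouped.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one game, four plays, two players per side.
# Note that "Ann" moves from a1 to a2, as players do in the real data.
toy <- tibble(
  game_id = c(1, 1, 1, 1),
  a1      = c("Ann", "Ann", "Cal", "Cal"),
  a2      = c("Bob", "Bob", "Ann", "Ann"),
  h1      = c("Hal", "Hal", "Hal", "Ivy"),
  h2      = c("Ivy", "Ivy", "Ivy", "Jon"),
  player  = c("Ann", "Hal", "Cal", "Ivy"),
  reb     = c(1, 0, 1, 1)
)

result <- toy %>%
  filter(reb == 1) %>%                      # keep rebound plays only
  mutate(play_number = row_number()) %>%    # id for each rebound play
  pivot_longer(a1:h2,
               names_to = c("home_away", "num"),
               values_to = "team_player",
               names_pattern = "(a|h)(\\d)") %>%
  group_by(game_id, home_away, play_number) %>%
  # credit every player on the rebounder's side of the floor
  mutate(teammate_reb = ifelse(player %in% team_player, 1, 0)) %>%
  group_by(game_id, team_player) %>%
  summarise(reb_total = sum(teammate_reb), .groups = "drop")

print(result)
```

With this toy input, Ann is credited with 2 rebounds (her own and Cal's), Bob, Cal, Ivy and Jon with 1 each, and Hal with 0, since he was on the floor for rebounds but never for one by his own side. Players who were never on the floor during any rebound do not appear in the result at all, so you may need to join back to a full roster if you want explicit zeros for them.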