
Update value for every row based on either of two previous columns

I am researching ATP Tour men's tennis data. I have a Pandas dataframe of ~60,000 matches, sorted by date. Every row contains information and statistics about one match, split between the winner and the loser. I am trying to calculate the Elo rating of both the winner and the loser for every match (that is, for every row). To calculate it, I need the Elo rating each player had after his previous match. The difficulty is that the winner of the current match may have been the loser of his previous match, so the 'winner_player_id' value of the current row may appear in the 'loser_player_id' column of that earlier row.

I am not sure how to efficiently select the previous ELO-ratings for both players per row, as this entails a search across multiple columns.

Every row includes the following columns:

array(['match_id', 'tourney_dates', 'round_order', 'tourney_name',
   'tourney_year_id', 'tourney_round_name', 'winner_player_id',
   'winner_slug', 'loser_player_id', 'loser_slug', 'elo_player_1', 'elo_player_2'])

Your time is appreciated!

One approach would be to sort the winner and loser within each row by player name/ID, so that a given pair of players always appears in the same order regardless of who won. Here's an example:

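import numpy as np
import pandas as pd

# Sort the two names within each row so the order does not depend on who won.
pairs = np.sort(df[['winner_name', 'loser_name']].values, axis=1)
# join returns a new dataframe, so assign it back; reuse df's index so rows align.
df = df.join(pd.DataFrame(pairs, columns=['name1', 'name2'], index=df.index))

df.head(10)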

Output:

      winner_name         loser_name              name1          name2
0   Nicklas Kulti      Michael Stich      Michael Stich  Nicklas Kulti
1   Michael Stich        Jim Courier        Jim Courier  Michael Stich
2   Nicklas Kulti     Magnus Larsson     Magnus Larsson  Nicklas Kulti
3     Jim Courier      Martin Sinner        Jim Courier  Martin Sinner
4   Michael Stich        Jimmy Arias        Jimmy Arias  Michael Stich
5   Nicklas Kulti    Fabrice Santoro    Fabrice Santoro  Nicklas Kulti
6  Magnus Larsson      Patrik Kuhnen     Magnus Larsson  Patrik Kuhnen
7     Jim Courier      Paul Haarhuis        Jim Courier  Paul Haarhuis
8   Nicklas Kulti  Magnus Gustafsson  Magnus Gustafsson  Nicklas Kulti
9   Michael Stich        Gilad Bloom        Gilad Bloom  Michael Stich
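
Sorting the names does not by itself give you each player's previous rating. One way to avoid searching both columns is to walk the rows once in date order and keep every player's latest rating in a dictionary keyed by player ID. The following is only a rough sketch under some assumptions: the function name `add_elo_columns`, the output columns `winner_elo_before` and `loser_elo_before`, the starting rating of 1500 and the K-factor of 32 are all illustrative choices, not something taken from the question.

```python
import pandas as pd

def add_elo_columns(df, k=32.0, start=1500.0):
    """Return a copy of df with each player's pre-match Elo rating.

    Assumes df is already sorted chronologically. k and start are
    illustrative defaults, not values from the original question.
    """
    ratings = {}  # player_id -> rating after that player's latest match
    winner_before, loser_before = [], []

    for w, l in zip(df["winner_player_id"], df["loser_player_id"]):
        # Look up by player ID, so it does not matter whether the player
        # won or lost his previous match.
        rw = ratings.get(w, start)
        rl = ratings.get(l, start)
        winner_before.append(rw)
        loser_before.append(rl)

        # Standard Elo expected score for the winner, then a zero-sum update.
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        delta = k * (1.0 - expected_w)
        ratings[w] = rw + delta
        ratings[l] = rl - delta

    out = df.copy()
    out["winner_elo_before"] = winner_before
    out["loser_elo_before"] = loser_before
    return out

# Tiny made-up example: player "b" loses match 0, then wins match 1.
matches = pd.DataFrame({
    "winner_player_id": ["a", "b", "a"],
    "loser_player_id": ["b", "c", "c"],
})
rated = add_elo_columns(matches)
```

A plain Python loop over ~60,000 rows with dictionary lookups should be quick, and it fits the problem because each rating depends on the result of the previous update, which makes the calculation hard to vectorize.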


 