
Adding a column to a dataframe based on a count from another dataframe

I have a dataframe ranksdf containing player names, dates, and each player's ranking on that date. The date column holds parsed datetime objects (possibly relevant for date comparisons later):

player      date        ranking
A           20120601    1
B           20120601    2
C           20120601    3
A           20130601    1
B           20130601    2
C           20130601    3

What I want to do is add a new column that counts each player's tournament wins up to that date. The information on tournament wins comes from another dataframe called matchesdf:

t_name  t_date      w_name      round
X       20120101    A           F   
X       20120101    A           SF          
Y       20120201    B           F
Y       20120201    B           SF
Z       20130101    A           F
  • t_name = tournament name
  • t_date = date of the tournament
  • w_name = winner name
  • round = the round in the tournament. F = Final, SF = Semifinal

From the second dataframe I can tell when a specific player won a tournament by counting the rows where round equals F.
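As a minimal sketch of that counting rule (ignoring the date constraint for now), the snippet below rebuilds matchesdf from the sample rows above and counts finals won per player. The variable name total_titles is only illustrative, and the dates are assumed to be parsed with pd.to_datetime.

```python
import pandas as pd

# Sample data copied from the matchesdf table above.
matchesdf = pd.DataFrame({
    "t_name": ["X", "X", "Y", "Y", "Z"],
    "t_date": pd.to_datetime(
        ["20120101", "20120101", "20120201", "20120201", "20130101"],
        format="%Y%m%d",
    ),
    "w_name": ["A", "A", "B", "B", "A"],
    "round": ["F", "SF", "F", "SF", "F"],
})

# Keep only finals, then count how often each winner appears.
# total_titles is a hypothetical name used for illustration.
total_titles = matchesdf.loc[matchesdf["round"] == "F", "w_name"].value_counts()
print(total_titles)  # A has 2 finals won, B has 1; C never appears
```

Players without any final win (such as C) are simply absent from the result, so they would need to be filled with 0 when attached to ranksdf.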

So what I want to do is add a new column to ranksdf counting the tournament wins, but only up to ranksdf.date.

In pseudocode something like this: ranksdf['t_wins'] = ranksdf.apply(lambda x: matchesdf[(matchesdf['t_date'] < x['date']) & (matchesdf['w_name'] == x['player']) & (matchesdf['round'] == 'F')].count())

So, the constraints on looking up the info in matchesdf are the time (because I only want the wins up to the time of the ranking in ranksdf), the player name obviously, and the round (because tournament wins are defined by winning the Final).

The result should look like this:

player      date        ranking     t_wins
A           20120601    1           1
B           20120601    2           1
C           20120601    3           0
A           20130601    1           2
B           20130601    2           1
C           20130601    3           0

Thanks for helping me.

Just add axis=1 to your apply call and it will work:

ranksdf["t_wins"] = ranksdf.apply(lambda x: len(matchesdf[(matchesdf['t_date'] < x['date']) & (matchesdf['w_name'] == x['player']) & (matchesdf['round'] == 'F')]), axis=1)
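For reference, here is a self-contained sketch of the same row-wise approach, using the sample data from the question. It assumes both date columns are parsed with pd.to_datetime. The helper count_wins and the pre-filtered finals frame are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
ranksdf = pd.DataFrame({
    "player": ["A", "B", "C", "A", "B", "C"],
    "date": pd.to_datetime(["20120601"] * 3 + ["20130601"] * 3, format="%Y%m%d"),
    "ranking": [1, 2, 3, 1, 2, 3],
})
matchesdf = pd.DataFrame({
    "t_name": ["X", "X", "Y", "Y", "Z"],
    "t_date": pd.to_datetime(
        ["20120101", "20120101", "20120201", "20120201", "20130101"],
        format="%Y%m%d",
    ),
    "w_name": ["A", "A", "B", "B", "A"],
    "round": ["F", "SF", "F", "SF", "F"],
})

# Filter the finals once instead of on every row (hypothetical helper frame).
finals = matchesdf[matchesdf["round"] == "F"]

def count_wins(row):
    """Count finals won by row['player'] strictly before row['date']."""
    mask = (finals["t_date"] < row["date"]) & (finals["w_name"] == row["player"])
    return int(mask.sum())

# axis=1 passes each row (not each column) to count_wins.
ranksdf["t_wins"] = ranksdf.apply(count_wins, axis=1)
print(ranksdf)  # t_wins column: 1, 1, 0, 2, 1, 0
```

Without axis=1, apply passes each column to the function, so x['date'] and x['player'] are looked up in that column's row index rather than among the column names, and the lookup fails. Note that a row-wise apply scans matchesdf once per ranking row, which is fine for small data but may be slow on large frames.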
