Adding a column to a dataframe based on a count from another dataframe

Question

I have a dataframe ranksdf containing player names, dates, and each player's ranking on that date. The date column is a parsed datetime object (which may be relevant for date comparisons later):

player      date        ranking
A           20120601    1
B           20120601    2
C           20120601    3
A           20130601    1
B           20130601    2
C           20130601    3

What I want to do is add a new column that counts each player's tournament wins up to that date. The information on tournament wins comes from another dataframe called matchesdf:

t_name  t_date      w_name      round
X       20120101    A           F   
X       20120101    A           SF          
Y       20120201    B           F
Y       20120201    B           SF
Z       20130101    A           F

t_name = tournament name
t_date = date of the tournament
w_name = winner name
round = the round in the tournament. F = Final, SF = Semifinal

From the second dataframe I know when a specific player won a tournament at a given time by counting the rows where round equals F.

So what I want to do is add a new column to ranksdf that counts the tournament wins, but only up to ranksdf.date.

In pseudocode, something like this:

ranksdf['t_wins'] = ranksdf.apply(lambda x: matchesdf[(matchesdf['t_date'] < x['date']) & (matchesdf['w_name'] == x['player']) & (matchesdf['round'] == 'F')].count())

So the constraints on looking up the info in matchesdf are the time (because I only want the wins up to the time of the ranking in ranksdf), the player name obviously, and the round (because a tournament win is defined as winning the Final).

The result should look like this:

player      date        ranking     t_wins
A           20120601    1           1
B           20120601    2           1
C           20120601    3           0
A           20130601    1           2
B           20130601    2           1
C           20130601    3           0

Thanks for helping me.

Answer 1

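Just add axis=1 to your apply call and it will work. Without it, apply passes each column to the lambda rather than each row, so lookups such as x['date'] fail: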

ranksdf["t_wins"] = ranksdf.apply(lambda x: len(matchesdf[(matchesdf['t_date'] < x['date']) & (matchesdf['w_name'] == x['player']) & (matchesdf['round'] == 'F')]), axis=1)
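
To check this against the sample data in the question, here is a minimal, self-contained sketch. The frames are rebuilt by hand from the tables above, assuming both date columns were parsed with pd.to_datetime. The helper name count_wins and the extra column t_wins_merge are made up for illustration. The second half is not part of the original answer: it is a merge-based alternative that avoids the row-wise apply and may be worth trying if ranksdf is large.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question (dates parsed to datetime).
ranksdf = pd.DataFrame({
    "player": ["A", "B", "C", "A", "B", "C"],
    "date": pd.to_datetime(["20120601"] * 3 + ["20130601"] * 3, format="%Y%m%d"),
    "ranking": [1, 2, 3, 1, 2, 3],
})
matchesdf = pd.DataFrame({
    "t_name": ["X", "X", "Y", "Y", "Z"],
    "t_date": pd.to_datetime(
        ["20120101", "20120101", "20120201", "20120201", "20130101"],
        format="%Y%m%d",
    ),
    "w_name": ["A", "A", "B", "B", "A"],
    "round": ["F", "SF", "F", "SF", "F"],
})


def count_wins(row):
    """Count finals won by row['player'] strictly before row['date'] (hypothetical helper)."""
    mask = (
        (matchesdf["t_date"] < row["date"])
        & (matchesdf["w_name"] == row["player"])
        & (matchesdf["round"] == "F")
    )
    return int(mask.sum())


# axis=1 makes apply call count_wins once per row instead of once per column.
ranksdf["t_wins"] = ranksdf.apply(count_wins, axis=1)

# Alternative sketch (not from the answer): pair every ranking row with that
# player's finals via a left merge, flag the finals that precede the ranking
# date, and sum the flags per original row.
finals = matchesdf.loc[matchesdf["round"] == "F", ["w_name", "t_date"]]
pairs = ranksdf.reset_index().merge(
    finals, left_on="player", right_on="w_name", how="left"
)
# Players without any final get NaT here, and NaT < date evaluates to False.
pairs["won_before"] = pairs["t_date"] < pairs["date"]
ranksdf["t_wins_merge"] = pairs.groupby("index")["won_before"].sum().astype(int)

print(ranksdf)
```

Both columns should match the t_wins column in the expected result from the question. Note that the comparison is strict (<), as in the original code, so a tournament played on the ranking date itself is not counted; use <= if it should be.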

solution1
1 ACCEPTED 2015-08-02 09:01:13
