
Pandas DataFrame - Add Column Containing Conditional Sum of “Previous” Rows

I have a dataset of tennis match results as follows:

import pandas as pd

tennis_cols = ['Year','TourNo','MatchNo','Round','Winner','Loser']
tennis_rslts = [ [2018, 1, 1, 'QF', 'PlayerA', 'PlayerB']
                ,[2018, 1, 2, 'QF', 'PlayerC', 'PlayerD']
                ,[2018, 1, 3, 'QF', 'PlayerE', 'PlayerF']
                ,[2018, 1, 4, 'QF', 'PlayerG', 'PlayerH']
                ,[2018, 1, 5, 'SF', 'PlayerA', 'PlayerC']
                ,[2018, 1, 6, 'SF', 'PlayerE', 'PlayerG']
                ,[2018, 1, 7, 'F',  'PlayerA', 'PlayerE'] ]
dfTennis = pd.DataFrame(tennis_rslts, columns=tennis_cols)
dfTennis

    Year    TourNo  MatchNo Round   Winner     Loser    
0   2018    1       1       QF      PlayerA    PlayerB
1   2018    1       2       QF      PlayerC    PlayerD
2   2018    1       3       QF      PlayerE    PlayerF
3   2018    1       4       QF      PlayerG    PlayerH
4   2018    1       5       SF      PlayerA    PlayerC
5   2018    1       6       SF      PlayerE    PlayerG
6   2018    1       7       F       PlayerA    PlayerE

I want to add a column, WinsToDate, containing the number of wins the winner of each match had before that match, i.e.:

    Year    TourNo  MatchNo Round   Winner     Loser    WinsToDate  
0   2018    1       1       QF      PlayerA    PlayerB  0
1   2018    1       2       QF      PlayerC    PlayerD  0 
2   2018    1       3       QF      PlayerE    PlayerF  0
3   2018    1       4       QF      PlayerG    PlayerH  0
4   2018    1       5       SF      PlayerA    PlayerC  1  <-- PlayerA won MatchNo 1
5   2018    1       6       SF      PlayerE    PlayerG  1  <-- PlayerE won MatchNo 3
6   2018    1       7       F       PlayerA    PlayerE  2  <-- PlayerA won MatchNo 1 and 5

My real-world dataset is large enough that iterating through it row by row is too slow. Any ideas on how to achieve this result efficiently?

Essentially, I want to count the rows where Winner equals the current row's Winner and MatchNo is less than the current row's MatchNo.
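The condition above can be evaluated for every pair of rows at once with NumPy broadcasting, without any Python-level loop. This is a sketch, not the poster's method. It builds an n × n boolean matrix, so memory grows quadratically, and it only suits moderately sized frames.

```python
import numpy as np
import pandas as pd

tennis_cols = ['Year', 'TourNo', 'MatchNo', 'Round', 'Winner', 'Loser']
tennis_rslts = [[2018, 1, 1, 'QF', 'PlayerA', 'PlayerB'],
                [2018, 1, 2, 'QF', 'PlayerC', 'PlayerD'],
                [2018, 1, 3, 'QF', 'PlayerE', 'PlayerF'],
                [2018, 1, 4, 'QF', 'PlayerG', 'PlayerH'],
                [2018, 1, 5, 'SF', 'PlayerA', 'PlayerC'],
                [2018, 1, 6, 'SF', 'PlayerE', 'PlayerG'],
                [2018, 1, 7, 'F',  'PlayerA', 'PlayerE']]
dfTennis = pd.DataFrame(tennis_rslts, columns=tennis_cols)

winners = dfTennis['Winner'].to_numpy()
match_no = dfTennis['MatchNo'].to_numpy()

# same_winner[i, j] is True when row j has the same Winner as row i
same_winner = winners[:, None] == winners[None, :]
# earlier[i, j] is True when row j's MatchNo is less than row i's MatchNo
earlier = match_no[None, :] < match_no[:, None]

# For each row i, count the rows j that satisfy both conditions
dfTennis['WinsToDate'] = (same_winner & earlier).sum(axis=1)
print(dfTennis['WinsToDate'].tolist())  # [0, 0, 0, 0, 1, 1, 2]
```

Because it compares MatchNo values directly rather than relying on row order, this version also works when the frame is not sorted.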

**UPDATE** I can count the number of times the winner occurs in the DataFrame using:

dfTennis['Count'] = list(map(lambda x : len(dfTennis[(dfTennis['Winner'] == x)]), dfTennis['Winner']))

But this counts all occurrences, rather than only the occurrences before the current row.
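For the total count in the update, a vectorized equivalent of the map/lambda line computes each winner's frequency once with `value_counts()` and maps it back onto the Winner column. The sketch below keeps only the two columns that matter. Like the original, it counts all occurrences, not just earlier ones.

```python
import pandas as pd

dfTennis = pd.DataFrame({
    'MatchNo': [1, 2, 3, 4, 5, 6, 7],
    'Winner': ['PlayerA', 'PlayerC', 'PlayerE', 'PlayerG',
               'PlayerA', 'PlayerE', 'PlayerA'],
})

# value_counts() gives one count per distinct winner;
# map() then looks up each row's winner in that table
dfTennis['Count'] = dfTennis['Winner'].map(dfTennis['Winner'].value_counts())
print(dfTennis['Count'].tolist())  # [3, 1, 2, 1, 3, 2, 3]
```

`dfTennis.groupby('Winner')['Winner'].transform('size')` produces the same column.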

Oddly enough, I am going to answer my own question.

The code needed to compute the WinsToDate column is:

# .loc (label-based) matches the index values passed in; .iloc only works with a default RangeIndex
dfTennis['WinsToDate'] = list(map(lambda x : len(dfTennis[(dfTennis['Winner'] == dfTennis.loc[x, 'Winner']) &
                                                          (dfTennis['MatchNo'] < dfTennis.loc[x, 'MatchNo'])]), dfTennis.index.values))

Passing the index value into the lambda function meant I could access both the Winner and MatchNo fields of the current row and apply the logic I required.

I'd welcome any better solutions, but this appears to work for my needs.
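The self-answer still filters the whole DataFrame once per row, so its cost grows quadratically with the number of matches. A commonly used vectorized alternative (not the poster's code) is `groupby(...).cumcount()`. It numbers each winner's rows 0, 1, 2, ... in the order they appear, which is exactly the number of earlier wins.

This approach depends on rows being in chronological order, so the sketch sorts first. Sorting by Year, TourNo and MatchNo is an assumption. It only matters if MatchNo restarts in each tournament, which the sample data does not show.

```python
import pandas as pd

tennis_cols = ['Year', 'TourNo', 'MatchNo', 'Round', 'Winner', 'Loser']
tennis_rslts = [[2018, 1, 1, 'QF', 'PlayerA', 'PlayerB'],
                [2018, 1, 2, 'QF', 'PlayerC', 'PlayerD'],
                [2018, 1, 3, 'QF', 'PlayerE', 'PlayerF'],
                [2018, 1, 4, 'QF', 'PlayerG', 'PlayerH'],
                [2018, 1, 5, 'SF', 'PlayerA', 'PlayerC'],
                [2018, 1, 6, 'SF', 'PlayerE', 'PlayerG'],
                [2018, 1, 7, 'F',  'PlayerA', 'PlayerE']]
dfTennis = pd.DataFrame(tennis_rslts, columns=tennis_cols)

# Assumed chronological key; adjust if the real data orders matches differently
order = ['Year', 'TourNo', 'MatchNo']

# cumcount() numbers rows within each Winner group as 0, 1, 2, ... in sorted order,
# which equals the number of wins that player had before the current match.
# Assigning the result back aligns on the index, so the original row order is kept.
dfTennis['WinsToDate'] = dfTennis.sort_values(order).groupby('Winner').cumcount()
print(dfTennis['WinsToDate'].tolist())  # [0, 0, 0, 0, 1, 1, 2]
```

With the single tournament shown here, `dfTennis.groupby('Winner').cumcount()` without sorting gives the same result, because the rows are already in MatchNo order.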

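Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.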

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM