Fastest way to do cumulative totals in Pandas dataframe
I've got a pandas dataframe of golfers' round scores going back to 2003 (approx. 300,000 rows). It looks something like this:
Date        Golfer          Tournament              Score  Player Total Rounds Played
2008-01-01  Tiger Woods     Invented Tournament R1  72     50
2008-01-01  Phil Mickelson  Invented Tournament R1  73     108
I want the 'Player Total Rounds Played' column to be a running total of the number of rounds (i.e. rows in the dataframe) that a player has played up to that date. Is there a quick way of doing it? My current solution (basically using iterrows and then a one-line function) works fine but will take approx. 11 hours to run.
Thanks,
Tom
Here is one way: sort by date, then take a cumulative sum within each golfer's group.
df = df.sort_values('Date')
df['Rounds CumSum'] = df.groupby('Golfer')['Rounds'].cumsum()
For example:
import pandas as pd
df = pd.DataFrame([['A', 70, 50],
                   ['B', 72, 55],
                   ['A', 73, 45],
                   ['A', 71, 60],
                   ['B', 74, 55],
                   ['A', 72, 65]],
                  columns=['Golfer', 'Rounds', 'Played'])
df['Rounds CumSum'] = df.groupby('Golfer')['Rounds'].cumsum()
#   Golfer  Rounds  Played  Rounds CumSum
# 0      A      70      50             70
# 1      B      72      55             72
# 2      A      73      45            143
# 3      A      71      60            214
# 4      B      74      55            146
# 5      A      72      65            286
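If what you need is a running count of rows per golfer rather than a running sum of a numeric column, a grouped cumcount may be closer to what the question describes. The following is a minimal sketch, not a tested solution for the real data: the sample rows are invented for illustration, and it assumes the columns are named 'Date' and 'Golfer' as in the question and that every round played is one row in the dataframe.

```python
import pandas as pd

# Hypothetical sample data shaped like the table in the question.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2008-01-08', '2008-01-01', '2008-01-01',
                            '2008-01-02', '2008-01-02']),
    'Golfer': ['Tiger Woods', 'Tiger Woods', 'Phil Mickelson',
               'Tiger Woods', 'Phil Mickelson'],
    'Score': [70, 72, 73, 69, 71],
})

# Sort chronologically so the count reflects rounds played up to each date.
# A stable sort keeps the original row order for rounds on the same date.
df = df.sort_values('Date', kind='stable').reset_index(drop=True)

# cumcount() numbers the rows within each golfer's group starting at 0,
# so add 1 to get "rounds played so far, including this one".
df['Player Total Rounds Played'] = df.groupby('Golfer').cumcount() + 1

print(df)
```

Like cumsum, cumcount is vectorised, so it avoids the row-by-row Python loop that makes iterrows slow. Note that the count starts at 1 within the dataframe; any rounds played before the first recorded row are not included.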