Fastest way to compute a running total in a pandas DataFrame

Question

I've got a pandas dataframe of golfers' round scores going back to 2003 (approx. 300,000 rows). It looks something like this:

Date        Golfer          Tournament              Score  Player Total Rounds Played
2008-01-01  Tiger Woods     Invented Tournament R1  72     50
2008-01-01  Phil Mickelson  Invented Tournament R1  73     108

I want the 'Player Total Rounds Played' column to be a running total of the number of rounds (i.e. rows in the dataframe) that a player has played up to that date. Is there a quick way of doing it? My current solution (basically using iterrows and then a one-line function) works fine but will take approx. 11 hours to run.

Thanks,

Tom

Answer 1

Here is one way:

df = df.sort_values('Date')
df['Rounds CumSum'] = df.groupby('Golfer')['Rounds'].cumsum()

For example:

import pandas as pd

df = pd.DataFrame([['A', 70, 50],
                   ['B', 72, 55],
                   ['A', 73, 45],
                   ['A', 71, 60],
                   ['B', 74, 55],
                   ['A', 72, 65]],
                  columns=['Golfer', 'Rounds', 'Played'])

df['Rounds CumSum'] = df.groupby('Golfer')['Rounds'].cumsum()

#   Golfer  Rounds  Played  Rounds CumSum
# 0      A      70      50             70
# 1      B      72      55             72
# 2      A      73      45            143
# 3      A      71      60            214
# 4      B      74      55            146
# 5      A      72      65            286
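
The example above sums an existing 'Rounds' column. If the dataframe has no such column and each row is simply one round, as the question describes, a running count per golfer may be closer to what is wanted. A minimal sketch, assuming columns named 'Date' and 'Golfer' as in the question and using made-up sample rows:

```python
import pandas as pd

# Hypothetical sample data shaped like the table in the question.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2008-01-08', '2008-01-01', '2008-01-01',
                            '2008-01-15', '2008-01-08']),
    'Golfer': ['Tiger Woods', 'Tiger Woods', 'Phil Mickelson',
               'Tiger Woods', 'Phil Mickelson'],
    'Score': [70, 72, 73, 69, 71],
})

# Sort chronologically so the running count follows date order.
# mergesort is stable, so rows sharing a date keep their original order.
df = df.sort_values('Date', kind='mergesort')

# cumcount() numbers the rows within each group starting at 0,
# so adding 1 gives the rounds played up to and including this row.
df['Player Total Rounds Played'] = df.groupby('Golfer').cumcount() + 1

print(df)
```

Like cumsum, cumcount is vectorized per group, so it avoids the row-by-row Python loop that iterrows performs.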

Answer 1
2, accepted, 2018-02-13 14:57:11
