Pandas - conditional cumulative sum of two columns
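I want to calculate the points of football teams. I have the points for every game, and I can get the cumulative sum of either the home or the away points. What I can't figure out is how to get the total points for each team (home + away points).
This is what I have so far: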
import pandas as pd

df = pd.DataFrame([
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo","Gothenburg", 2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo" ,2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo", 2018, 0, 3],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
[ "Malmo","Gothenburg", 2018, 1, 1],
[ "Malmo", "Gothenburg",2018, 0, 3],
])
df.columns = ['H_team', 'A_team', "Year", 'H_points', 'A_points']
# Cumulative sum for home/away team, shifted by 1 row
df["H_cumsum"] = df.groupby(['H_team', "Year"])['H_points'].transform(
lambda x: x.cumsum().shift())
df["A_cumsum"] = df.groupby(['A_team', "Year"])['A_points'].transform(
lambda x: x.cumsum().shift())
print(df)
H_team A_team Year H_points A_points H_cumsum A_cumsum
0 Gothenburg Malmo 2018 1 1 NaN NaN
1 Malmo Gothenburg 2018 1 1 NaN NaN
2 Malmo Gothenburg 2018 0 3 1.0 1.0
3 Gothenburg Malmo 2018 1 1 1.0 1.0
4 Gothenburg Malmo 2018 0 3 2.0 2.0
5 Gothenburg Malmo 2018 1 1 2.0 5.0
6 Gothenburg Malmo 2018 0 3 3.0 6.0
7 Malmo Gothenburg 2018 0 3 1.0 4.0
8 Gothenburg Malmo 2018 1 1 3.0 9.0
9 Malmo Gothenburg 2018 0 3 1.0 7.0
10 Malmo Gothenburg 2018 1 1 1.0 10.0
11 Malmo Gothenburg 2018 0 3 2.0 11.0
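This table gives the cumulative home points and the cumulative away points for each team, shifted by one row. But I need the total points from both home and away games: H_cumsum and A_cumsum should each add up the team's previous points from home and away games together.
Desired output: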
row 0: Malmo = NaN, Gothenburg = NaN
row 1: Gothenburg = 1, Malmo = 1
row 2: Malmo = 1 + 1 = 2, Gothenburg = 1 + 1 = 2
row 3: Gothenburg = 1 + 1 + 3 = 5, Malmo = 1 + 1 + 0 = 2
row 4: Gothenburg = 1 + 1 + 3 + 1 = 6, Malmo = 1 + 1 + 0 + 1 = 3
And so on...
The last row, 11, should be:
H_cumsum (team Malmo) = 12, A_cumsum (team Gothenburg) = 15
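The arithmetic in the rows above can be reproduced with a plain running total per team. This is only a minimal sketch to make the rule concrete, not a vectorized solution: it uses the first five matches from the question, and the names `totals`, `h_prev` and `a_prev` are made up for illustration.

```python
import pandas as pd

# First five matches from the question (rows 0-4).
df = pd.DataFrame(
    [
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Gothenburg", "Malmo", 2018, 0, 3],
    ],
    columns=["H_team", "A_team", "Year", "H_points", "A_points"],
)

totals = {}  # (team, year) -> points earned so far, home and away combined
h_prev, a_prev = [], []
for row in df.itertuples(index=False):
    h_key, a_key = (row.H_team, row.Year), (row.A_team, row.Year)
    # Record what each team had BEFORE this match (NaN if it has not played yet).
    h_prev.append(totals.get(h_key, float("nan")))
    a_prev.append(totals.get(a_key, float("nan")))
    # Only then add this match's points to the running totals.
    totals[h_key] = totals.get(h_key, 0) + row.H_points
    totals[a_key] = totals.get(a_key, 0) + row.A_points

df["H_cumsum"] = h_prev
df["A_cumsum"] = a_prev
print(df)
```

For row 3 this gives Gothenburg = 5 and Malmo = 2, and for row 4 Gothenburg = 6 and Malmo = 3, matching the hand calculation. A Python loop like this will be slow on large tables, so it is mainly useful as a reference to check a groupby-based version against.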
This seems to calculate fine on my end, although it is a bit longhand.
df.columns = ['H_team', 'A_team', "Year", 'H_points', 'A_points']
# H_team cumsum() for science.
df['H_cumsum'] = df.groupby('H_team')['H_points'].cumsum()
# A_team cumsum() for more science.
df['A_cumsum'] = df.groupby('A_team')['A_points'].cumsum()
# Creating a column for the sum of the two, or total points scored by either side.
df['T_sum'] = df['H_points'] + df['A_points']
# Creating the cumsum() column for T_sum
df['T_cumsum'] = df['T_sum'].cumsum()
print(df)
I found a solution using stack, but it is not a good one:
df = pd.DataFrame([
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo","Gothenburg", 2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo" ,2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo", 2018, 0, 3],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
[ "Malmo","Gothenburg", 2018, 1, 1],
[ "Malmo", "Gothenburg",2018, 0, 3],
])
df.columns = [['Team', 'Team', "Year", 'Points', 'Points'],
['Home', 'Away', 'Year', 'Home', 'Away']]
d1 = df.stack()
total = d1.groupby('Team').Points.apply(lambda x: x.shift().cumsum())
df = d1.assign(Total=total).unstack()
print(df)
   Points                      Team                      Year                 Total
     Away   Home   Year        Away        Home   Year   Away   Home    Year   Away   Home   Year
 0    1.0    1.0    NaN       Malmo  Gothenburg    NaN    NaN    NaN  2018.0    NaN    NaN    NaN
 1    1.0    1.0    NaN  Gothenburg       Malmo    NaN    NaN    NaN  2018.0    1.0    1.0    NaN
 2    3.0    0.0    NaN  Gothenburg       Malmo    NaN    NaN    NaN  2018.0    2.0    2.0    NaN
 3    1.0    1.0    NaN       Malmo  Gothenburg    NaN    NaN    NaN  2018.0    2.0    5.0    NaN
 4    3.0    0.0    NaN       Malmo  Gothenburg    NaN    NaN    NaN  2018.0    3.0    6.0    NaN
 5    1.0    1.0    NaN       Malmo  Gothenburg    NaN    NaN    NaN  2018.0    6.0    6.0    NaN
 6    3.0    0.0    NaN       Malmo  Gothenburg    NaN    NaN    NaN  2018.0    7.0    7.0    NaN
 7    3.0    0.0    NaN  Gothenburg       Malmo    NaN    NaN    NaN  2018.0    7.0   10.0    NaN
 8    1.0    1.0    NaN       Malmo  Gothenburg    NaN    NaN    NaN  2018.0   10.0   10.0    NaN
 9    3.0    0.0    NaN  Gothenburg       Malmo    NaN    NaN    NaN  2018.0   11.0   11.0    NaN
10    1.0    1.0    NaN  Gothenburg       Malmo    NaN    NaN    NaN  2018.0   14.0   11.0    NaN
11    3.0    0.0    NaN  Gothenburg       Malmo    NaN    NaN    NaN  2018.0   15.0   12.0    NaN
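The points under Total/Away and Total/Home are correct. However, with all the extra unnecessary columns, the table becomes very hard to read. (I have not shown the other 10 columns each row has in my real data, so it really is a mess.)
The desired output is: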
H_team A_team Year H_points A_points H_cumsum A_cumsum
0 Gothenburg Malmo 2018 1 1 NaN NaN
1 Malmo Gothenburg 2018 1 1 1.0 1.0
2 Malmo Gothenburg 2018 0 3 2.0 2.0
3 Gothenburg Malmo 2018 1 1 5.0 2.0
4 Gothenburg Malmo 2018 0 3 6.0 3.0
5 Gothenburg Malmo 2018 1 1 6.0 6.0
6 Gothenburg Malmo 2018 0 3 7.0 7.0
7 Malmo Gothenburg 2018 0 3 10.0 7.0
8 Gothenburg Malmo 2018 1 1 10.0 10.0
9 Malmo Gothenburg 2018 0 3 11.0 11.0
10 Malmo Gothenburg 2018 1 1 11.0 14.0
11 Malmo Gothenburg 2018 0 3 12.0 15.0
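One possible way to get this table without restructuring the whole frame is to reshape only the team and points columns into a long helper table (one row per team per match), compute the shifted cumulative sum there, and write just the two result columns back. The following is a sketch under assumptions: `df` has a unique default index in match order, no team plays itself, and the helper names `long`, `match`, `side` and `prev_total` are invented for this example.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Gothenburg", "Malmo", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Gothenburg", "Malmo", 2018, 0, 3],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Malmo", "Gothenburg", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
    ],
    columns=["H_team", "A_team", "Year", "H_points", "A_points"],
)

# Helper tables with identical column names for the home and away side.
home = df[["H_team", "Year", "H_points"]].reset_index()
home.columns = ["match", "team", "Year", "points"]
away = df[["A_team", "Year", "A_points"]].reset_index()
away.columns = ["match", "team", "Year", "points"]

# Long format: one row per (match, team), restored to match order.
long = pd.concat([home.assign(side="H"), away.assign(side="A")], ignore_index=True)
long = long.sort_values("match", kind="mergesort").reset_index(drop=True)

# Points each team had before the match, home and away combined.
long["prev_total"] = long.groupby(["team", "Year"])["points"].transform(
    lambda s: s.cumsum().shift()
)

# Back to one row per match; only the two new columns are written to df.
wide = long.pivot(index="match", columns="side", values="prev_total")
df["H_cumsum"] = wide["H"]
df["A_cumsum"] = wide["A"]
print(df)
```

Because only `H_cumsum` and `A_cumsum` are assigned back, any other columns in the original frame stay exactly as they were, which avoids the wide MultiIndex layout produced by stacking and unstacking everything.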