I want to calculate points for soccer teams. I have the points for each game, and I can get the cumulative sum of either the home or the away points. I can't figure out how to get the total points for each team (home + away points).
This is what I have so far:
import pandas as pd

df = pd.DataFrame([
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo","Gothenburg", 2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo" ,2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo", 2018, 0, 3],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
[ "Malmo","Gothenburg", 2018, 1, 1],
[ "Malmo", "Gothenburg",2018, 0, 3],
])
df.columns = ['H_team', 'A_team', "Year", 'H_points', 'A_points']
# Cumulative sum for home/away team, shifted by 1 row
df["H_cumsum"] = df.groupby(['H_team', "Year"])['H_points'].transform(
    lambda x: x.cumsum().shift())
df["A_cumsum"] = df.groupby(['A_team', "Year"])['A_points'].transform(
    lambda x: x.cumsum().shift())
print(df)
H_team A_team Year H_points A_points H_cumsum A_cumsum
0 Gothenburg Malmo 2018 1 1 NaN NaN
1 Malmo Gothenburg 2018 1 1 NaN NaN
2 Malmo Gothenburg 2018 0 3 1.0 1.0
3 Gothenburg Malmo 2018 1 1 1.0 1.0
4 Gothenburg Malmo 2018 0 3 2.0 2.0
5 Gothenburg Malmo 2018 1 1 2.0 5.0
6 Gothenburg Malmo 2018 0 3 3.0 6.0
7 Malmo Gothenburg 2018 0 3 1.0 4.0
8 Gothenburg Malmo 2018 1 1 3.0 9.0
9 Malmo Gothenburg 2018 0 3 1.0 7.0
10 Malmo Gothenburg 2018 1 1 1.0 10.0
11 Malmo Gothenburg 2018 0 3 2.0 11.0
This table gives me the cumulative home and away points for each team, shifted by 1 row. But I need the total points achieved from both home and away games: H_cumsum and A_cumsum should each add up the team's previous points from both home and away games.
Desired output:
row 0: Malmo = NaN, Gothenburg = NaN
row 1: Gothenburg = 1, Malmo = 1
row 2: Malmo = 1 + 1 = 2, Gothenburg = 1 + 1 = 2
row 3: Gothenburg = 1 + 1 + 3 = 5, Malmo = 1 + 1 + 0 = 2
row 4: Gothenburg = 1 + 1 + 3 + 1 = 6, Malmo = 1 + 1 + 0 + 1 = 3
And so on...
Last row 11 should be:
H_cumsum (team Malmo) = 12, A_cumsum (team Gothenburg) = 15
This seemed to compute okay on my end. It's a little long-handed.
df.columns = ['H_team', 'A_team', "Year", 'H_points', 'A_points']
# H_team cumsum() for science.
df['H_cumsum'] = df[['H_team', 'H_points']].groupby(['H_team']).cumsum()
# A_team cumsum() for more science.
df['A_cumsum'] = df[['A_team', 'A_points']].groupby(['A_team']).cumsum()
# Creating a column for the sum of the two, or total points scored by either side.
df['T_sum'] = df['H_points'] + df['A_points']
# Creating the cumsum() column for T_sum
df['T_cumsum'] = df['T_sum'].cumsum()
print(df)
I found a solution using stack, but it's not a good one:
df = pd.DataFrame([
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo","Gothenburg", 2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo" ,2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo", 2018, 0, 3],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
[ "Malmo","Gothenburg", 2018, 1, 1],
[ "Malmo", "Gothenburg",2018, 0, 3],
])
df.columns = [['Team', 'Team', "Year", 'Points', 'Points'],
['Home', 'Away', 'Year', 'Home', 'Away']]
d1 = df.stack()
total = d1.groupby('Team').Points.apply(lambda x: x.shift().cumsum())
df = d1.assign(Total=total).unstack()
print(df)
Points Team Year Total
Away Home Year Away Home Year Away Home Year Away Home Year
0 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 NaN NaN NaN
1 1.0 1.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 1.0 1.0 NaN
2 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 2.0 2.0 NaN
3 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 2.0 5.0 NaN
4 3.0 0.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 3.0 6.0 NaN
5 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 6.0 6.0 NaN
6 3.0 0.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 7.0 7.0 NaN
7 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 7.0 10.0 NaN
8 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 10.0 10.0 NaN
9 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 11.0 11.0 NaN
10 1.0 1.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 14.0 11.0 NaN
11 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 15.0 12.0 NaN
The points under Total/Away and Total/Home are correct. However, the table becomes very hard to read with all the unnecessary extra columns. (I have another 10 columns per row that are not shown in this example, so it's a real mess.)
The desired output is:
H_team A_team Year H_points A_points H_cumsum A_cumsum
0 Gothenburg Malmo 2018 1 1 NaN NaN
1 Malmo Gothenburg 2018 1 1 1.0 1.0
2 Malmo Gothenburg 2018 0 3 2.0 2.0
3 Gothenburg Malmo 2018 1 1 5.0 2.0
4 Gothenburg Malmo 2018 0 3 6.0 3.0
5 Gothenburg Malmo 2018 1 1 6.0 6.0
6 Gothenburg Malmo 2018 0 3 7.0 7.0
7 Malmo Gothenburg 2018 0 3 10.0 7.0
8 Gothenburg Malmo 2018 1 1 10.0 10.0
9 Malmo Gothenburg 2018 0 3 11.0 11.0
10 Malmo Gothenburg 2018 1 1 11.0 14.0
11 Malmo Gothenburg 2018 0 3 12.0 15.0
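One possible way to get this flat table, sketched here under a few assumptions: the rows are already in chronological order, the index is unique, and the original column names are kept. The idea is to reshape to one row per (match, team), take a shifted cumulative sum per team and year, then write the results back by match. The helper `one_side` and the columns `match`, `side`, and `prior_total` are made-up names for illustration, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Gothenburg", "Malmo", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Gothenburg", "Malmo", 2018, 0, 3],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Malmo", "Gothenburg", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
    ],
    columns=["H_team", "A_team", "Year", "H_points", "A_points"],
)


def one_side(frame, prefix):
    """Hypothetical helper: one row per match for the home ('H') or away ('A') team."""
    part = frame[["Year", f"{prefix}_team", f"{prefix}_points"]].copy()
    part.columns = ["Year", "team", "points"]
    part["match"] = frame.index  # remember which original row this came from
    part["side"] = prefix
    return part


# Long format: every match contributes one home row and one away row.
long = pd.concat([one_side(df, "H"), one_side(df, "A")], ignore_index=True)
long = long.sort_values("match")  # restore match order within each team

# Points collected before each match, home and away combined.
long["prior_total"] = long.groupby(["team", "Year"])["points"].transform(
    lambda s: s.cumsum().shift()
)

# Write the totals back to the wide table, aligned on the original row index.
for prefix in ("H", "A"):
    totals = long.loc[long["side"] == prefix].set_index("match")["prior_total"]
    df[f"{prefix}_cumsum"] = totals

print(df)
```

Because only two columns are written back, any other columns on the original rows stay untouched. A team's first match of a year gets NaN, as in the desired output; if 0 is preferred there, `s.cumsum() - s` could replace the shifted cumulative sum.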