简体   繁体   中英

Pandas - conditional cumulative sum of two columns

I want to calculate points for soccer teams. I have the points for each game, i get the cumsum for either home- or away points. I can't figure out how to get the total points for each team (home + away points)

This what i have so far:

  df  = pd.DataFrame([
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo","Gothenburg",  2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo" ,2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo", 2018, 0, 3],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
[ "Malmo","Gothenburg", 2018, 1, 1],
[ "Malmo", "Gothenburg",2018, 0, 3],
])


df.columns = ['H_team', 'A_team', "Year", 'H_points', 'A_points']

# Cumulaive sum for home/ away team with shift 1 row
df["H_cumsum"] = df.groupby(['H_team', "Year"])['H_points'].transform(
                             lambda x: x.cumsum().shift())
df["A_cumsum"] = df.groupby(['A_team', "Year"])['A_points'].transform(
                             lambda x: x.cumsum().shift())

print(df)

    H_team      A_team  Year  H_points  A_points  H_cumsum  A_cumsum
0   Gothenburg       Malmo  2018         1         1       NaN       NaN
1        Malmo  Gothenburg  2018         1         1       NaN       NaN
2        Malmo  Gothenburg  2018         0         3       1.0       1.0
3   Gothenburg       Malmo  2018         1         1       1.0       1.0
4   Gothenburg       Malmo  2018         0         3       2.0       2.0
5   Gothenburg       Malmo  2018         1         1       2.0       5.0
6   Gothenburg       Malmo  2018         0         3       3.0       6.0
7        Malmo  Gothenburg  2018         0         3       1.0       4.0
8   Gothenburg       Malmo  2018         1         1       3.0       9.0
9        Malmo  Gothenburg  2018         0         3       1.0       7.0
10       Malmo  Gothenburg  2018         1         1       1.0      10.0
11       Malmo  Gothenburg  2018         0         3       2.0      11.0

This table gives me cumulative home- and awaypoints for each team, shifted 1 row. But i need the total achived points from both home- and away games. H_cumsum and A_cumsum should add previous points from both home- and away games.

Desired output:

row 0: Malmo = NaN, Gothenburg = NaN
row 1: Gothenburg = 1, Malmo = 1
row 2: Malmo = 1 + 1 = 2, Gothenburg = 1 + 1 = 2
row 3: Gothenburg = 1 + 1 + 3 = 5, Malmo = 1 + 1 + 0 = 2
row 4: Gothenburg = 1 + 1 + 3 + 1 = 6, Malmo = 1 + 1 + 0 + 1 = 3
And so on...

Last row 11 should be:

H_cumsum (team Malmo) = 12     H_cumsum (team Gothenburg) = 15  

This seemed to compute okay on my end. It's a little long-handed.

df.columns = ['H_team', 'A_team', "Year", 'H_points', 'A_points']
# H_team cumsum() for science.

df['H_cumsum'] = df[['H_team', 'H_points']].groupby(['H_team']).cumsum()
# A_team cumsum() for more science.

df['A_cumsum'] = df[['A_team', 'A_points']].groupby(['A_team']).cumsum()
# Creating a column for the sum of the two, or total points scored by either side.

df['T_sum'] = df['H_points'] + df['A_points']

# Creating the cumsum() column for T_sum
df['T_cumsum'] = df['T_sum'].cumsum()

print(df)

I found a solution, using stack but it's not a good one:

df  = pd.DataFrame([
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo","Gothenburg",  2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo" ,2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo", 2018, 0, 3],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
[ "Malmo","Gothenburg", 2018, 1, 1],
[ "Malmo", "Gothenburg",2018, 0, 3],
])


df.columns = [['Team', 'Team', "Year", 'Points', 'Points'],
    ['Home', 'Away', 'Year', 'Home', 'Away']]

d1 = df.stack()
total = d1.groupby('Team').Points.apply(lambda x: x.shift().cumsum())
df = d1.assign(Total=total).unstack()

print(df)

   Points                  Team                  Year              Total           
     Away Home Year        Away        Home Year Away Home    Year  Away  Home Year
0     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   NaN   NaN  NaN
1     1.0  1.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   1.0   1.0  NaN
2     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   2.0   2.0  NaN
3     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   2.0   5.0  NaN
4     3.0  0.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   3.0   6.0  NaN
5     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   6.0   6.0  NaN
6     3.0  0.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   7.0   7.0  NaN
7     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   7.0  10.0  NaN
8     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0  10.0  10.0  NaN
9     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  11.0  11.0  NaN
10    1.0  1.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  14.0  11.0  NaN
11    3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  15.0  12.0  NaN

The points under Total/ Away and Total/ Home are correct. However, the table becomes very difficult to overview with all extra unnecessary columns. (I have another 10 columns for each row not displayed in this example so it's a really mess.)

The desired output is:

        H_team      A_team  Year  H_points  A_points  H_cumsum  A_cumsum
0   Gothenburg       Malmo  2018         1         1       NaN       NaN
1        Malmo  Gothenburg  2018         1         1       1.0       1.0
2        Malmo  Gothenburg  2018         0         3       2.0       2.0
3   Gothenburg       Malmo  2018         1         1       5.0       2.0
4   Gothenburg       Malmo  2018         0         3       6.0       3.0
5   Gothenburg       Malmo  2018         1         1       6.0       6.0
6   Gothenburg       Malmo  2018         0         3       7.0       7.0
7        Malmo  Gothenburg  2018         0         3       10.0      7.0
8   Gothenburg       Malmo  2018         1         1       10.0      10.0
9        Malmo  Gothenburg  2018         0         3       11.0      11.0
10       Malmo  Gothenburg  2018         1         1       11.0      14.0
11       Malmo  Gothenburg  2018         0         3       12.0      15.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM