Pandas dataframe summing with multiple groupby

Question

I have the following dataframe:

import pandas as pd

df2 = pd.DataFrame({'season': [1, 1, 1, 2, 2, 2, 3, 3], 'value': [-2, 3, 1, 5, 8, 6, 7, 5], 'avail': [3, 3, 3, 8, 8, 4, 25, 25], 'test2': [4, 5, 7, 8, 9, 10, 11, 12]}, index=['2020', '2020', '2020', '2020', '2020', '2021', '2021', '2021'])
df2.index = pd.to_datetime(df2.index)  # parse the year strings as dates
df2.index = df2.index.year             # keep only the integer year
print(df2)

      avail  season  test2  value
2020      3       1      4     -2
2020      3       1      5      3
2020      3       1      7      1
2020      8       2      8      5
2020      8       2      9      8
2021      4       2     10      6
2021     25       3     11      7
2021     25       3     12      5

I would like to efficiently compute, for each year, the sum of the 'avail' column. The difficulty is that only one 'avail' value should be counted per season. For instance, for the year 2020 I want 3 + 8 = 11.

Expected result (column 'sum_avail'):

      avail  season  test2  value  sum_avail
2020      3       1      4     -2         11
2020      3       1      5      3         11
2020      3       1      7      1         11
2020      8       2      8      5         11
2020      8       2      9      8         11
2021      4       2     10      6         29
2021     25       3     11      7         29
2021     25       3     12      5         29

Answer 1

IIUC, you can use transform + set:

df2.groupby(level=0).avail.transform(lambda x : sum(set(x)))
Out[220]: 
2020    11
2020    11
2020    11
2020    11
2020    11
2021    29
2021    29
2021    29
Name: avail, dtype: int64
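
One caveat worth checking against your own data: summing a set removes duplicates by value, not by season. The sketch below uses a hypothetical frame `demo` (not from the question) in which two seasons of the same year happen to share the same 'avail'. The names `by_value`, `per_season` and `by_season` are illustrative only.

```python
import pandas as pd

# Hypothetical data: seasons 1 and 2 of 2020 both have avail == 3.
demo = pd.DataFrame(
    {"season": [1, 1, 2, 2], "avail": [3, 3, 3, 3]},
    index=[2020, 2020, 2020, 2020],
)

# Deduplicating by value collapses the two seasons into a single 3.
by_value = demo.groupby(level=0)["avail"].transform(lambda x: sum(set(x)))

# Deduplicating by (year, season) keeps one value per season.
per_season = demo.groupby([demo.index, "season"])["avail"].first()
by_season = per_season.groupby(level=0).sum()

print(by_value.iloc[0])     # 3
print(by_season.loc[2020])  # 6
```

If 'avail' is guaranteed to differ between seasons within a year, as in the question's data, both approaches give the same totals.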

Answer 2

You'll need groupby + transform + np.unique:

import numpy as np

df2['sum_avail'] = (
    df2.groupby(level=0).avail.transform(lambda x: np.unique(x).sum()))

Or,

df2['sum_avail'] = df2.groupby(level=0).avail.transform('unique').apply(sum)

df2

      avail  season  test2  value  sum_avail
2020      3       1      4     -2         11
2020      3       1      5      3         11
2020      3       1      7      1         11
2020      8       2      8      5         11
2020      8       2      9      8         11
2021      4       2     10      6         29
2021     25       3     11      7         29
2021     25       3     12      5         29
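
To see what the lambda does inside each group, here is a small sketch on the 'avail' column alone. The series is rebuilt by hand with integer years, and the names `uniques_2020` and `per_year` are illustrative, not part of the answer.

```python
import numpy as np
import pandas as pd

# The question's 'avail' column, indexed by year.
avail = pd.Series(
    [3, 3, 3, 8, 8, 4, 25, 25],
    index=[2020] * 5 + [2021] * 3,
    name="avail",
)

# np.unique returns the sorted distinct values of one group.
uniques_2020 = np.unique(avail.loc[2020])
print(uniques_2020)  # [3 8]

# One total per year; transform would broadcast these back to every row.
per_year = avail.groupby(level=0).agg(lambda x: np.unique(x).sum())
print(per_year.loc[2020], per_year.loc[2021])  # 11 29
```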

Answer 3

Here's an approach which takes the first value in each index/season pair and then sums them up:

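# Series.sum(level=...) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
res = df2.groupby([df2.index, 'season'])['avail'].first().groupby(level=0).sum()
df2.join(res.rename('sum_avail'))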

      season  value  avail  test2  sum_avail
2020       1     -2      3      4         11
2020       1      3      3      5         11
2020       1      1      3      7         11
2020       2      5      8      8         11
2020       2      8      8      9         11
2021       2      6      4     10         29
2021       3      7     25     11         29
2021       3      5     25     12         29
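
The same idea can also be sketched without a join: keep one row per (year, season), sum per year, then map the totals back through the index. This assumes each (year, season) pair carries a single 'avail' value, as in the question. The column name `year` and the variables `tmp`, `one_per_season` and `yearly` are illustrative choices.

```python
import pandas as pd

df2 = pd.DataFrame(
    {
        "season": [1, 1, 1, 2, 2, 2, 3, 3],
        "value": [-2, 3, 1, 5, 8, 6, 7, 5],
        "avail": [3, 3, 3, 8, 8, 4, 25, 25],
        "test2": [4, 5, 7, 8, 9, 10, 11, 12],
    },
    index=[2020, 2020, 2020, 2020, 2020, 2021, 2021, 2021],
)

# Move the year index into a column so it can serve as a deduplication key.
tmp = df2.rename_axis("year").reset_index()

# Keep the first row of each (year, season) pair, then total 'avail' per year.
one_per_season = tmp.drop_duplicates(["year", "season"])
yearly = one_per_season.groupby("year")["avail"].sum()

# Look up each row's year in the per-year totals.
df2["sum_avail"] = df2.index.map(yearly)
print(df2)
```

Unlike `join`, this writes the result into `df2` directly, which may or may not be what you want.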

3 answers

solution1
4 2018-06-11 14:48:58

solution2
3 ACCEPTED 2018-06-11 14:49:05

solution3
2 2018-06-11 14:49:43
