I have the following dataframe:
import pandas as pd

df2 = pd.DataFrame({'season': [1, 1, 1, 2, 2, 2, 3, 3], 'value': [-2, 3, 1, 5, 8, 6, 7, 5], 'avail': [3, 3, 3, 8, 8, 4, 25, 25], 'test2': [4, 5, 7, 8, 9, 10, 11, 12]}, index=['2020', '2020', '2020', '2020', '2020', '2021', '2021', '2021'])
# Parse the string labels as dates, then keep only the year as the index
df2.index = pd.to_datetime(df2.index)
df2.index = df2.index.year
print(df2)
      avail  season  test2  value
2020      3       1      4     -2
2020      3       1      5      3
2020      3       1      7      1
2020      8       2      8      5
2020      8       2      9      8
2021      4       2     10      6
2021     25       3     11      7
2021     25       3     12      5
I would like to efficiently compute, for each year, the sum of the 'avail' column. The difficulty is that only one 'avail' value should be counted per season. For instance, for the year 2020 I want 3 + 8 = 11.
Expected result (column 'sum_avail'):
      avail  season  test2  value  sum_avail
2020      3       1      4     -2         11
2020      3       1      5      3         11
2020      3       1      7      1         11
2020      8       2      8      5         11
2020      8       2      9      8         11
2021      4       2     10      6         29
2021     25       3     11      7         29
2021     25       3     12      5         29
IIUC, transform + set:
df2.groupby(level=0).avail.transform(lambda x : sum(set(x)))
Out[220]:
2020 11
2020 11
2020 11
2020 11
2020 11
2021 29
2021 29
2021 29
Name: avail, dtype: int64
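As a self-contained check of the transform + set idea, here is a minimal sketch that rebuilds the frame with integer years as the index and stores the result in a 'sum_avail' column. The second frame, named `same` here, is a hypothetical variation that is not part of the question: it illustrates that a set removes duplicates by value rather than by season, so two seasons of one year that happen to share the same 'avail' are counted only once.

```python
import pandas as pd

# Same data as in the question, with the year already used as the index.
df2 = pd.DataFrame(
    {
        "season": [1, 1, 1, 2, 2, 2, 3, 3],
        "value": [-2, 3, 1, 5, 8, 6, 7, 5],
        "avail": [3, 3, 3, 8, 8, 4, 25, 25],
        "test2": [4, 5, 7, 8, 9, 10, 11, 12],
    },
    index=[2020, 2020, 2020, 2020, 2020, 2021, 2021, 2021],
)

# Sum the distinct 'avail' values within each year and broadcast the
# total back onto every row of that year.
df2["sum_avail"] = df2.groupby(level=0)["avail"].transform(lambda x: sum(set(x)))

# Hypothetical variation: seasons 1 and 2 of 2020 both have avail == 3.
same = pd.DataFrame(
    {"season": [1, 1, 2], "avail": [3, 3, 3]},
    index=[2020, 2020, 2020],
)
# The set collapses the two seasons into a single value of 3.
by_value = same.groupby(level=0)["avail"].transform(lambda x: sum(set(x)))
```

If distinct seasons can share an 'avail' value in the real data, deduplicating on the (year, season) pair is probably the safer choice.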
You'll need groupby + transform + np.unique:
import numpy as np

df2['sum_avail'] = (
    df2.groupby(level=0).avail.transform(lambda x: np.unique(x).sum()))
Or,
df2['sum_avail'] = df2.groupby(level=0).avail.transform('unique').apply(sum)
df2
      avail  season  test2  value  sum_avail
2020      3       1      4     -2         11
2020      3       1      5      3         11
2020      3       1      7      1         11
2020      8       2      8      5         11
2020      8       2      9      8         11
2021      4       2     10      6         29
2021     25       3     11      7         29
2021     25       3     12      5         29
Here's an approach which takes the first value in each index/season pair and then sums them up:
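# Series.sum(level=...) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent.
res = df2.groupby([df2.index, 'season'])['avail'].first().groupby(level=0).sum()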
df2.join(res.rename('sum_avail'))
      season  value  avail  test2  sum_avail
2020       1     -2      3      4         11
2020       1      3      3      5         11
2020       1      1      3      7         11
2020       2      5      8      8         11
2020       2      8      8      9         11
2021       2      6      4     10         29
2021       3      7     25     11         29
2021       3      5     25     12         29
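The same "one value per year/season pair" idea can also be sketched with drop_duplicates. This is only an illustrative variant, not the answer's own code: the names `tmp` and `per_year` and the axis label 'year' are assumptions introduced here.

```python
import pandas as pd

# Same data as in the question, with the year already used as the index.
df2 = pd.DataFrame(
    {
        "season": [1, 1, 1, 2, 2, 2, 3, 3],
        "value": [-2, 3, 1, 5, 8, 6, 7, 5],
        "avail": [3, 3, 3, 8, 8, 4, 25, 25],
        "test2": [4, 5, 7, 8, 9, 10, 11, 12],
    },
    index=[2020, 2020, 2020, 2020, 2020, 2021, 2021, 2021],
)

# Turn the year index into a regular column so it can be used as a key.
tmp = df2.rename_axis("year").reset_index()

# Keep the first row of each (year, season) pair, then total 'avail' per year.
per_year = (
    tmp.drop_duplicates(["year", "season"])
    .groupby("year")["avail"]
    .sum()
)

# Look up each row's year in the per-year totals and assign positionally.
df2["sum_avail"] = per_year.reindex(df2.index).to_numpy()
```

Because the deduplication key is the (year, season) pair rather than the 'avail' value, two seasons with equal 'avail' in the same year would both contribute to the total.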