I have the following df:
amount id year_month
20 10 201903
20 10 201903
50 20 201903
10 20 201903
5 30 201903
5 40 201903
30 50 201904
10 60 201904
10 70 201904
5 80 201904
I want to group by id and year_month and get the sum of amount first:
df_1 = df.groupby(['id', 'year_month'], as_index=False)['amount'].sum()
then divide this sum of amount by the total amount of each year_month group:
df_1['pct']=df_1['amount'].div(df_1.groupby('year_month')['amount'].transform('sum')).mul(100).round(2)
amount id year_month pct
40 10 201903 36.36
60 20 201903 54.55
5 30 201903 4.55
5 40 201903 4.55
30 50 201904 54.55
10 60 201904 18.18
10 70 201904 18.18
5 80 201904 9.09
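For reference, the two steps above can be reproduced end to end with a minimal sketch like the one below. The DataFrame construction and the intermediate name month_total are assumptions added for illustration; they are not part of the original post.

```python
import pandas as pd

# Sample data copied from the table at the top of the question.
df = pd.DataFrame({
    "amount": [20, 20, 50, 10, 5, 5, 30, 10, 10, 5],
    "id": [10, 10, 20, 20, 30, 40, 50, 60, 70, 80],
    "year_month": [201903] * 6 + [201904] * 4,
})

# Step 1: total amount per (id, year_month) pair.
df_1 = df.groupby(["id", "year_month"], as_index=False)["amount"].sum()

# Step 2: each row's share of its month's total, as a percentage.
# transform("sum") broadcasts the monthly total back to every row of df_1.
month_total = df_1.groupby("year_month")["amount"].transform("sum")  # illustrative name
df_1["pct"] = df_1["amount"].div(month_total).mul(100).round(2)

print(df_1)
```

The column order of the printed frame may differ from the table shown above, since groupby with as_index=False puts the grouping columns first.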
I want to first sort pct within each year_month (e.g. 201903) in descending order, then calculate the percentage of ids whose cumulative sum of pct is less than or equal to 80 within each year_month. I am wondering what's the best way to do this. The result should look like this (using the year_month values as headers):
201903 201904
25% 50%
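groupby sorts by the grouping column by default, so sorting by year_month can be omitted. Note that groupby keeps the original row order within each group, so it does not sort pct itself; sort pct in descending order first if the rows are not already in an order that gives the result you want. Then use a custom lambda function with a cumulative sum, compare with Series.le, and use mean to get the fraction of True values. Finally, convert the Series to a one-column DataFrame with Series.to_frame and transpose it with DataFrame.T: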
df_2 = (df_1.groupby('year_month')['pct']
            .apply(lambda x: x.cumsum().le(80).mean())
            .mul(100)
            .to_frame(0)
            .T
            .astype(int))
print(df_2)
year_month 201903 201904
0 25 50
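If pct is not already in descending order within each year_month, one possible variant sorts explicitly before taking the cumulative sum, as the question asks. This is only a sketch under assumptions: df_1 is rebuilt by hand from the output shown in the question, and the names ordered and share are illustrative.

```python
import pandas as pd

# df_1 rebuilt by hand from the question's output (pct already computed).
df_1 = pd.DataFrame({
    "id": [10, 20, 30, 40, 50, 60, 70, 80],
    "year_month": [201903] * 4 + [201904] * 4,
    "pct": [36.36, 54.55, 4.55, 4.55, 54.55, 18.18, 18.18, 9.09],
})

# Sort pct in descending order within each month before accumulating.
ordered = df_1.sort_values(["year_month", "pct"], ascending=[True, False])

# For each month: running total of pct, flag rows at or below 80,
# and take the mean of the flags to get the fraction of ids.
share = (
    ordered.groupby("year_month")["pct"]
    .apply(lambda s: s.cumsum().le(80).mean())
    .mul(100)
)

# One row, with the year_month values as column headers.
df_2 = share.to_frame(0).T.astype(int)
print(df_2)
```

For this sample data the explicit sort gives the same figures as the unsorted version, but that is not guaranteed for other inputs.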