pandas sort values within each group after groupby sum and get the percentage of values after using cumsum

Question

I have the following df:

amount    id    year_month
20        10    201903
20        10    201903
50        20    201903
10        20    201903
 5        30    201903
 5        40    201903
30        50    201904
10        60    201904
10        70    201904
 5        80    201904
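
To make the example reproducible, the frame above could be rebuilt roughly as follows. This is a minimal sketch that assumes plain integer columns; the original post does not state the dtypes.

```python
import pandas as pd

# Rebuild the sample frame shown above (integer dtypes are an assumption)
df = pd.DataFrame({
    "amount": [20, 20, 50, 10, 5, 5, 30, 10, 10, 5],
    "id": [10, 10, 20, 20, 30, 40, 50, 60, 70, 80],
    "year_month": [201903] * 6 + [201904] * 4,
})
print(df)
```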

I want to group by id and year_month and get the sum of amount first:

df_1 = df.groupby(['id', 'year_month'], as_index=False)['amount'].sum()

then divide each summed amount by the total amount of its year_month group:

df_1['pct']=df_1['amount'].div(df_1.groupby('year_month')['amount'].transform('sum')).mul(100).round(2)

amount    id    year_month  pct
40        10    201903      36.36
60        20    201903      54.55
 5        30    201903      4.55
 5        40    201903      4.55
30        50    201904      54.55
10        60    201904      18.18
10        70    201904      18.18
 5        80    201904      9.09

I want to first sort pct within each year_month (e.g. 201903) in descending order, then calculate the percentage of ids whose cumulative sum of pct is less than or equal to 80 within each year_month. What is the best way to do this? The result should look like this (using year_month values as headers):

201903    201904
25%       50%
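
As a hand check of the 25% figure for 201903, the arithmetic can be traced in plain Python. This is only an illustrative sketch that uses the pct values from the table above; the variable names are made up for the example.

```python
from itertools import accumulate

# pct values for 201903 from the table above, sorted in descending order
pct_201903 = sorted([36.36, 54.55, 4.55, 4.55], reverse=True)

# Running total of the sorted percentages
running = list(accumulate(pct_201903))

# Share of ids whose running total is still <= 80
share = sum(total <= 80 for total in running) / len(running)
print(f"{share:.0%}")  # prints 25%
```

Only the first id (54.55) stays at or below 80, since adding the second pushes the running total to about 90.91, so 1 of 4 ids qualifies.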

Answer 1

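groupby sorts by the grouping column by default, so no sort_values on year_month is needed. Note that groupby does not reorder pct inside each group: for the sample data the result is the same either way, but in general you should first sort with df_1.sort_values(['year_month', 'pct'], ascending=[True, False]). Then use a custom lambda function that takes the cumulative sum, compares it with 80 using Series.le, and uses mean to get the fraction of True values. Finally, convert the Series to a one-column DataFrame with Series.to_frame and transpose it with DataFrame.T: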

df_2 = (df_1.groupby('year_month')['pct']
            .apply(lambda x: x.cumsum().le(80).mean())
            .mul(100)
            .to_frame(0)
            .T
            .astype(int))

print (df_2)
year_month  201903  201904
0               25      50
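
If the rows are not already in descending pct order within each month, an explicit sort before the cumulative sum may be safer. The following is a sketch of one possible variant, not the accepted answer's code; df_sorted, cum_pct and result are names introduced here for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "amount": [20, 20, 50, 10, 5, 5, 30, 10, 10, 5],
    "id": [10, 10, 20, 20, 30, 40, 50, 60, 70, 80],
    "year_month": [201903] * 6 + [201904] * 4,
})

# Sum amount per (id, year_month), then express it as a share of the month total
df_1 = df.groupby(["id", "year_month"], as_index=False)["amount"].sum()
month_total = df_1.groupby("year_month")["amount"].transform("sum")
df_1["pct"] = df_1["amount"].div(month_total).mul(100).round(2)

# Sort explicitly: year_month ascending, pct descending within each month
df_sorted = df_1.sort_values(["year_month", "pct"], ascending=[True, False])

# Running total of pct within each month, following the sorted order
cum_pct = df_sorted.groupby("year_month")["pct"].cumsum()

# Percentage of ids per month whose running total is <= 80
result = cum_pct.le(80).groupby(df_sorted["year_month"]).mean().mul(100)
print(result)
```

For the sample data this gives 25 for 201903 and 50 for 201904, matching the output above.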

solution1
1 ACCEPTED 2019-07-09 11:08:41
