
How to calculate customized aggregations after group by pandas

I have a DataFrame df like the one below:

ID   COMMODITY_CODE   DELIVERY_TYPE  DAY   Window_start_time     case_qty     deliveries
6042.0      SCGR        Live         1.0    15:00                 15756.75    7.75
6042.0      SCGR        Live         1.0    18:00                 15787.75    5.75
6042.0      SCGR        Live         1.0    21:00                 10989.75    4.75
6042.0      SCGR        Live         2.0    15:00                 21025.25    9.00
6042.0      SCGR        Live         2.0    18:00                 16041.75    5.75
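For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the sample rows above as a DataFrame. The string dtype for Window_start_time and the helper names `keys` and `totals` are assumptions for illustration, not part of the original post. It also prints the per-group totals, which are the denominators of the ratios being asked for.

```python
import pandas as pd

# Sample rows copied from the table above.
df = pd.DataFrame({
    "ID": [6042.0] * 5,
    "COMMODITY_CODE": ["SCGR"] * 5,
    "DELIVERY_TYPE": ["Live"] * 5,
    "DAY": [1.0, 1.0, 1.0, 2.0, 2.0],
    # Assumption: times are kept as plain strings.
    "Window_start_time": ["15:00", "18:00", "21:00", "15:00", "18:00"],
    "case_qty": [15756.75, 15787.75, 10989.75, 21025.25, 16041.75],
    "deliveries": [7.75, 5.75, 4.75, 9.00, 5.75],
})

# Grouping keys named in the question (the variable name is illustrative).
keys = ["ID", "COMMODITY_CODE", "DELIVERY_TYPE", "DAY"]

# One row per group: the totals each row will be divided by.
totals = df.groupby(keys)[["case_qty", "deliveries"]].sum()
print(totals)
```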

I want the output below: group by ID, COMMODITY_CODE, DELIVERY_TYPE and DAY, then compute dlvry_ratio and case_qty_ratio as each row's share of its group's total deliveries and case_qty:

ID      COMMODITY_CODE  DELIVERY_TYPE  DAY  case_qty  deliveries  dlvry_ratio  case_qty_ratio
6042.0  SCGR            Live           1.0  15756.75  7.75        0.42         0.37
6042.0  SCGR            Live           1.0  15787.75  5.75        0.31         0.37
6042.0  SCGR            Live           1.0  10989.75  4.75        0.26         0.25
6042.0  SCGR            Live           2.0  21025.25  9.00        0.61         0.56
6042.0  SCGR            Live           2.0  16041.75  5.75        0.39         0.44

I tried the code below, using lambda functions to aggregate this information:

df.groupby(['ID', 'COMMODITY_CODE', 'DELIVERY_TYPE', 'DAY'], as_index=False).agg(
    delivery_ratio=("deliveries", lambda x: x / x.sum()),
    case_ratio=("case_qty", lambda x: x / x.sum()),
)

But this didn't work. Any help would be appreciated.

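Try transform instead of agg. agg expects each function to reduce a group to a single value, whereas transform returns a result aligned with the original rows: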

df[['case_ratio', 'delivery_ratio']] = (
    df.groupby(['ID', 'COMMODITY_CODE', 'DELIVERY_TYPE', 'DAY'])[['case_qty', 'deliveries']]
      .transform(lambda x: x / x.sum())
)

Output:

       ID COMMODITY_CODE DELIVERY_TYPE  DAY Window_start_time  case_qty  deliveries  case_ratio   delivery_ratio
0  6042.0           SCGR          Live  1.0             15:00  15756.75        7.75     0.370449        0.424658
1  6042.0           SCGR          Live  1.0             18:00  15787.75        5.75     0.371177        0.315068
2  6042.0           SCGR          Live  1.0             21:00  10989.75        4.75     0.258374        0.260274
3  6042.0           SCGR          Live  2.0             15:00  21025.25        9.00     0.567223        0.610169
4  6042.0           SCGR          Live  2.0             18:00  16041.75        5.75     0.432777        0.389831
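A self-contained version of the same idea, in case it helps to run it end to end. This is only a sketch: the frame is rebuilt from the rows in the question, and the `keys` and `check` names are made up for illustration. The last step is a sanity check that the ratios inside each group add up to 1.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID": [6042.0] * 5,
    "COMMODITY_CODE": ["SCGR"] * 5,
    "DELIVERY_TYPE": ["Live"] * 5,
    "DAY": [1.0, 1.0, 1.0, 2.0, 2.0],
    "Window_start_time": ["15:00", "18:00", "21:00", "15:00", "18:00"],
    "case_qty": [15756.75, 15787.75, 10989.75, 21025.25, 16041.75],
    "deliveries": [7.75, 5.75, 4.75, 9.00, 5.75],
})
keys = ["ID", "COMMODITY_CODE", "DELIVERY_TYPE", "DAY"]

# transform applies the lambda to each column of each group and returns
# a frame with the same index as df, so it can be assigned straight back.
df[["case_ratio", "delivery_ratio"]] = (
    df.groupby(keys)[["case_qty", "deliveries"]]
      .transform(lambda x: x / x.sum())
)

# Sanity check (illustrative): shares within each group should sum to 1.
check = df.groupby(keys)[["case_ratio", "delivery_ratio"]].sum()
print(check)
```

Because the result keeps the original index, no merge back onto df is needed.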

Similar to Scott's answer above, but use the built-in transform('sum') and then divide, which avoids calling a Python lambda for every group:

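cols = ['case_qty', 'deliveries']
keys = ['ID', 'COMMODITY_CODE', 'DELIVERY_TYPE', 'DAY']
# Divide each value by its group total, then attach the results as *_ratio columns.
ratios = df[cols].div(df.groupby(keys)[cols].transform('sum')).add_suffix('_ratio')
df = df.join(ratios)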

Output:

       ID COMMODITY_CODE DELIVERY_TYPE  DAY Window_start_time  case_qty  deliveries  case_qty_ratio  deliveries_ratio
0  6042.0           SCGR          Live  1.0             15:00  15756.75        7.75        0.370449          0.424658
1  6042.0           SCGR          Live  1.0             18:00  15787.75        5.75        0.371177          0.315068
2  6042.0           SCGR          Live  1.0             21:00  10989.75        4.75        0.258374          0.260274
3  6042.0           SCGR          Live  2.0             15:00  21025.25        9.00        0.567223          0.610169
4  6042.0           SCGR          Live  2.0             18:00  16041.75        5.75        0.432777          0.389831
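If the goal is the exact layout from the question (two-decimal ratios, no Window_start_time column), something along these lines might work. It is a sketch under assumptions: rounding to two places and dropping Window_start_time are my reading of the requested output, and `keys`, `ratios` and `out` are illustrative names. Rounded values may differ slightly from the hand-written figures in the question.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID": [6042.0] * 5,
    "COMMODITY_CODE": ["SCGR"] * 5,
    "DELIVERY_TYPE": ["Live"] * 5,
    "DAY": [1.0, 1.0, 1.0, 2.0, 2.0],
    "Window_start_time": ["15:00", "18:00", "21:00", "15:00", "18:00"],
    "case_qty": [15756.75, 15787.75, 10989.75, 21025.25, 16041.75],
    "deliveries": [7.75, 5.75, 4.75, 9.00, 5.75],
})
cols = ["case_qty", "deliveries"]
keys = ["ID", "COMMODITY_CODE", "DELIVERY_TYPE", "DAY"]

# Group totals broadcast back to every row, then an element-wise division.
ratios = df[cols].div(df.groupby(keys)[cols].transform("sum")).add_suffix("_ratio")

# Assumption: round to two decimals and drop the time window column
# to approximate the layout shown in the question.
out = df.drop(columns="Window_start_time").join(ratios.round(2))
print(out)
```

Keeping the unrounded `ratios` around is probably wise if the values feed into further calculations, since rounding is only for display here.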

