
Apply function to grouped data counts in pandas

>>> new_confirmIOC.groupby(['ErrorCode','ResponseType']).OrderID.count()
ErrorCode  ResponseType        
0          CANCEL_ORDER_CONFIRM    80
           TRADE_CONFIRM           31
1          CANCEL_ORDER_CONFIRM    80
           TRADE_CONFIRM           31

How do I add each count's percentage of the total within its ErrorCode, e.g. 80/111 and 31/111 for ErrorCode 0, and so on?

I tried

new_confirmIOC.groupby(['ErrorCode','ResponseType']).OrderID.count().apply(lambda x: x / x.sum())

But it gives me

ErrorCode  ResponseType        
0          CANCEL_ORDER_CONFIRM    1
           TRADE_CONFIRM           1
1          CANCEL_ORDER_CONFIRM    1
           TRADE_CONFIRM           1
Name: OrderID, dtype: int64

I think you need to group by the first index level (ErrorCode) and divide each count by the sum of its group:

df = new_confirmIOC.groupby(['ErrorCode','ResponseType']).OrderID.count()
df = df.groupby(level='ErrorCode').apply(lambda x: x / x.sum())
print (df)
ErrorCode  ResponseType        
0          CANCEL_ORDER_CONFIRM    0.720721
           TRADE_CONFIRM           0.279279
1          CANCEL_ORDER_CONFIRM    0.720721
           TRADE_CONFIRM           0.279279
Name: OrderID, dtype: float64
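
For anyone who wants to run this end to end, here is a minimal sketch. The frame below is a hypothetical stand-in for new_confirmIOC, built only to reproduce the 80/31 counts from the question, and the names counts and shares are mine. One assumption to flag: on pandas 2.x, groupby(...).apply adds the group key as an extra index level for this kind of function, so the sketch passes group_keys=False to keep the two-level index shown above.

```python
import pandas as pd

# Hypothetical stand-in for new_confirmIOC: 80 cancel confirms and
# 31 trade confirms for each error code, matching the counts in the question.
rows = []
for error_code in (0, 1):
    rows += [(error_code, "CANCEL_ORDER_CONFIRM")] * 80
    rows += [(error_code, "TRADE_CONFIRM")] * 31
new_confirmIOC = pd.DataFrame(rows, columns=["ErrorCode", "ResponseType"])
new_confirmIOC["OrderID"] = range(len(new_confirmIOC))

# Count orders per (ErrorCode, ResponseType) pair.
counts = new_confirmIOC.groupby(["ErrorCode", "ResponseType"]).OrderID.count()

# Regroup on the first index level so the lambda receives a whole
# sub-Series per ErrorCode; group_keys=False avoids a duplicated level.
shares = counts.groupby(level="ErrorCode", group_keys=False).apply(
    lambda x: x / x.sum()
)
print(shares)
```

Each ErrorCode group should sum to 1, which is a quick sanity check on the result.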

Another solution uses transform, which returns the group sums aligned to the original index so they can be divided element-wise:

df = df.div(df.groupby(level='ErrorCode').transform('sum'))
print(df)
ErrorCode  ResponseType        
0          CANCEL_ORDER_CONFIRM    0.720721
           TRADE_CONFIRM           0.279279
1          CANCEL_ORDER_CONFIRM    0.720721
           TRADE_CONFIRM           0.279279
Name: OrderID, dtype: float64
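
Since the question asks to add the percentage next to the counts, here is a small sketch of one way to keep both. The Series is built by hand from the counts in the question, and the names counts, group_totals, summary and the column labels are my own choices for illustration.

```python
import pandas as pd

# Counts copied from the question's output, rebuilt by hand.
index = pd.MultiIndex.from_product(
    [[0, 1], ["CANCEL_ORDER_CONFIRM", "TRADE_CONFIRM"]],
    names=["ErrorCode", "ResponseType"],
)
counts = pd.Series([80, 31, 80, 31], index=index, name="OrderID")

# transform('sum') repeats each group's total on every row of that group.
group_totals = counts.groupby(level="ErrorCode").transform("sum")

# Put the raw counts and their share of the group total side by side.
summary = pd.DataFrame({"count": counts, "share": counts.div(group_totals)})
print(summary)
```

Because transform keeps the original index, no realignment or merge is needed before dividing.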

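Thank you, FLab, for the comment:

The result of .count() is a Series, so apply operates on it element by element (not on an entire column, as it would for a pandas DataFrame). In the original attempt each x is therefore a single count, x.sum() is just x itself, and every ratio comes out as 1.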
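
The difference can be checked directly with a small sketch. The Series is rebuilt by hand from the counts in the question, and the variable names are mine. It only inspects what each apply passes to the function, without relying on any particular pandas version's scalar type.

```python
import pandas as pd

# Counts copied from the question's output, rebuilt by hand.
index = pd.MultiIndex.from_product(
    [[0, 1], ["CANCEL_ORDER_CONFIRM", "TRADE_CONFIRM"]],
    names=["ErrorCode", "ResponseType"],
)
counts = pd.Series([80, 31, 80, 31], index=index, name="OrderID")

# Series.apply calls the function once per element, so it never sees a Series.
seen_by_series_apply = counts.apply(lambda x: isinstance(x, pd.Series))

# GroupBy.apply calls the function once per ErrorCode group,
# passing that group's sub-Series.
seen_by_group_apply = counts.groupby(level="ErrorCode").apply(
    lambda g: isinstance(g, pd.Series)
)

print(seen_by_series_apply.any())
print(seen_by_group_apply.all())
```

This is why regrouping by level before calling apply is what makes x.sum() a group total rather than a single value.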


 