Pandas groupby - count unique into separate columns per group

I'm trying to do a groupby on a DataFrame where two columns hold string categorical data:

ID   cat_1   cat_2
11   'OG'    'ASD'
11   'LOL'   'ASD' 
11   'OG'    'DFG' 
22   'LOL'   'DFG'
22   'OG'    'DFG'

I want to group by ID and turn the string data into numeric features, namely the number of occurrences of each category within each group, with one column per category. So the outcome would be:

ID  OG  LOL  ASD  DFG
11   2    1    2    1
22   1    1    0    2

How can I achieve this in pandas? Thank you!

You can stack/value_counts/unstack:

(df.set_index('ID')
   .stack()
   .groupby('ID')
   .value_counts()
   .unstack(fill_value=0)
)

NB. You can append .reset_index() if you want ID back as a regular column rather than as the index.

output:

    ASD  DFG  LOL  OG
ID                   
11    2    1    1   2
22    0    2    1   1
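
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the table in the question (the variable names counts and flat are just illustrative), and groupby(level="ID") is used to make it explicit that ID is an index level at that point:

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "ID": [11, 11, 11, 22, 22],
    "cat_1": ["OG", "LOL", "OG", "LOL", "OG"],
    "cat_2": ["ASD", "ASD", "DFG", "DFG", "DFG"],
})

counts = (
    df.set_index("ID")
      .stack()                  # one long Series: every category value, indexed by ID
      .groupby(level="ID")
      .value_counts()           # occurrences of each value within each ID
      .unstack(fill_value=0)    # values become columns; absent combinations become 0
)

# Optional: turn ID back into a regular column
flat = counts.reset_index()
print(flat)
```

If the column order from the question matters, selecting counts[["OG", "LOL", "ASD", "DFG"]] should give it.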

You could use pd.get_dummies with stack and groupby:

>>> pd.get_dummies(df.set_index("ID").stack()).groupby("ID").sum()
    ASD  DFG  LOL  OG
ID                   
11    2    1    1   2
22    0    2    1   1
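
The same idea spelled out step by step, as a rough sketch on the sample data from the question. The intermediate names (stacked, dummies, counts) are only for illustration, and dtype=int is an optional addition so the indicator columns are integers on every pandas version (recent versions default to booleans, which still sum correctly):

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "ID": [11, 11, 11, 22, 22],
    "cat_1": ["OG", "LOL", "OG", "LOL", "OG"],
    "cat_2": ["ASD", "ASD", "DFG", "DFG", "DFG"],
})

# Long Series of all category values, indexed by (ID, original column)
stacked = df.set_index("ID").stack()

# One indicator column per distinct category value
dummies = pd.get_dummies(stacked, dtype=int)

# Summing the indicators per ID gives the occurrence counts
counts = dummies.groupby(level="ID").sum()
print(counts)
```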
