Pandas groupby - count unique into separate columns per group

Question

I'm trying to do a groupby where in one column I have string categorical data:

ID   cat_1   cat_2
11   'OG'    'ASD'
11   'LOL'   'ASD' 
11   'OG'    'DFG' 
22   'LOL'   'DFG'
22   'OG'    'DFG'

And I'm trying to group by the ID, and aggregate the string data into a numeric feature, namely, the counts of occurrences for each category. So the outcome would be:

ID  OG  LOL  ASD  DFG
11   2    1    2    1
22   1    1    0    2

How can I achieve this in pandas? Thank you!

Answer 1

You can stack/value_counts/unstack:

(df.set_index('ID')
   .stack()
   .groupby('ID')
   .value_counts()
   .unstack(fill_value=0)
)

NB. you can add .reset_index() if you want all columns

output:

    ASD  DFG  LOL  OG
ID                   
11    2    1    1   2
22    0    2    1   1

Answer 2

You could use pd.get_dummies with groupby and stack :

>>> pd.get_dummies(df.set_index("ID").stack()).groupby("ID").sum()
    ASD  DFG  LOL  OG
ID                   
11    2    1    1   2
22    0    2    1   1

Pandas groupby - count unique into separate columns per group

Question

2 answers

solution1
2 2021-08-17 19:46:31

solution2
1 ACCPTED 2021-08-17 20:06:47

Pandas groupby - count unique into separate columns per group

Question

2 answers

solution1 2 2021-08-17 19:46:31

solution2 1 ACCPTED 2021-08-17 20:06:47

solution1
2 2021-08-17 19:46:31

solution2
1 ACCPTED 2021-08-17 20:06:47