Pandas groupby - count unique values into separate columns for each group

Question

I'm trying to do a groupby on a DataFrame in which some columns hold string categorical data:

ID   cat_1   cat_2
11   'OG'    'ASD'
11   'LOL'   'ASD'
11   'OG'    'DFG'
22   'LOL'   'DFG'
22   'OG'    'DFG'

I want to group by ID and aggregate the string data into numeric features, namely the number of occurrences of each category within each group. So the outcome would be:

ID  OG  LOL  ASD  DFG
11   2    1    2    1
22   1    1    0    2

How can I achieve this in pandas? Thank you!

Answer 1

You can chain stack, value_counts and unstack:

(df.set_index('ID')
   .stack()
   .groupby('ID')
   .value_counts()
   .unstack(fill_value=0)
)

NB: you can add .reset_index() if you want ID back as a regular column next to the count columns.

Output:

    ASD  DFG  LOL  OG
ID                   
11    2    1    1   2
22    0    2    1   1
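
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same chain. The DataFrame is rebuilt from the question's table, assuming the quotes shown there are only display artifacts and the cells are plain strings. The variable names counts and ordered are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: values are plain strings).
df = pd.DataFrame({
    "ID": [11, 11, 11, 22, 22],
    "cat_1": ["OG", "LOL", "OG", "LOL", "OG"],
    "cat_2": ["ASD", "ASD", "DFG", "DFG", "DFG"],
})

counts = (
    df.set_index("ID")
      .stack()                  # one long Series of category values, indexed by ID
      .groupby("ID")            # group on the ID index level
      .value_counts()           # occurrences of each value per ID
      .unstack(fill_value=0)    # values become columns; missing pairs become 0
)

# Optional: use the column order from the question and turn ID into a column.
ordered = counts.reindex(columns=["OG", "LOL", "ASD", "DFG"]).reset_index()
print(ordered)
```

unstack sorts the new column labels alphabetically, which is why the reindex step is needed to get the OG, LOL, ASD, DFG order asked for in the question.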

Answer 2

You could use pd.get_dummies with groupby and stack:

>>> pd.get_dummies(df.set_index("ID").stack()).groupby("ID").sum()
    ASD  DFG  LOL  OG
ID                   
11    2    1    1   2
22    0    2    1   1
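
The one-liner above can also be written out step by step, which may make it easier to follow. This is a sketch under the same assumption that the question's values are plain strings. Passing dtype=int is an addition not present in the original answer: recent pandas versions return boolean indicator columns by default, and asking for integers makes the intent of the sum explicit.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: values are plain strings).
df = pd.DataFrame({
    "ID": [11, 11, 11, 22, 22],
    "cat_1": ["OG", "LOL", "OG", "LOL", "OG"],
    "cat_2": ["ASD", "ASD", "DFG", "DFG", "DFG"],
})

# Long Series of category values; ID stays available as an index level.
stacked = df.set_index("ID").stack()

# One indicator column per distinct value (dtype=int is an added assumption).
dummies = pd.get_dummies(stacked, dtype=int)

# Summing the indicators within each ID gives the per-group counts.
counts = dummies.groupby("ID").sum()
print(counts)
```

Because every distinct value gets its own indicator column up front, combinations that never occur (such as ASD for ID 22) come out as 0 without needing a fill value.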

2 answers

Answer 1
2 2021-08-17 19:46:31

Answer 2
1 Accepted 2021-08-17 20:06:47