Pandas groupby - count unique into separate columns per group
I'm trying to do a groupby on a DataFrame whose columns hold string categorical data:
ID cat_1 cat_2
11 'OG' 'ASD'
11 'LOL' 'ASD'
11 'OG' 'DFG'
22 'LOL' 'DFG'
22 'OG' 'DFG'
I'm trying to group by the ID and aggregate the string data into numeric features, namely the count of occurrences of each category. So the outcome would be:
ID OG LOL ASD DFG
11 2 1 2 1
22 1 1 0 2
How can I achieve this in pandas? Thank you!
You can stack/value_counts/unstack:
(df.set_index('ID')
   .stack()
   .groupby('ID')
   .value_counts()
   .unstack(fill_value=0)
)
NB. You can add .reset_index() if you want ID back as a regular column.
output:
ASD DFG LOL OG
ID
11 2 1 1 2
22 0 2 1 1
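For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame construction is a reconstruction of the sample from the question (assuming the quotes shown there are just string delimiters and the IDs are integers), and the final column reordering is optional, only there to match the layout asked for:

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: plain strings, integer IDs)
df = pd.DataFrame({
    "ID": [11, 11, 11, 22, 22],
    "cat_1": ["OG", "LOL", "OG", "LOL", "OG"],
    "cat_2": ["ASD", "ASD", "DFG", "DFG", "DFG"],
})

counts = (
    df.set_index("ID")
      .stack()                # one long Series of all category values, indexed by ID
      .groupby("ID")          # group on the ID index level
      .value_counts()         # occurrences of each value within each ID
      .unstack(fill_value=0)  # values become columns; absent combinations become 0
)

# Optional: order the columns as in the question and turn ID back into a column
result = counts[["OG", "LOL", "ASD", "DFG"]].reset_index()
print(result)
```

Stacking first is what lets cat_1 and cat_2 be counted together: after .stack() it no longer matters which column a value came from.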
You could use pd.get_dummies with groupby and stack:
>>> pd.get_dummies(df.set_index("ID").stack()).groupby("ID").sum()
ASD DFG LOL OG
ID
11 2 1 1 2
22 0 2 1 1
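The same one-liner split into steps, as a rough sketch. The sample frame is again rebuilt from the question, and passing dtype=int is my own addition (not part of the answer above) so that the indicator columns are integers regardless of the pandas version's default:

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: plain strings, integer IDs)
df = pd.DataFrame({
    "ID": [11, 11, 11, 22, 22],
    "cat_1": ["OG", "LOL", "OG", "LOL", "OG"],
    "cat_2": ["ASD", "ASD", "DFG", "DFG", "DFG"],
})

# Long Series of every category value, still indexed by ID
stacked = df.set_index("ID").stack()

# One 0/1 indicator column per distinct value (dtype=int is an added choice)
dummies = pd.get_dummies(stacked, dtype=int)

# Summing the indicators per ID gives the per-category counts
result = dummies.groupby("ID").sum()
print(result)
```

Categories that never occur for an ID come out as 0 automatically here, since their indicator column simply sums to zero.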