pandas groupby concatenate strings in multiple columns
I have this pandas data frame:
import pandas as pd
df = pd.DataFrame({'id':['a','b','b','b','c','c'], 'category':['z','z','x','y','y','y'], 'category2':['1','2','2','2','1','2']})
which looks like:
  category category2 id
0        z         1  a
1        z         2  b
2        x         2  b
3        y         2  b
4        y         1  c
5        y         2  c
What I'd like to do is group by id and return the other two columns as a concatenation of their unique strings.
The outcome would look like:
  category category2 id
0        z         1  a
1      zxy         2  b
2        y        12  c
Use groupby/agg to aggregate the groups. For each group, apply set to find the unique strings, and ''.join to concatenate the strings:
In [34]: df.groupby('id').agg(lambda x: ''.join(set(x)))
Out[34]:
   category category2
id
a         z         1
b       yxz         2
c         y        12
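Note that set is unordered, so the characters within a group can come out in any order (yxz here rather than the zxy asked for). If first-occurrence order matters, one option is to deduplicate with dict.fromkeys, which keeps insertion order on Python 3.7+. The following is a minimal sketch under that assumption; join_unique is a hypothetical helper name, not part of pandas or of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'b', 'b', 'c', 'c'],
    'category': ['z', 'z', 'x', 'y', 'y', 'y'],
    'category2': ['1', '2', '2', '2', '1', '2'],
})

def join_unique(values):
    # dict.fromkeys keeps only the first occurrence of each value,
    # in the order the values appear within the group.
    return ''.join(dict.fromkeys(values))

# Aggregate every non-key column of each id group with the helper.
result = df.groupby('id').agg(join_unique)
print(result)
```

For the sample data this gives zxy for group b and 12 for group c, matching the desired outcome in the question.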
To move id from the index to a column of the resultant DataFrame, call reset_index:
In [59]: df.groupby('id').agg(lambda x: ''.join(set(x))).reset_index()
Out[59]:
  id category category2
0  a        z         1
1  b      yxz         2
2  c        y        12
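As an alternative to calling reset_index afterwards, groupby accepts as_index=False, which should keep id as a regular column in the aggregated result. The sketch below also sorts the unique strings so the output is deterministic. That sorting is an assumption added for illustration, and it yields alphabetical order rather than the order of appearance.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'b', 'b', 'c', 'c'],
    'category': ['z', 'z', 'x', 'y', 'y', 'y'],
    'category2': ['1', '2', '2', '2', '1', '2'],
})

# as_index=False keeps the grouping key as a column instead of the index.
# sorted(set(x)) removes duplicates and fixes the order alphabetically.
result = df.groupby('id', as_index=False).agg(lambda x: ''.join(sorted(set(x))))
print(result)
```

With the sample data, group b becomes xyz and group c keeps y and 12.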