pandas groupby concatenate strings in multiple columns

I have this pandas DataFrame:

import pandas as pd

df = pd.DataFrame({'id':['a','b','b','b','c','c'], 'category':['z','z','x','y','y','y'], 'category2':['1','2','2','2','1','2']})

which looks like:

  category category2 id
0        z         1  a
1        z         2  b
2        x         2  b
3        y         2  b
4        y         1  c
5        y         2  c

What I'd like to do is group by id and return the other two columns as a concatenation of their unique strings.

The outcome would look like:

  category category2 id
0        z         1  a
1      zxy         2  b
2        y        12  c

Use groupby/agg to aggregate the groups. For each group, apply set to find the unique strings, and ''.join to concatenate them. Note that a set is unordered, so the characters in each joined string may come out in any order (yxz in the output here rather than zxy):

In [34]: df.groupby('id').agg(lambda x: ''.join(set(x)))
Out[34]: 
   category category2
id                   
a         z         1
b       yxz         2
c         y        12
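
If the order of first appearance matters, as in the desired zxy from the question, one option is to deduplicate with dict.fromkeys instead of set, since dicts preserve insertion order in Python 3.7+. A minimal sketch, rebuilding the question's df so that it runs on its own (the name result is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'b', 'b', 'c', 'c'],
    'category': ['z', 'z', 'x', 'y', 'y', 'y'],
    'category2': ['1', '2', '2', '2', '1', '2'],
})

# dict.fromkeys keeps the first occurrence of each value, in order,
# so 'z', 'x', 'y' joins to 'zxy' rather than an arbitrary set order.
result = df.groupby('id').agg(lambda x: ''.join(dict.fromkeys(x)))
print(result)
```

This is a swap of the deduplication step only; the groupby/agg structure is unchanged.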

To move id from the index to a column of the resulting DataFrame, call reset_index:

In [59]: df.groupby('id').agg(lambda x: ''.join(set(x))).reset_index()
Out[59]: 
  id category category2
0  a        z         1
1  b      yxz         2
2  c        y        12
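
As an alternative sketch, passing as_index=False to groupby should keep id as a regular column, so a separate reset_index call is not needed. This assumes a reasonably recent pandas version; the name out is illustrative, and sorted is used here only to make the joined order deterministic:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'b', 'b', 'c', 'c'],
    'category': ['z', 'z', 'x', 'y', 'y', 'y'],
    'category2': ['1', '2', '2', '2', '1', '2'],
})

# as_index=False leaves the grouping key as an ordinary column.
# sorted(set(x)) gives unique strings in alphabetical order.
out = df.groupby('id', as_index=False).agg(lambda x: ''.join(sorted(set(x))))
print(out)
```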
