Pandas group by to aggregate string field

My df is this:

1   2   3
A  abc  ab
A  abc  cc
A  abc  ab

I'd like to group the records so that each distinct row appears once:

1   2   3
A  abc  ab
A  abc  cc

or, even better, have a single field with the concatenated string:

   1  
A_abc_ab
A_abc_cc

Pandas GroupBy doesn't seem to work with strings:

df = df.groupby(['1','2','3'])

returns

<pandas.core.groupby.DataFrameGroupBy object at 0x7f4a37549bd0>

You are not applying groupby correctly. Calling groupby on its own only returns a lazy DataFrameGroupBy object; you then have to call an aggregation on it (for example aggregate()) to reduce each group to a result using some function.
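To illustrate that point, here is a minimal sketch that rebuilds the question's frame (the construction of df and the column name "count" are assumptions for the example) and follows groupby with size(), one of several possible aggregations:

```python
import pandas as pd

# Rebuild the frame from the question; column labels are the strings "1", "2", "3".
df = pd.DataFrame({
    "1": ["A", "A", "A"],
    "2": ["abc", "abc", "abc"],
    "3": ["ab", "cc", "ab"],
})

# groupby alone only builds the grouping; nothing is computed yet.
grouped = df.groupby(["1", "2", "3"])

# An aggregation such as size() reduces each group to a single row.
# "count" is an illustrative column name, not something from the question.
counts = grouped.size().reset_index(name="count")
print(counts)
```

Each distinct row then appears once, along with how many times it occurred in the original frame.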

What you probably want, though, is this:

df.apply('-'.join, axis=1)

which produces

0    A-abc-ab
1    A-abc-cc
2    A-abc-ab
dtype: object

Of course, you can call drop_duplicates either before or after joining.
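As a rough sketch of both orderings, assuming the same three-column frame as in the question (the variable names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "1": ["A", "A", "A"],
    "2": ["abc", "abc", "abc"],
    "3": ["ab", "cc", "ab"],
})

# Option 1: join every row first, then drop the repeated strings.
joined_then_dedup = df.apply("-".join, axis=1).drop_duplicates()

# Option 2: drop the repeated rows first, then join what is left.
dedup_then_joined = df.drop_duplicates().apply("-".join, axis=1)

print(joined_then_dedup.tolist())
print(dedup_then_joined.tolist())
```

For this data both orderings give the same values. Deduplicating first means fewer rows go through the row-wise apply, which may matter on larger frames.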

Moving from this:

1   2   3
A  abc  ab
A  abc  cc
A  abc  ab

To this:

1   2   3
A  abc  ab
A  abc  cc

doesn't involve grouping at all! You're just dropping duplicates:

In [9]: df.drop_duplicates()
Out[9]: 
   1    2   3
0  A  abc  ab
1  A  abc  cc

You can then use apply to concatenate:

In [10]: df.drop_duplicates().apply('_'.join, axis=1)
Out[10]: 
0    A_abc_ab
1    A_abc_cc
dtype: object
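If the row-wise apply turns out to be slow on a large frame, one possible alternative is the vectorized Series.str.cat. This is a sketch under the assumption that every column holds strings with no missing values; the name combined is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "1": ["A", "A", "A"],
    "2": ["abc", "abc", "abc"],
    "3": ["ab", "cc", "ab"],
})

unique_rows = df.drop_duplicates()

# Concatenate column "1" with columns "2" and "3", using "_" as the separator.
combined = unique_rows["1"].str.cat([unique_rows["2"], unique_rows["3"]], sep="_")
print(combined.tolist())
```

The values match the apply version above; only the way the strings are built differs.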
