Using pandas to concatenate values based on values in other columns

I'm very new to using Python and I've been googling around, but nothing seems to exactly fit my problem.

I have a dataset like the following:

  groupID   sentenceID           strings
   A        0                    'abc'
   A        0                    'def'
   A        1                    'ghi'
   B        0                    'abc'
   B        1                    'def'
   B        2                    'ghi'

and I'd like the output to look like:

  groupID   sentenceID           strings
   A        0                    'abc. def'
   A        1                    'ghi'
   B        0                    'abc'
   B        1                    'def'
   B        2                    'ghi'

Written out in plain English, what I'm trying to accomplish is as follows:

For unique group in groupID:
if sentenceID is duplicate, then concatenate strings
if sentenceID is not duplicate, then print string

I'm sure this is easy to do with pandas, but I'm having trouble getting it right. Can anyone help? Thank you in advance.

You can group by both columns with groupby and join the strings in each group with a custom aggregation function:

df.groupby(['groupID','sentenceID']).\
   aggregate({'strings': (lambda x: '. '.join(x))}).\
   reset_index()
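
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and runs the same groupby. The frame construction and the name `result` are my own assumptions for illustration; I also dropped the literal quote characters around the strings.

```python
import pandas as pd

# Sample data from the question (assumed layout; quotes around the strings dropped).
df = pd.DataFrame({
    "groupID": ["A", "A", "A", "B", "B", "B"],
    "sentenceID": [0, 0, 1, 0, 1, 2],
    "strings": ["abc", "def", "ghi", "abc", "def", "ghi"],
})

# Rows that share a (groupID, sentenceID) pair land in the same group,
# and their strings are joined with ". "; single-row groups pass through unchanged.
result = (
    df.groupby(["groupID", "sentenceID"])
      .aggregate({"strings": lambda x: ". ".join(x)})
      .reset_index()  # turn the group keys back into ordinary columns
)

print(result)
```

The first row of `result` holds 'abc. def' for group A, sentence 0, and every other row keeps its original string.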

Another way is to select the column and use groupby() with apply(). Note the separator is '. ' to match the output you asked for:

df.groupby(['groupID','sentenceID'])['strings'].apply(lambda x: '. '.join(x)).reset_index()
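
A slightly shorter variant, as a sketch rather than the answer's own code: since `'. '.join` is already a callable, it can be passed straight to `agg` without a lambda, and `as_index=False` keeps the group keys as columns so no `reset_index()` is needed. The sample frame and the name `joined` are assumptions for illustration.

```python
import pandas as pd

# Same sample data as in the question (assumed layout).
df = pd.DataFrame({
    "groupID": ["A", "A", "A", "B", "B", "B"],
    "sentenceID": [0, 0, 1, 0, 1, 2],
    "strings": ["abc", "def", "ghi", "abc", "def", "ghi"],
})

# as_index=False keeps groupID and sentenceID as regular columns.
# ". ".join is called once per group with that group's strings.
joined = df.groupby(["groupID", "sentenceID"], as_index=False)["strings"].agg(". ".join)

print(joined)
```

One thing to keep in mind with any of these: `str.join` only accepts strings, so the column must not contain non-string values.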
