Using pandas to concatenate values based on values in other columns

Question

I'm very new to using Python and I've been googling around, but nothing seems to exactly fit my problem.

I have a dataset like the following:

  groupID   sentenceID           strings
   A        0                    'abc'
   A        0                    'def'
   A        1                    'ghi'
   B        0                    'abc'
   B        1                    'def'
   B        2                    'ghi'

and I'd like the output to look like:

  groupID   sentenceID           strings
   A        0                    'abc. def'
   A        1                    'ghi'
   B        0                    'abc'
   B        1                    'def'
   B        2                    'ghi'

Written out in plain English, what I'm trying to accomplish is as follows:

For unique group in groupID:
if sentenceID is duplicate, then concatenate strings
if sentenceID is not duplicate, then print string

I'm sure that it's easy to do with pandas, but I'm having trouble getting it right. Can anyone help? Thank you in advance.

Answer 1

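You can use groupby with a custom aggregating function that joins the strings within each (groupID, sentenceID) pair:

# Join the strings of all rows sharing the same groupID and sentenceID.
(
    df.groupby(['groupID', 'sentenceID'])
      .aggregate({'strings': lambda x: '. '.join(x)})
      .reset_index()  # turn the group keys back into ordinary columns
)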
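
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the table from the question and applies the same groupby. The variable names `df` and `result` and the unquoted sample values are assumptions made for illustration; the original post does not show how the data was loaded.

```python
import pandas as pd

# Sample data mirroring the table in the question (assumed, for illustration).
df = pd.DataFrame({
    "groupID": ["A", "A", "A", "B", "B", "B"],
    "sentenceID": [0, 0, 1, 0, 1, 2],
    "strings": ["abc", "def", "ghi", "abc", "def", "ghi"],
})

# Group on both key columns and join each group's strings with ". ".
result = (
    df.groupby(["groupID", "sentenceID"])
      .aggregate({"strings": lambda x: ". ".join(x)})
      .reset_index()
)

print(result)
```

The two rows for group A, sentence 0 collapse into a single row holding "abc. def", while every other row passes through unchanged because its group contains only one string.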

Answer 2

Another way, using groupby() and apply(). The separator here is '. ' so that the result matches the output requested in the question:

df.groupby(['groupID', 'sentenceID'])['strings'].apply(lambda x: '. '.join(x)).reset_index()
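
One thing to be aware of with either approach: groupby sorts the group keys by default, so the output order can differ from the input order. If the order of first appearance matters, a possible variant is to pass `sort=False`, and `as_index=False` keeps the keys as columns without a separate reset_index() call. The sample data and the names `df` and `result` below are hypothetical and chosen only to make the ordering visible.

```python
import pandas as pd

# Hypothetical data where group "B" appears before group "A".
df = pd.DataFrame({
    "groupID": ["B", "B", "A"],
    "sentenceID": [0, 0, 0],
    "strings": ["x", "y", "z"],
})

# sort=False keeps groups in order of first appearance;
# as_index=False leaves groupID and sentenceID as regular columns.
result = (
    df.groupby(["groupID", "sentenceID"], as_index=False, sort=False)
      .agg({"strings": ". ".join})
)

print(result)
```

With the default `sort=True`, group A would come first in the result; here group B stays on top, matching the input.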
