Concatenate strings with pandas GroupBy based on ordering from another column

Question

My dataframe has the following data

callerid  seq   text
1236     2      I need to talk to x
1236     6      Issue 3 is this
1236     3      This is regarding abc
1236     5      Issue 2 is this
1236     4      Issue 1 is this
1236     1      Hi
1347     2      I need to talk to x
1347     6      Issue 3 is this
1347     3      This is regarding abc
1347     5      Issue 2 is this
1347     4      Issue 1 is this
1347     1      Hi

I need to group the data by callerid, sort by seq, concatenate the text, and write the result to another dataframe.

The final output data should look like this

callerid        text
1236            Hi I need to talk to x This is regarding abc Issue 1 is this Issue 2 is this Issue 3 is this
1347            Hi I need to talk to x This is regarding abc Issue 1 is this Issue 2 is this Issue 3 is this

I tried the following code

documenttext = dataextract.sort_values(['callerid','seq']).groupby('callerid')

documenttext1 = documenttext[['callerid','text']]
documenttext1 = (documenttext1.groupby('callerid')['text']
       .apply(lambda x: ' '.join(set(x.dropna())))
       .reset_index())

The first statement is not giving me the complete sorted text. This is the output I get:

callerid seq   text
1236     1     Hi
1236     2     I need to talk to x
1236     3     This is regarding abc
1347     1     Hi
1347     2     I need to talk to x
1347     3     This is regarding abc

Appreciate any help on this

Thanks in advance

Answer 1

As you guessed, the first step is to sort, the second is to group. You can use ' '.join as the aggfunc to concatenate your strings.

(df.sort_values('seq')
   .groupby('callerid', sort=False)['text']
   .agg(' '.join)
   .reset_index())

   callerid                                               text
0      1236  Hi I need to talk to x This is regarding abc I...
1      1347  Hi I need to talk to x This is regarding abc I...

You shouldn't group over "seq" since you're trying to aggregate across it.
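
To check this against the sample data from the question, here is a minimal self-contained sketch. The frame construction and the helper name concat_text are my own, added for illustration. It also sorts by both callerid and seq (an assumption on my part, not part of the answer above) so that the order of the output rows is deterministic.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "callerid": [1236] * 6 + [1347] * 6,
    "seq": [2, 6, 3, 5, 4, 1] * 2,
    "text": [
        "I need to talk to x",
        "Issue 3 is this",
        "This is regarding abc",
        "Issue 2 is this",
        "Issue 1 is this",
        "Hi",
    ] * 2,
})


def concat_text(frame):
    """Join each caller's text in seq order (hypothetical helper name)."""
    return (
        frame.sort_values(["callerid", "seq"])        # order rows first
             .groupby("callerid", sort=False)["text"]  # keep the sorted order
             .agg(" ".join)                            # concatenate per caller
             .reset_index()
    )


result = concat_text(df)
print(result.to_string(index=False))
```

Note that wrapping the values in set(), as in the question's attempt, discards both the ordering and any repeated lines, so the sort is lost before the join.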

Answer 2

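Another option is to move callerid and seq into the index, sort the index, and sum the strings per callerid. Prepending a space to each string makes the sum act like a join; str.strip removes the leading space afterwards.

# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
((' ' + df.set_index(['callerid', 'seq']).sort_index(level=[0, 1]).text)
    .groupby(level=0).sum()
    .str.strip()
    .reset_index())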
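
A variation on the same index-based idea, sketched under my own assumptions: joining with ' '.join on the index level avoids the leading-space trick, and dropping missing text before grouping keeps the join from failing on NaN (the dropna() in the question suggests the real data may have gaps). The small frame below is made up for illustration and is not the asker's data.

```python
import pandas as pd

# Hypothetical frame with one missing text value.
df = pd.DataFrame({
    "callerid": [1236, 1236, 1236, 1347, 1347],
    "seq": [2, 3, 1, 2, 1],
    "text": ["I need to talk to x", None, "Hi", "I need to talk to x", "Hi"],
})

result = (
    df.dropna(subset=["text"])               # ' '.join cannot handle NaN
      .set_index(["callerid", "seq"])["text"]
      .sort_index()                          # sorts by callerid, then seq
      .groupby(level="callerid")
      .agg(" ".join)
      .reset_index()
)
print(result.to_string(index=False))
```

Unlike set(), dropping rows before the sort removes only the missing entries and leaves the seq order and any repeated lines intact.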
