My DataFrame has the following data:
callerid seq text
1236 2 I need to talk to x
1236 6 Issue 3 is this
1236 3 This is regarding abc
1236 5 Issue 2 is this
1236 4 Issue 1 is this
1236 1 Hi
1347 2 I need to talk to x
1347 6 Issue 3 is this
1347 3 This is regarding abc
1347 5 Issue 2 is this
1347 4 Issue 1 is this
1347 1 Hi
I need to group the data by callerid, sort by seq, concatenate the text, and write the result to another DataFrame.
The final output should look like this:
callerid text
1236 Hi I need to talk to x This is regarding abc Issue 1 is this Issue 2 is this Issue 3 is this
1347 Hi I need to talk to x This is regarding abc Issue 1 is this Issue 2 is this Issue 3 is this
I tried the following code:
documenttext = dataextract.sort_values(['callerid','seq']).groupby('callerid')
documenttext1 = documenttext[['callerid','text']]
documenttext1 = (documenttext1.groupby('callerid')['text']
                 .apply(lambda x: ' '.join(set(x.dropna())))
                 .reset_index())
The first statement does not give me the complete sorted text. This is the output I get:
callerid seq text
1236 1 Hi
1236 2 I need to talk to x
1236 3 This is regarding abc
1347 1 Hi
1347 2 I need to talk to x
1347 3 This is regarding abc
Appreciate any help on this
Thanks in advance
As you guessed, the first step is to sort and the second is to group. You can then pass ' '.join
as the aggregation function to concatenate your strings.
(df.sort_values('seq')
.groupby('callerid', sort=False)['text']
.agg(' '.join)
.reset_index())
callerid text
0 1236 Hi I need to talk to x This is regarding abc I...
1 1347 Hi I need to talk to x This is regarding abc I...
You shouldn't group over "seq" since you're trying to aggregate across it.
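For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the sort-then-group approach. The variable names `rows`, `df`, and `result` are my own, and the `dropna` step is carried over from the question's attempt on the assumption that missing text should be skipped. Note that the `set(...)` in the question's lambda would undo the sort, because a set has no guaranteed order and also drops repeated strings, so it is left out here.

```python
import pandas as pd

# Sample data copied from the question; names below are illustrative only.
texts = {
    1: "Hi",
    2: "I need to talk to x",
    3: "This is regarding abc",
    4: "Issue 1 is this",
    5: "Issue 2 is this",
    6: "Issue 3 is this",
}
order = [2, 6, 3, 5, 4, 1]  # the shuffled seq order shown in the question
rows = [(cid, seq, texts[seq]) for cid in (1236, 1347) for seq in order]
df = pd.DataFrame(rows, columns=["callerid", "seq", "text"])

result = (
    df.dropna(subset=["text"])            # assumption: skip rows with no text
      .sort_values(["callerid", "seq"])   # order rows within each caller
      .groupby("callerid", sort=False)["text"]
      .agg(" ".join)                      # join in the sorted row order
      .reset_index()
)

print(result.to_string(index=False))
```

Sorting on both columns keeps the callers in ascending order as well; sorting on `seq` alone, as above, also works because groupby preserves the row order within each group.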
Alternatively, sum the strings over the index level:
((' ' + df.set_index(['callerid', 'seq']).sort_index()['text'])
 .groupby(level=0).sum()  # replaces sum(level=0), which was deprecated and later removed
 .str.strip()
 .reset_index())