
How to concatenate consecutive rows from the same speaker in a DataFrame using pandas

I have a CSV like the one below, in which a single message is broken across multiple rows:

Names,text,conv_id
tim,hi,1234
jon,hello,1234
jon,how,1234
jon,are you,1234
tim,hey,1234
tim,i am good,1234
pam, me too,1234
jon,great,1234
jon,hows life,1234

I want to concatenate the text of consecutive rows that share the same name into one row, as follows, so that the data is more meaningful.

Expected output:

Names,text,conv_id
tim,hi,1234
jon,hello how are you,1234
tim,hey i am good,1234
pam, me too,1234
jon,great hows life,1234

I tried a couple of things but could not get it to work. Can anyone please guide me on how to do this?

Thanks in advance.

You can use Series.shift + Series.cumsum to create the appropriate groups for groupby, and then join the text of each group with groupby.apply. 'conv_id' and 'Names' are added to the grouping keys so that they can be recovered as columns with Series.reset_index. Finally, DataFrame.reindex puts the columns back in their original order. Note that the code below joins with a comma; use ' '.join(x) instead to get the space-separated text shown in the expected output.

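# Start a new group id every time 'Names' differs from the previous row
groups = df['Names'].rename('groups').ne(df['Names'].shift()).cumsum()
new_df = (df.groupby([groups, 'conv_id', 'Names'])['text']
            .apply(lambda x: ','.join(x))             # join the text of each run
            .reset_index(level=['Names', 'conv_id'])  # turn the keys back into columns
            .reindex(columns=df.columns))             # restore the original column order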

print(new_df)

  Names               text  conv_id
1   tim                 hi     1234
2   jon  hello,how,are you     1234
3   tim      hey,i am good     1234
4   pam             me too     1234
5   jon    great,hows life     1234
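
If you want the space-separated text from the question's expected output rather than commas, something along these lines should work. This is only a sketch of the same shift/cumsum idea: the sample data is copied from the question, and the names csv_text, keys, run_id and merged are made up for illustration. It also assumes a run should end when conv_id changes, not only when Names changes.

```python
import io
import pandas as pd

# Sample data copied from the question.
csv_text = """Names,text,conv_id
tim,hi,1234
jon,hello,1234
jon,how,1234
jon,are you,1234
tim,hey,1234
tim,i am good,1234
pam, me too,1234
jon,great,1234
jon,hows life,1234
"""
df = pd.read_csv(io.StringIO(csv_text))

# A new run starts wherever the speaker or the conversation differs
# from the previous row (assumption: runs never span two conv_id values).
keys = df[['Names', 'conv_id']]
run_id = keys.ne(keys.shift()).any(axis=1).cumsum()

# One output row per run: keep the first name/conv_id, join the text with spaces.
merged = (
    df.groupby(run_id)
      .agg(Names=('Names', 'first'),
           text=('text', lambda s: ' '.join(s)),
           conv_id=('conv_id', 'first'))
      .reset_index(drop=True)
)
print(merged)
```

The leading space in " me too" is kept as it appears in the CSV; passing skipinitialspace=True to read_csv would drop it if that is not wanted.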

Detail (the group id assigned to each row):

print(groups)

0    1
1    2
2    2
3    2
4    3
5    3
6    4
7    5
8    5
dtype: int64
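
To see where those group ids come from, it may help to look at the intermediate steps side by side. The snippet below is a small illustration using only the Names column from the question; the variable names previous, is_new_run, run_id and steps are not part of the answer above, they are just labels for this sketch.

```python
import pandas as pd

# The 'Names' column from the question's sample data.
names = pd.Series(
    ['tim', 'jon', 'jon', 'jon', 'tim', 'tim', 'pam', 'jon', 'jon'],
    name='Names',
)

previous = names.shift()         # value from the row above; missing for the first row
is_new_run = names.ne(previous)  # True where the name differs from the row above
run_id = is_new_run.cumsum()     # counting the True values gives one id per run

# Put the steps next to each other for inspection.
steps = pd.DataFrame({
    'Names': names,
    'previous': previous,
    'is_new_run': is_new_run,
    'run_id': run_id,
})
print(steps)
```

Because the first row has no previous value, the comparison is True there, so the ids start at 1. Two separate runs of the same name (for example the two 'jon' runs) get different ids, which is why grouping on this id differs from a plain groupby('Names').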
