简体   繁体   中英

python - How to concatenate consecutive rows without losing the order of the records in pandas?

I am trying to concatenate two rows based on a categorical variable when the categories of the variable in the consecutive rows are same. Here is my data below for example:

   SNo  user    Text
0   1   Sam     Hello
1   1   John    Hi
2   1   Sam     How are you?
3   1   John    I am good
4   1   John    How about you?
5   1   John    How is it going?
6   1   Sam     Its going good
7   1   Sam     Thanks
8   2   Mary    Howdy?
9   2   Jake    Hey!!
10  2   Jake    What a surprise
11  2   Mary    Good to see you here :)
12  2   Jake    Ha ha. Hectic life
13  2   Mary    I know right..
14  2   Mary    How's Amy doing?
15  2   Mary    How are the kids?
16  2   Jake    All is good! :)

Here, if my previous value of user column is same as my current value of user column but different from the next value in that column, then, I'd concatenate the values from the column Text for that user. I need to do this until there are no more multiple occurrence of that particular user. A sample output is given below:

SNo user    Text
1   Sam     Hello
1   John    Hi
1   Sam     How are you?
1   John    I am good-How about you?-How is it going?
1   Sam     Its going good-Thanks
2   Mary    Howdy?
2   Jake    Hey!!-What a surprise
2   Mary    Good to see you here :)
2   Jake    Ha ha. Hectic life
2   Mary    I know right..-How's Amy doing?-How are the kids?
2   Jake    All is good! :)

I tried using df.groupby() and then .agg() to get the concatenation done but unable to apply the above mentioned condition over it. Hence the output is combining all occurrences of an user for a chat.

df = sample_data.groupby(["SNo","user"]).agg({'Text': '-'.join}).reset_index() # incorrect though
df

Moreover, I am trying to avoid for loop like a plague and trying out a vectorised solution.


Sample data:

data_dict = {'S. No.': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2], 'user': ['Sam', 'John', 'Sam', 'John', 'John', 'John', 'Sam', 'Sam', 'Mary', 'Jake', 'Jake', 'Mary', 'Jake ', 'Mary', 'Mary', 'Mary', 'Jake'], 'Text': ['Hello', 'Hi', 'How are you?', 'I am good', 'How about you?', 'How is it going?', 'Its going good', 'Thanks', 'Howdy?', 'Hey!!', 'What a surprise', 'Good to see you here :)', 'Ha ha. Hectic life', 'I know right..', "How's Amy doing?", 'How are the kids?', 'All is good! :)']}

sample_data = pd.DataFrame(data_dict)

You want to compare user with its shift and cumsum for changes. Then you can groupby:

blocks = df['user'].ne(df['user'].shift()).cumsum()
(df.groupby(['SNo', blocks])
  .agg({'user':'first','Text': '-'.join})
  .reset_index('user', drop=True)
)

Output:

     user                                               Text
SNo                                                         
1     Sam                                              Hello
1    John                                                 Hi
1     Sam                                       How are you?
1    John          I am good-How about you?-How is it going?
1     Sam                              Its going good-Thanks
2    Mary                                             Howdy?
2    Jake                              Hey!!-What a surprise
2    Mary                            Good to see you here :)
2    Jake                                 Ha ha. Hectic life
2    Mary  I know right..-How's Amy doing?-How are the kids?
2    Jake                                    All is good! :)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM