pandas groupby() with custom aggregate function to concatenate columns then rows using pandas

Question

Suppose I have a dataframe like:

 Column1    Column2    Column3    Column4
 1          I          am         abc
 3          on         weekend    holidays
 1          I          do         business
 2          I          am         xyz
 3          I          do         nothing
 2          I          do         job

After applying groupby() in pandas, the expected result is:

Column1    Column2
1          I am abc I do business
2          I am xyz I do job
3          on weekend holidays I do nothing

The aggregation should be applied first across the columns (joining each row into one string), then across the rows (joining those strings within each Column1 group).

How can this be done?
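For anyone who wants to reproduce this, the sample frame above can be built as follows. This is a minimal sketch that assumes Column1 holds integers and Column2 to Column4 hold plain strings; the variable name df matches the one used in the answers.

```python
import pandas as pd

# sample data from the question: Column1 is the group key,
# Column2..Column4 are the words to concatenate
df = pd.DataFrame({
    'Column1': [1, 3, 1, 2, 3, 2],
    'Column2': ['I', 'on', 'I', 'I', 'I', 'I'],
    'Column3': ['am', 'weekend', 'do', 'am', 'do', 'do'],
    'Column4': ['abc', 'holidays', 'business', 'xyz', 'nothing', 'job'],
})
print(df)
```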

Answer 1

Have you tried:

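# join Column2..Column4 of each row into one string (skip Column1)
df['newcol'] = df.apply(lambda x: " ".join(x.iloc[1:]), axis=1)
# join the per-row strings within each Column1 group
df.groupby('Column1').agg({'newcol': " ".join})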
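A self-contained run of this approach, for reference. The reset_index(name='Column2') at the end is my addition, not part of the answer; it only reshapes the result into the Column1/Column2 layout shown in the question. The names result and newcol are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': [1, 3, 1, 2, 3, 2],
    'Column2': ['I', 'on', 'I', 'I', 'I', 'I'],
    'Column3': ['am', 'weekend', 'do', 'am', 'do', 'do'],
    'Column4': ['abc', 'holidays', 'business', 'xyz', 'nothing', 'job'],
})

# row-wise: join every column except Column1 into one string per row
df['newcol'] = df.apply(lambda x: " ".join(x.iloc[1:]), axis=1)

# group-wise: join the row strings; rows keep their original order inside each group
result = (df.groupby('Column1')['newcol']
            .agg(" ".join)
            .reset_index(name='Column2'))
print(result)
```

groupby sorts the keys by default, so the groups come out as 1, 2, 3 even though they appear as 1, 3, 2 in the input.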

Answer 2

Use DataFrame.set_index with DataFrame.stack first, then aggregate with join in GroupBy.agg:

df1 = (df.set_index('Column1')
         .stack()
         .groupby("Column1")
         .agg(' '.join)
         .reset_index(name='Column2'))
print(df1)
   Column1                           Column2
0        1            I am abc I do business
1        2                 I am xyz I do job
2        3  on weekend holidays I do nothing
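The reason this works is the intermediate Series produced by stack(): it has one entry per cell, indexed by (Column1, column name), and it walks each original row from left to right before moving to the next row. That row-major order is what makes the later join produce "columns first, then rows". A minimal sketch to inspect it (the name stacked is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': [1, 3, 1, 2, 3, 2],
    'Column2': ['I', 'on', 'I', 'I', 'I', 'I'],
    'Column3': ['am', 'weekend', 'do', 'am', 'do', 'do'],
    'Column4': ['abc', 'holidays', 'business', 'xyz', 'nothing', 'job'],
})

# move Column1 into the index, then pivot Column2..Column4 into a second index level
stacked = df.set_index('Column1').stack()
print(stacked.head(6))
```

Because the first index level is still named Column1, the answer can group the stacked Series with groupby("Column1") even though Column1 is no longer a column.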

Answer 3

Try this: first combine the words from the columns you want into a new column, then use groupby to join them together.

df['new_col'] = df['Column2'] + " " + df['Column3'] + " " + df['Column4']

df.groupby('Column1')['new_col'].agg(lambda x: ' '.join(x.astype(str)))

Column1
1              I am abc I do business
2                   I am xyz I do job
3    on weekend holidays I do nothing
Name: new_col, dtype: object
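A variant of the same idea, swapping the chained + for Series.str.cat (my substitution, not the code from this answer). str.cat takes a list of other Series and a sep argument, so the separator is written once instead of between every pair of columns. The name out is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': [1, 3, 1, 2, 3, 2],
    'Column2': ['I', 'on', 'I', 'I', 'I', 'I'],
    'Column3': ['am', 'weekend', 'do', 'am', 'do', 'do'],
    'Column4': ['abc', 'holidays', 'business', 'xyz', 'nothing', 'job'],
})

# element-wise concatenation of the three word columns, separated by spaces
df['new_col'] = df['Column2'].str.cat([df['Column3'], df['Column4']], sep=' ')

# join the row strings within each Column1 group
out = df.groupby('Column1')['new_col'].agg(' '.join)
print(out)
```

One difference worth knowing: if any of the cells is missing, str.cat returns NaN for that row unless you pass na_rep, much like the + version does.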

Answer 4

You can try the following:

def apply_union(x):
    ## join multiple columns into a single string per row
    x = x.apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
    ## concatenate the rows into a single string
    x = x.str.cat(sep=" ")
    return x
df.groupby("Column1")[["Column2","Column3","Column4"]].apply(apply_union)

Answer 5

You could take advantage of the fact that the last three columns are strings: combine them row-wise with sum, then group by Column1 and aggregate with Python's str.join:

outcome = (df
           .set_index("Column1")
           # append a space to every cell so the columns
           # are separated once they are summed
           .add(' ')
           # summing string columns concatenates them into one
           .sum(axis=1)
           # drop the trailing space left by the last column
           .str.rstrip(" ")
           .groupby("Column1")
           .agg(" ".join)
           .reset_index(name='Column2')
          )

outcome

    Column1      Column2
0   1           I am abc I do business
1   2           I am xyz I do job
2   3           on weekend holidays I do nothing
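To see what the .add(' ') / .sum(axis=1) step in this answer produces on its own, here is a minimal sketch on two rows of the question's data. The frame name small is just for illustration, and the trick relies on the columns having object (Python string) dtype, where + means concatenation.

```python
import pandas as pd

# two rows of the question's data, already indexed by Column1
small = pd.DataFrame(
    {'Column2': ['I', 'on'],
     'Column3': ['am', 'weekend'],
     'Column4': ['abc', 'holidays']},
    index=pd.Index([1, 3], name='Column1'),
)

# add(' ') appends a space to every cell: 'I' -> 'I '
spaced = small.add(' ')
# sum(axis=1) on string columns concatenates the cells left to right
row_strings = spaced.sum(axis=1)
# strip the trailing space contributed by the last column
row_strings = row_strings.str.rstrip(' ')
print(row_strings)
```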

5 answers

solution1  2  ACCEPTED  2020-06-16 04:14:26
solution2  2  2020-06-16 05:06:54
solution3  1  2020-06-16 04:11:55
solution4  1  2020-06-16 04:15:07
solution5  1  2020-06-16 04:30:30