
pandas groupby() with a custom aggregate function to concatenate columns, then rows

Suppose I have a dataframe like:

 Column1    Column2    Column3    Column4
 1          I          am         abc
 3          on         weekend    holidays
 1          I          do         business
 2          I          am         xyz
 3          I          do         nothing
 2          I          do         job
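
For anyone who wants to reproduce this, the table above could be built roughly as follows. This is only a sketch: the variable name `df` is assumed here because the answers use it, and the dtypes (integers for Column1, plain strings elsewhere) are a guess from the printed values.

```python
import pandas as pd

# Sample data copied from the table above; dtypes are assumed
df = pd.DataFrame({
    "Column1": [1, 3, 1, 2, 3, 2],
    "Column2": ["I", "on", "I", "I", "I", "I"],
    "Column3": ["am", "weekend", "do", "am", "do", "do"],
    "Column4": ["abc", "holidays", "business", "xyz", "nothing", "job"],
})
print(df)
```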

After applying groupby() in pandas, the expected result is:

Column1    Column2
1          I am abc I do business
2          I am xyz I do job
3          on weekend holidays I do nothing

The aggregation should be applied first across the columns of each row, then across the rows of each group.

How can this be done?

Have you tried:

df['newcol'] = df.apply(lambda x: " ".join(x[1:]), axis=1)
df.groupby('Column1').agg({'newcol': lambda x: " ".join(x)})

Use DataFrame.set_index with DataFrame.stack first, then aggregate with join in GroupBy.agg:

df1 = (df.set_index('Column1')
         .stack()
         .groupby("Column1")
         .agg(' '.join)
         .reset_index(name='Column2'))
print(df1)
   Column1                           Column2
0        1            I am abc I do business
1        2                 I am xyz I do job
2        3  on weekend holidays I do nothing
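
To see why this works, it can help to look at the intermediate result of stack(). The sketch below rebuilds the sample frame (the construction and the names `stacked` and `result` are illustrative, not from the answer) and groups on the index level explicitly with `level="Column1"`.

```python
import pandas as pd

# Sample frame rebuilt from the question; dtypes are assumed
df = pd.DataFrame({
    "Column1": [1, 3, 1, 2, 3, 2],
    "Column2": ["I", "on", "I", "I", "I", "I"],
    "Column3": ["am", "weekend", "do", "am", "do", "do"],
    "Column4": ["abc", "holidays", "business", "xyz", "nothing", "job"],
})

# stack() lays each row's text columns out one after another in a single
# Series, with Column1 as the outer index level and the old column name
# as the inner level
stacked = df.set_index("Column1").stack()
print(stacked.head(6))

# Joining per Column1 value therefore concatenates columns first, then rows
result = stacked.groupby(level="Column1").agg(" ".join)
print(result)
```

Because stack() keeps the original row order and walks the columns left to right within each row, the joined string comes out in the order the question asks for.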

You could try this: first combine the words from the columns you want into a new column, then use groupby to join the rows together.

df['new_col'] = df['Column2'] + " " + df['Column3'] + " " + df['Column4']

df.groupby('Column1')['new_col'].agg(lambda x: ' '.join(x.astype(str)))

Column1
1              I am abc I do business
2                   I am xyz I do job
3    on weekend holidays I do nothing
Name: new_col, dtype: object
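
A possible variation, if you would rather not add a helper column to df, is to build the per-row text as a separate Series and group it by Column1 directly. This is a sketch under the assumption that all three text columns hold strings with no missing values; the names `cols`, `row_text` and `result` are made up for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the question; dtypes are assumed
df = pd.DataFrame({
    "Column1": [1, 3, 1, 2, 3, 2],
    "Column2": ["I", "on", "I", "I", "I", "I"],
    "Column3": ["am", "weekend", "do", "am", "do", "do"],
    "Column4": ["abc", "holidays", "business", "xyz", "nothing", "job"],
})

cols = ["Column2", "Column3", "Column4"]

# Step 1: join the text columns of each row into one string
row_text = df[cols].apply(" ".join, axis=1)

# Step 2: group the row strings by Column1 and join them, leaving df unchanged
result = row_text.groupby(df["Column1"]).agg(" ".join)
print(result)
```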

You could try the following:

def apply_union(x):
    # join the columns of each row into a single string
    x = x.apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
    # concatenate the row strings of the group into one string
    x = x.str.cat(sep=" ")
    return x
df.groupby("Column1")[["Column2","Column3","Column4"]].apply(apply_union)

Because the last three columns are all strings, you can combine them with sum: append a space to each value, sum across the columns, then group by Column1 and aggregate with Python's string join function:

outcome = (df
           .set_index("Column1")
           # append a space to every value so the words
           # stay separated when the columns are summed
           .add(' ')
           # concatenate the columns of each row into one string
           .sum(axis=1)
           # drop the trailing space left by add(' ')
           .str.rstrip(" ")
           .groupby("Column1")
           .agg(" ".join)
           .reset_index(name='Column2')
          )

outcome

   Column1                           Column2
0        1            I am abc I do business
1        2                 I am xyz I do job
2        3  on weekend holidays I do nothing
