Pandas groupby: combine multiple string columns into one column

I have this data in a DataFrame:

    id      data1        string1        string2
0    0          A        'house'       'garden'
1    1          B       'appart'           'wc'  
2    1          B         'flat'      'kitchen'  
3    2          C       'castle'         'cave'

I am trying to group it by the columns ['id', 'data1'] and create a new column, concat_data, that aggregates the string columns of each group in the following format:

    id   data1         string1        string2                                         concat_data
0    0       A         'house'       'garden'                                  'string1: house, string2: garden'
1    1       B        'appart'           'wc'    'string1: appart, string2: wc, string1: flat, string2: kitchen'
3    2       C        'castle'         'cave'                                   'string1: castle, string2: cave'

I have tried many solutions with groupby, aggregate, and apply, but none of them works.

This would work, although the dict repr leaves quotes around the keys and values (for example 'string1': 'house'):

new_df = df.groupby(["id", "data1"]).apply(
    lambda group: ", ".join([str(dct).strip("{}") for dct in group[["string1", "string2"]].to_dict("records")])
).rename("concat_data").reset_index()
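A minimal sketch of an alternative that formats each row explicitly, so the result matches the requested `string1: house, string2: garden` form. It rebuilds the sample data from the question, assuming the string values are stored without surrounding quotes; the helper name `format_rows` is hypothetical.

```python
import pandas as pd

# Sample data from the question (values assumed to be stored without quotes).
df = pd.DataFrame({
    "id": [0, 1, 1, 2],
    "data1": ["A", "B", "B", "C"],
    "string1": ["house", "appart", "flat", "castle"],
    "string2": ["garden", "wc", "kitchen", "cave"],
})

def format_rows(group):
    # Hypothetical helper: one "string1: ..., string2: ..." piece per row,
    # joined with ", " across all rows of the group.
    return ", ".join(
        f"string1: {s1}, string2: {s2}"
        for s1, s2 in zip(group["string1"], group["string2"])
    )

new_df = (
    df.groupby(["id", "data1"])[["string1", "string2"]]  # only pass the string columns
    .apply(format_rows)
    .rename("concat_data")
    .reset_index()
)
print(new_df)
```

Selecting the string columns before `apply` keeps the grouping columns out of the function's input, which is one way to avoid the warning that recent pandas versions emit when `apply` operates on grouping columns.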

If you want to keep the other columns as well (taking the first row of each group), you can pass a function to apply:

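import pandas as pd

def concat_strings(group):
    concat_data = ", ".join([str(dct).strip("{}") for dct in group[["string1", "string2"]].to_dict("records")])
    # .iloc[0] takes the group's first row by position (.loc[0] fails when a group has no index label 0);
    # pd.concat replaces Series.append, which was removed in pandas 2.0.
    first_row = group[["string1", "string2"]].iloc[0]
    return pd.concat([first_row, pd.Series({"concat_data": concat_data})])

new_df = df.groupby(["id", "data1"]).apply(concat_strings).reset_index()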
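The same result can probably be reached without `apply` at all. The sketch below is an alternative technique, not the approach above: it builds the per-row text first and then uses named aggregation, taking the first `string1`/`string2` of each group and joining the row texts. The intermediate column name `row_text` is hypothetical, and the sample values are assumed to be stored without quotes.

```python
import pandas as pd

# Sample data from the question (values assumed to be stored without quotes).
df = pd.DataFrame({
    "id": [0, 1, 1, 2],
    "data1": ["A", "B", "B", "C"],
    "string1": ["house", "appart", "flat", "castle"],
    "string2": ["garden", "wc", "kitchen", "cave"],
})

# Build the per-row text with vectorized string concatenation.
# "row_text" is a hypothetical helper column; assign() leaves df unchanged.
with_text = df.assign(
    row_text="string1: " + df["string1"] + ", string2: " + df["string2"]
)

new_df = with_text.groupby(["id", "data1"], as_index=False).agg(
    string1=("string1", "first"),          # keep the first row's value
    string2=("string2", "first"),
    concat_data=("row_text", ", ".join),   # join all row texts of the group
)
print(new_df)
```

Named aggregation makes it explicit which value survives for each kept column, which the `apply` version only implies through its choice of the first row.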
