Pandas groupby multiple string columns into one column

Question

I have this data in a DataFrame:

    id      data1        string1        string2
0    0          A        'house'       'garden'
1    1          B       'appart'           'wc'  
2    1          B         'flat'      'kitchen'  
3    2          C       'castle'         'cave'

I am trying to group it on the columns ['id', 'data1'] and create a new column that concatenates the string columns of every row in each group, like this:

    id   data1         string1        string2                                         concat_data
0    0       A         'house'       'garden'                                  'string1: house, string2: garden'
1    1       B        'appart'           'wc'    'string1: appart, string2: wc, string1: flat, string2: kitchen'
3    2       C        'castle'         'cave'                                   'string1: castle, string2: cave'

I have tried many approaches with groupby, aggregate, and apply, but none of them works.

Answer 1

This would work:

new_df = df.groupby(["id", "data1"]).apply(
    lambda group: ", ".join([str(dct).strip("{}") for dct in group[["string1", "string2"]].to_dict("records")])
).rename("concat_data").reset_index()
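To see what the lambda builds for one group, here is a minimal sketch that runs the same expression on a small hand-made frame (the `group` frame below is an assumption standing in for the `id == 1` rows of the question). Note that the dict representation keeps the quotes around keys and values:

```python
import pandas as pd

# Hypothetical stand-in for the rows of the group with id == 1.
group = pd.DataFrame({"string1": ["appart", "flat"], "string2": ["wc", "kitchen"]})

# One dict per row: [{'string1': 'appart', 'string2': 'wc'}, ...]
records = group[["string1", "string2"]].to_dict("records")

# str(dict) gives "{'string1': 'appart', 'string2': 'wc'}"; strip removes the braces.
parts = [str(dct).strip("{}") for dct in records]

joined = ", ".join(parts)
print(joined)
# 'string1': 'appart', 'string2': 'wc', 'string1': 'flat', 'string2': 'kitchen'
```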

If you want to keep the other columns as well, you should create a function to pass to apply:

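import pandas as pd

def concat_strings(group):
    concat_data = ", ".join([str(dct).strip("{}") for dct in group[["string1", "string2"]].to_dict("records")])
    # Take the group's first row by position: .loc[0] only works when the group contains index label 0.
    first_row = group[["string1", "string2"]].iloc[0]
    # Series.append was removed in pandas 2.0, so combine the pieces with pd.concat.
    return pd.concat([first_row, pd.Series({"concat_data": concat_data})])

new_df = df.groupby(["id", "data1"]).apply(concat_strings).reset_index()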
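If the quotes that the dict representation leaves around keys and values are unwanted, one possible alternative is to build the text for each row first and then join it per group with named aggregation. This is only a sketch, not the method from the answer above: it assumes the string columns hold plain values such as `house` (without literal quote characters), and the helper name `row_labels` is made up for illustration.

```python
import pandas as pd

# Sample data from the question, assuming the quotes there only mark string values.
df = pd.DataFrame({
    "id": [0, 1, 1, 2],
    "data1": ["A", "B", "B", "C"],
    "string1": ["house", "appart", "flat", "castle"],
    "string2": ["garden", "wc", "kitchen", "cave"],
})

# Build the text for each row, e.g. "string1: house, string2: garden".
row_labels = "string1: " + df["string1"] + ", string2: " + df["string2"]

# Keep the first string1/string2 of each group and join the row texts.
new_df = (
    df.assign(concat_data=row_labels)
      .groupby(["id", "data1"], as_index=False)
      .agg(
          string1=("string1", "first"),
          string2=("string2", "first"),
          concat_data=("concat_data", ", ".join),
      )
)
print(new_df)
```

Unlike the expected output in the question, the result here gets a fresh 0..2 index, since `as_index=False` returns the group keys as ordinary columns.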
