I have this data in a DataFrame:
   id data1   string1    string2
0   0     A   'house'   'garden'
1   1     B  'appart'       'wc'
2   1     B    'flat'  'kitchen'
3   2     C  'castle'     'cave'
I am trying to group it by the columns ['id', 'data1'] and create a new column, concat_data, that holds the string values of each group concatenated in a custom format:
   id data1   string1   string2  concat_data
0   0     A   'house'  'garden'  'string1: house, string2: garden'
1   1     B  'appart'      'wc'  'string1: appart, string2: wc, string1: flat, string2: kitchen'
3   2     C  'castle'    'cave'  'string1: castle, string2: cave'
I have tried many approaches with groupby, aggregate and apply, but none of them work.
This would work:
new_df = df.groupby(["id", "data1"]).apply(
    lambda group: ", ".join(
        str(dct).strip("{}") for dct in group[["string1", "string2"]].to_dict("records")
    )
).rename("concat_data").reset_index()
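Note that str(dct).strip("{}") keeps the quotes that Python prints around dictionary keys and string values, so each row comes out as 'string1': 'house', 'string2': 'garden' rather than string1: house, string2: garden. If the exact format from the question matters, one option is to build the text explicitly. The following is a minimal sketch, assuming the sample data from the question and assuming the values are plain strings without literal quote characters; format_rows is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Sample frame reconstructed from the question (values assumed to be plain strings).
df = pd.DataFrame({
    "id": [0, 1, 1, 2],
    "data1": ["A", "B", "B", "C"],
    "string1": ["house", "appart", "flat", "castle"],
    "string2": ["garden", "wc", "kitchen", "cave"],
})

def format_rows(group):
    # Hypothetical helper: one "string1: x, string2: y" chunk per row, joined by ", ".
    return ", ".join(
        f"string1: {s1}, string2: {s2}"
        for s1, s2 in zip(group["string1"], group["string2"])
    )

new_df = (
    df.groupby(["id", "data1"])[["string1", "string2"]]  # only the columns to concatenate
    .apply(format_rows)
    .rename("concat_data")
    .reset_index()
)
print(new_df)
```

Selecting the two string columns before apply means the helper only ever sees the data it formats, regardless of how the grouping columns are handled.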
If you want to keep the other columns as well, define a function and pass it to apply:
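import pandas as pd

def concat_strings(group):
    concat_data = ", ".join(str(dct).strip("{}") for dct in group[["string1", "string2"]].to_dict("records"))
    # First row by position (.loc[0] raises KeyError when label 0 is not in the group); pd.concat replaces Series.append, removed in pandas 2.0.
    return pd.concat([group[["string1", "string2"]].iloc[0], pd.Series({"concat_data": concat_data})])

new_df = df.groupby(["id", "data1"]).apply(concat_strings).reset_index()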
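The same result can probably also be reached without apply, by building the per-row text first and then using named aggregation. This is only a sketch under assumptions: the sample frame is reconstructed from the question, the values are assumed to be plain strings without literal quotes, and row_text is an illustrative variable name.

```python
import pandas as pd

# Sample frame reconstructed from the question (values assumed to be plain strings).
df = pd.DataFrame({
    "id": [0, 1, 1, 2],
    "data1": ["A", "B", "B", "C"],
    "string1": ["house", "appart", "flat", "castle"],
    "string2": ["garden", "wc", "kitchen", "cave"],
})

# Build the text for each row with vectorized string concatenation.
row_text = "string1: " + df["string1"] + ", string2: " + df["string2"]

new_df = (
    df.assign(concat_data=row_text)
    .groupby(["id", "data1"], as_index=False)
    .agg(
        string1=("string1", "first"),         # keep the first row's values per group
        string2=("string2", "first"),
        concat_data=("concat_data", ", ".join),  # join the per-row texts of the group
    )
)
print(new_df)
```

Keeping the first string1 and string2 of each group mirrors the expected output in the question, where the group for id 1 shows 'appart' and 'wc'.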