My dataframe is:
import pandas as pd
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'A'],
'col2': ['action1', 'action2', 'action1', 'action3', 'action2', 'action1', 'action1', 'action2'],
'col3': ['X', 'X', 'X', 'X', 'X', 'X', 'Y', 'Y']})
It looks like this:
  col1     col2 col3
0    A  action1    X
1    A  action2    X
2    B  action1    X
3    B  action3    X
4    C  action2    X
5    C  action1    X
6    A  action1    Y
7    A  action2    Y
I would like to aggregate the rows into:
  col1             col2 col3
0  A,C  action1,action2    X
1    B  action1,action3    X
2    A  action1,action2    Y
The order of items within a column does not matter. Basically, I would like to combine the col1 values that share the same set of col2 actions, but keep the aggregation separate whenever col3 differs.
What approach should I take?
There are probably many ways to do this, but here is a solution that uses groupby twice: once to build the set of actions for each (col3, col1) pair, and a second time to group on those actions and col3.
import pandas as pd
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'A'],
'col2': ['action1', 'action2', 'action1', 'action3', 'action2', 'action1', 'action1', 'action2'],
'col3': ['X', 'X', 'X', 'X', 'X', 'X', 'Y', 'Y']})
# Sort so that each group's actions are joined in a consistent order
df = df.sort_values(by='col2')
# First pass: join the actions for each (col3, col1) pair
df = df.groupby(['col3','col1'], as_index=False)['col2'].agg(','.join)
# Second pass: join the col1 values that share the same actions and col3
df = df.groupby(['col3','col2'], as_index=False)['col1'].agg(','.join).sort_index(axis=1)
Output:
  col1             col2 col3
0  A,C  action1,action2    X
1    B  action1,action3    X
2    A  action1,action2    Y
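For reference, the same two-pass idea can be sketched without overwriting df and without relying on a prior sort. This is only a sketch under assumptions: the helper join_sorted and the names step1 and result are made up for illustration, and named aggregation needs a reasonably recent pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'A'],
    'col2': ['action1', 'action2', 'action1', 'action3',
             'action2', 'action1', 'action1', 'action2'],
    'col3': ['X', 'X', 'X', 'X', 'X', 'X', 'Y', 'Y'],
})

def join_sorted(values):
    # Hypothetical helper: de-duplicate and sort so the joined string
    # does not depend on the row order of the input.
    return ','.join(sorted(set(values)))

# Pass 1: one row per (col3, col1) pair, with that pair's actions joined.
step1 = df.groupby(['col3', 'col1'], as_index=False).agg(col2=('col2', join_sorted))

# Pass 2: merge the col1 values that share the same action string and col3.
result = (step1.groupby(['col3', 'col2'], as_index=False)
               .agg(col1=('col1', join_sorted))
               [['col1', 'col2', 'col3']])
print(result)
```

Because the actions are sorted inside join_sorted, two col1 values with the same actions always produce the same string in pass 1, which is what lets pass 2 group them together.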
IIUC, you want to group together the col1 values that have the same set of values in col2, separately for each col3.
For this you need to set up a helper grouping key:
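# df is the original dataframe from the question
# Map each col1 value to the (hashable) set of its col2 actions
m = df.groupby('col1')['col2'].apply(frozenset)
# Group on that set together with col3, then join the unique values in sorted order
(df.groupby([df['col1'].map(m), 'col3'], as_index=False)
.aggregate(lambda x: ','.join(sorted(set(x))))
)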
Output:
  col3 col1             col2
0    X  A,C  action1,action2
1    Y    A  action1,action2
2    X    B  action1,action3
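A variant of the helper-key idea that stores the key in an explicit column may be easier to inspect. It is a rough sketch, not the answer's exact code: the column name actions and the helper join_sorted are hypothetical, and the key is a sorted string rather than a frozenset. As in the snippet above, the key is built per col1 across all col3 values, which fits the sample data but is an assumption about the real data.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'A'],
    'col2': ['action1', 'action2', 'action1', 'action3',
             'action2', 'action1', 'action1', 'action2'],
    'col3': ['X', 'X', 'X', 'X', 'X', 'X', 'Y', 'Y'],
})

def join_sorted(values):
    # Hypothetical helper: unique values, sorted, joined with commas.
    return ','.join(sorted(set(values)))

# Helper key: each col1 value mapped to all of its actions as one sorted string.
# Assumption: the key is computed per col1 over the whole frame, ignoring col3.
actions_by_col1 = df.groupby('col1')['col2'].agg(join_sorted)

# 'actions' is a temporary, made-up column holding the key for every row.
tagged = df.assign(actions=df['col1'].map(actions_by_col1))

# Group on the key and col3, then join the col1 values in each group.
result = (tagged.groupby(['actions', 'col3'], as_index=False)
                .agg(col1=('col1', join_sorted))
                .rename(columns={'actions': 'col2'})
                [['col3', 'col1', 'col2']])
print(result)
```

If a col1 value could have different actions under different col3 values, the key would need to be computed per (col1, col3) pair instead, as in the two-pass groupby approach.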