Aggregate and group three columns in a pandas DataFrame

Question

My DataFrame is:

import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'A'],
                   'col2': ['action1', 'action2', 'action1', 'action3', 'action2', 'action1', 'action1', 'action2'],
                   'col3': ['X', 'X', 'X', 'X', 'X', 'X', 'Y', 'Y']})

It looks like this:

    col1    col2    col3
0   A       action1 X
1   A       action2 X
2   B       action1 X
3   B       action3 X
4   C       action2 X
5   C       action1 X
6   A       action1 Y
7   A       action2 Y

I would like to aggregate the rows into:

    col1    col2            col3
0   A,C     action1,action2 X
1   B       action1,action3 X
2   A       action1,action2 Y

The order of items within a column does not matter. Basically, I would like to aggregate col1 and col2, but keep the aggregation separate wherever col3 differs.

What is the approach I should take?

Answer 1

There are probably many ways to do this, but here is a solution that uses groupby twice: first to build the joined set of actions for each (col3, col1) pair, then to group on that action string and col3.

import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'A'],
                   'col2': ['action1', 'action2', 'action1', 'action3', 'action2', 'action1', 'action1', 'action2'],
                   'col3': ['X', 'X', 'X', 'X', 'X', 'X', 'Y', 'Y']})

# Sort so the actions are joined in a consistent order within each group
df = df.sort_values(by='col2')
# First pass: one comma-joined action string per (col3, col1) pair
df = df.groupby(['col3','col1'], as_index=False)['col2'].agg(','.join)
# Second pass: join the col1 values that share the same (col3, action string)
df = df.groupby(['col3','col2'], as_index=False)['col1'].agg(','.join).sort_index(axis=1)

Output

  col1             col2 col3
0  A,C  action1,action2    X
1    B  action1,action3    X
2    A  action1,action2    Y
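The two passes above can also be sketched as a small function that does not depend on sorting the frame first. This is only an illustration under assumptions: the name `collapse_by_actions` is hypothetical (it is not from the answer), and sorting inside the join is a swapped-in way of making the action strings comparable.

```python
import pandas as pd


def collapse_by_actions(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: merge col1 values that share the same actions within a col3 value."""
    # Pass 1: one sorted, de-duplicated action string per (col3, col1) pair.
    per_pair = (
        frame.groupby(['col3', 'col1'], as_index=False)['col2']
             .agg(lambda s: ','.join(sorted(set(s))))
    )
    # Pass 2: join the col1 values that ended up with the same action string.
    merged = (
        per_pair.groupby(['col3', 'col2'], as_index=False)['col1']
                .agg(','.join)
    )
    # Put the columns back in the order used in the question.
    return merged[['col1', 'col2', 'col3']]


df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'A'],
                   'col2': ['action1', 'action2', 'action1', 'action3',
                            'action2', 'action1', 'action1', 'action2'],
                   'col3': ['X', 'X', 'X', 'X', 'X', 'X', 'Y', 'Y']})

result = collapse_by_actions(df)
print(result)
```

Because the action strings are sorted inside the first pass, two col1 values match even when their rows list the same actions in a different order, as C does here.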

Answer 2

If I understand correctly, you want to group together the col1 values that share the same set of values in col2.

For this you need to set up a helper grouping key:

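# Helper: the set of col2 actions seen for each col1 value
m = df.groupby('col1')['col2'].apply(frozenset)

# Group by that action set together with col3, then join the unique values
# (sorted, so the joined strings come out in a stable order)
(df.groupby([df['col1'].map(m), 'col3'], as_index=False)
   .aggregate(lambda x: ','.join(sorted(set(x))))
)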

Output:

  col3 col1             col2
0    X  A,C  action1,action2
1    Y    A  action1,action2
2    X    B  action1,action3
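To make the helper key easier to inspect, here is a minimal sketch that stores it as an ordinary column before grouping. The column name `actions` and the variable names are assumptions made for illustration, not part of the answer. Note that `m` is computed over the whole frame rather than per col3 value; for the sample data this makes no difference, because A has the same actions under both X and Y.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'A'],
                   'col2': ['action1', 'action2', 'action1', 'action3',
                            'action2', 'action1', 'action1', 'action2'],
                   'col3': ['X', 'X', 'X', 'X', 'X', 'X', 'Y', 'Y']})

# Set of actions seen for each col1 value, across the whole frame.
m = df.groupby('col1')['col2'].apply(frozenset)

# Hypothetical helper column: the action set rendered as a sorted string,
# so it can be used as a plain, sortable grouping key.
helper = df['col1'].map(m).map(lambda s: ','.join(sorted(s)))

out = (
    df.assign(actions=helper)
      .groupby(['actions', 'col3'], as_index=False)['col1']
      .agg(lambda x: ','.join(sorted(set(x))))
      .rename(columns={'actions': 'col2'})
)
print(out)
```

Grouping on a named column instead of an external Series keeps the key visible in the result, where it doubles as the aggregated col2.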
