Simple clustering in a pandas DataFrame

Question

I have a dataframe with the following data:

import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'col2': ['action1', 'action2', 'action1', 'action3', 'action1', 'action2']})

which looks like:

col1, col2
A   , action1
A   , action2
B   , action1
B   , action3
C   , action1
C   , action2

Since A and C both have action1 and action2, they should be grouped together. B has a different set of actions, so it forms a separate group. I want to generate the data frame below:

col1, col2
A, C, action1, action2
B   , action1, action3

How can I achieve this?

Answer 1

If the values in col2 appear in the same order within each group, you can aggregate with join per col1, and then aggregate with join again per the joined column:

df = df.groupby('col1')['col2'].agg(', '.join).reset_index()
df = df.groupby('col2')['col1'].agg(', '.join).reset_index()[['col1','col2']]
print (df)
   col1              col2
0  A, C  action1, action2
1     B  action1, action3
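To illustrate why this approach depends on ordering, here is a minimal sketch in which C lists its actions in reverse order. The names `shuffled`, `per_key` and `by_actions` are illustrative and not part of the original answer. Because the joined strings for A and C no longer match, the two keys end up in separate groups.

```python
import pandas as pd

# Same data as in the question, except that C lists its actions in reverse order.
shuffled = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'B', 'C', 'C'],
    'col2': ['action1', 'action2', 'action1', 'action3', 'action2', 'action1'],
})

# Step 1: build one joined string of actions per col1 value.
per_key = shuffled.groupby('col1')['col2'].agg(', '.join).reset_index()

# Step 2: group the col1 values that share exactly the same joined string.
by_actions = (per_key.groupby('col2')['col1']
                     .agg(', '.join)
                     .reset_index()[['col1', 'col2']])

# A and C are not merged: 'action1, action2' != 'action2, action1'.
print(by_actions)
```

The string comparison is order-sensitive, which is why the order of actions has to match across groups for this variant to work.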

Or, if the order of values can differ between groups, use frozenset:

print (df)
  col1     col2
0    A  action1
1    A  action2
2    B  action1
3    B  action3
4    C  action2 <-changed order
5    C  action1 <-changed order

df = df.groupby('col1')['col2'].agg(frozenset).reset_index()
df = (df.groupby('col2')['col1']
        .agg(', '.join)
        .rename(lambda x: ', '.join(x)).reset_index()[['col1','col2']])
print (df)
   col1              col2
0  A, C  action2, action1
1     B  action1, action3
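The iteration order of a frozenset is not guaranteed, so the joined label may come out as "action2, action1" in one run and "action1, action2" in another. If a stable label matters, one possible variant (a sketch, not part of the original answer) is to deduplicate and sort the actions within each group before joining. The helper `sorted_actions` and the names `data`, `per_key` and `result` are hypothetical.

```python
import pandas as pd

# Data from the question, with C's actions in changed order.
data = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'B', 'C', 'C'],
    'col2': ['action1', 'action2', 'action1', 'action3', 'action2', 'action1'],
})

def sorted_actions(values):
    """Hypothetical helper: join the unique actions in sorted order."""
    return ', '.join(sorted(set(values)))

# Step 1: one order-independent, deterministic label per col1 value.
per_key = data.groupby('col1')['col2'].agg(sorted_actions).reset_index()

# Step 2: group the col1 values that share the same label.
result = (per_key.groupby('col2')['col1']
                 .agg(', '.join)
                 .reset_index()[['col1', 'col2']])

print(result)
```

Like frozenset, `set(values)` ignores repeated actions within a group. Drop the `set` call if duplicates should count as a difference.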

Answer 1 was accepted on 2022-01-14 11:47:35.
