
Simple clustering in a pandas DataFrame

I have a dataframe with the following data:

import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'col2': ['action1', 'action2', 'action1', 'action3', 'action1', 'action2']})

which looks like this:

col1, col2
A   , action1
A   , action2
B   , action1
B   , action3
C   , action1
C   , action2

Since A and C both have action1 and action2, they should be grouped together. B has a different set of actions, so it forms a separate group. I want to generate the data frame below:

col1, col2
A, C, action1, action2
B   , action1, action3

How can I achieve this?

If the values in col2 appear in the same order within every group, you can aggregate col2 with join per col1, and then aggregate col1 with join per the joined col2 strings:

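# Step 1: one row per col1 value, with its actions joined into a single string
df = df.groupby('col1')['col2'].agg(', '.join).reset_index()
# Step 2: group by that string, so col1 values with identical actions are merged
df = df.groupby('col2')['col1'].agg(', '.join).reset_index()[['col1','col2']]
print(df)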
   col1              col2
0  A, C  action1, action2
1     B  action1, action3
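
The condition about ordering matters because the first step compares joined strings, not sets. A minimal sketch of the failure case, assuming the same sample data but with the two rows for C swapped (the name `joined` is only illustrative):

```python
import pandas as pd

# Same data as in the question, except that C lists action2 before action1
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'col2': ['action1', 'action2', 'action1', 'action3', 'action2', 'action1']})

# Joining preserves row order, so A and C yield different strings
joined = df.groupby('col1')['col2'].agg(', '.join)
print(joined['A'])  # action1, action2
print(joined['C'])  # action2, action1

# Because the strings differ, grouping by them would keep A and C apart
print(joined['A'] == joined['C'])  # False
```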

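Or, if the order of values within each group can differ, aggregate with frozenset so that groups are compared as sets. The order of the values in the joined col2 string is then arbitrary: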

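# Step 1: collect each col1 value's actions into a hashable, order-independent set
df = df.groupby('col1')['col2'].agg(frozenset).reset_index()
# Step 2: group by the set, join the col1 values, then turn each set key into a string
df = (df.groupby('col2')['col1']
        .agg(', '.join)
        .rename(lambda x: ', '.join(x)).reset_index()[['col1','col2']])
print(df)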
   col1              col2
0  A, C  action2, action1
1     B  action1, action3


For example, with the order of the rows for C changed in the input:

print(df)
  col1     col2
0    A  action1
1    A  action2
2    B  action1
3    B  action3
4    C  action2 <-changed order
5    C  action1 <-changed order

# Same frozenset approach, applied to the reordered input
df = df.groupby('col1')['col2'].agg(frozenset).reset_index()
df = (df.groupby('col2')['col1']
        .agg(', '.join)
        .rename(lambda x: ', '.join(x)).reset_index()[['col1','col2']])
print(df)
   col1              col2
0  A, C  action2, action1
1     B  action1, action3
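
Since a frozenset has no defined iteration order, the joined text in col2 can come out as either "action1, action2" or "action2, action1". If a stable output is wanted, one possible variation (a sketch, not part of the answer above) is to sort each group's unique actions before joining them. The names `actions` and `out` are only illustrative:

```python
import pandas as pd

# Sample data with the rows for C in a different order than those for A
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'col2': ['action1', 'action2', 'action1', 'action3', 'action2', 'action1']})

# Normalise each group's actions: drop duplicates, sort, join into one string
actions = df.groupby('col1')['col2'].agg(lambda s: ', '.join(sorted(set(s))))

# Group the col1 labels by their normalised action string
out = (actions.reset_index()
              .groupby('col2')['col1']
              .agg(', '.join)
              .reset_index()[['col1', 'col2']])
print(out)
#    col1              col2
# 0  A, C  action1, action2
# 1     B  action1, action3
```

Sorting costs a little extra per group, but the result no longer depends on row order or on set iteration order.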
