Simple clustering in a pandas DataFrame
I have a DataFrame with the following data:
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'col2': ['action1', 'action2', 'action1', 'action3', 'action1', 'action2']})
which looks like:
col1, col2
A , action1
A , action2
B , action1
B , action3
C , action1
C , action2
Since A and C both have action1 and action2, I want to group them together. B will be a separate group. So I want to generate the DataFrame below:
col1, col2
A, C, action1, action2
B , action1, action3
How can I achieve this?我怎样才能做到这一点?
If the values in col2 appear in the same order within each col1 group, you can aggregate with join per col1, and then aggregate with join again per the joined column:
df = df.groupby('col1')['col2'].agg(', '.join).reset_index()
df = df.groupby('col2')['col1'].agg(', '.join).reset_index()[['col1','col2']]
print (df)
col1 col2
0 A, C action1, action2
1 B action1, action3
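A minimal sketch of why the ordering condition matters, assuming the same data as in the question but with the two rows for C swapped. The names per_key and grouped are only illustrative. Because the joined strings for A and C now differ, the plain join approach keeps them apart:

```python
import pandas as pd

# Same data as in the question, except the two rows for C are swapped.
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'col2': ['action1', 'action2', 'action1', 'action3', 'action2', 'action1']})

# Step 1: one row per col1 value, with its actions joined in row order.
per_key = df.groupby('col1')['col2'].agg(', '.join).reset_index()

# Step 2: group on the joined string. 'action1, action2' and
# 'action2, action1' are different strings, so A and C end up in
# separate groups.
grouped = per_key.groupby('col2')['col1'].agg(', '.join).reset_index()[['col1', 'col2']]
print(grouped)
```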
Or, if the ordering can differ between groups, use frozenset:
df = df.groupby('col1')['col2'].agg(frozenset).reset_index()
df = (df.groupby('col2')['col1']
.agg(', '.join)
.rename(lambda x: ', '.join(x)).reset_index()[['col1','col2']])
print (df)
col1 col2
0 A, C action2, action1
1 B action1, action3
For example, with the rows for C in a different order:
print (df)
col1 col2
0 A action1
1 A action2
2 B action1
3 B action3
4 C action2 <-changed order
5 C action1 <-changed order
df = df.groupby('col1')['col2'].agg(frozenset).reset_index()
df = (df.groupby('col2')['col1']
.agg(', '.join)
.rename(lambda x: ', '.join(x)).reset_index()[['col1','col2']])
print (df)
col1 col2
0 A, C action2, action1
1 B action1, action3
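A frozenset has no defined element order, so the text produced by joining it (for example 'action2, action1') may vary between runs. As a possible variation, not part of the answer above, each group can be reduced to a sorted, duplicate-free string instead, which keeps the output deterministic. The names per_key and result are only illustrative:

```python
import pandas as pd

# Data from the question, with the rows for C in a different order.
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'col2': ['action1', 'action2', 'action1', 'action3', 'action2', 'action1']})

# Step 1: reduce each col1 group to a sorted, duplicate-free,
# comma-joined string of its actions, so row order no longer matters.
per_key = (df.groupby('col1')['col2']
             .agg(lambda s: ', '.join(sorted(set(s))))
             .reset_index())

# Step 2: group on that string and join the col1 values that share it.
result = (per_key.groupby('col2')['col1']
                 .agg(', '.join)
                 .reset_index()[['col1', 'col2']])
print(result)
```

With this data, A and C share the key 'action1, action2' and B keeps 'action1, action3'.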