Pandas create groups from column values
I have a DataFrame df as follows:
Col1 Col2
A1 A1
B1 A1
B1 B1
C1 C1
D1 A1
D1 B1
D1 D1
E1 A1
I am trying to achieve the following:
Col1 Group
A1 A1
B1 A1
D1 A1
E1 A1
C1 C1
That is, in df every value that is related to another value gets grouped together under a single value. In the example above, (A1, A1), (B1, A1), (B1, B1), (D1, A1), (D1, B1), (D1, D1) and (E1, A1) can all be linked, directly or indirectly, to A1 (the first in alphabetical order), so they are all assigned the group ID A1, and so on.
I am not sure how to do this.
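This can be approached as a graph problem: each distinct value is a node, and each row of df is an edge between its Col1 and Col2 values.
Here is your graph (image not preserved): A1, B1, D1 and E1 form one connected component, and C1 forms another on its own.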
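The groups you want are exactly the connected components of this graph. If you would rather not add a dependency, a union-find (disjoint-set) structure computes the same grouping. This is a swapped-in technique, not the networkx approach used below, and `group_labels` is a hypothetical helper name. It is a minimal sketch that assumes the labels are mutually comparable (here, strings) so the smallest one can serve as the group ID.

```python
def group_labels(pairs):
    """Map every value to the alphabetically smallest value in its connected group."""
    parent = {}

    def find(x):
        # Register unseen values as their own root, then walk up to the root,
        # halving the path along the way to keep later lookups short
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller label as the root, so each root is its group's minimum
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for a, b in pairs:
        union(a, b)
    return {x: find(x) for x in parent}


pairs = [('A1', 'A1'), ('B1', 'A1'), ('B1', 'B1'), ('C1', 'C1'),
         ('D1', 'A1'), ('D1', 'B1'), ('D1', 'D1'), ('E1', 'A1')]
groups = group_labels(pairs)
print(groups)  # {'A1': 'A1', 'B1': 'A1', 'C1': 'C1', 'D1': 'A1', 'E1': 'A1'}
```

With the question's DataFrame, `group_labels(zip(df['Col1'], df['Col2']))` yields the same mapping, which can be wrapped in `pd.Series` if needed.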
You can use networkx to find the connected_components:
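```python
import networkx as nx
import pandas as pd

# Each row of df becomes an (undirected) edge between its Col1 and Col2 values
G = nx.from_pandas_edgelist(df, source='Col1', target='Col2')

d = {}
for g in nx.connected_components(G):
    g = sorted(g)      # sort the component's labels alphabetically
    for x in g:
        d[x] = g[0]    # every member gets the smallest label as its group ID
out = pd.Series(d)
```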
Output:
A1 A1
B1 A1
D1 A1
E1 A1
C1 C1
dtype: object
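To get exactly the two-column Col1 / Group layout from the question, or to label every original row, the mapping can be post-processed with standard pandas methods. In this sketch the mapping is hard-coded from the output above so the snippet runs on its own. In practice `out` would come from the networkx loop.

```python
import pandas as pd

# Value -> group mapping, copied from the output above so this runs standalone
out = pd.Series({'A1': 'A1', 'B1': 'A1', 'D1': 'A1', 'E1': 'A1', 'C1': 'C1'})

# Two-column layout matching the desired result: Col1 / Group
result = out.rename_axis('Col1').reset_index(name='Group')
print(result)

# Alternatively, attach the group ID to every row of the original DataFrame
df = pd.DataFrame({
    'Col1': ['A1', 'B1', 'B1', 'C1', 'D1', 'D1', 'D1', 'E1'],
    'Col2': ['A1', 'A1', 'B1', 'C1', 'A1', 'B1', 'D1', 'A1'],
})
df['Group'] = df['Col1'].map(out)  # look up each Col1 value in the mapping's index
print(df)
```

`Series.map` with a Series argument looks values up by index, so every row receives the group of its Col1 value.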