Pandas create groups from column values

I have a dataframe df as follows:

Col1    Col2
A1      A1
B1      A1
B1      B1
C1      C1
D1      A1
D1      B1
D1      D1
E1      A1

I am trying to achieve the following:

Col1    Group
A1      A1
B1      A1
D1      A1
E1      A1
C1      C1

That is, all values in df that are related to each other should be grouped under a single value. In the example above, (A1, A1), (B1, A1), (B1, B1), (D1, A1), (D1, B1), (D1, D1) and (E1, A1) can all be linked to A1, directly or indirectly. A1 comes first in alphabetical order, so all of these values are assigned the group ID A1, and so on for the other groups.

I am not sure how to do this.

This can be approached using a graph: treat each value as a node and each row as an edge between Col1 and Col2.

Here is your graph (image not reproduced): A1, B1, D1 and E1 form one connected component, and C1 is on its own.
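To make the idea of a connected component concrete, here is a minimal sketch in plain Python with no graph library. It is only an illustration, not the method used in the answer: the `pairs` list is copied by hand from the table in the question, and the names `adjacency`, `component_of` and `groups` are made up for this example.

```python
from collections import defaultdict

# Pairs copied from the Col1/Col2 table in the question.
pairs = [("A1", "A1"), ("B1", "A1"), ("B1", "B1"), ("C1", "C1"),
         ("D1", "A1"), ("D1", "B1"), ("D1", "D1"), ("E1", "A1")]

# Undirected adjacency: each value points to every value it shares a row with.
adjacency = defaultdict(set)
for left, right in pairs:
    adjacency[left].add(right)
    adjacency[right].add(left)


def component_of(start):
    """Return every node reachable from `start` (iterative depth-first search)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return seen


# Label every node with the alphabetically first member of its component.
groups = {}
for node in sorted(adjacency):
    if node not in groups:
        members = component_of(node)
        label = min(members)
        for member in members:
            groups[member] = label

print(groups)
```

With a real dataframe, the pairs could presumably be taken from `zip(df['Col1'], df['Col2'])` instead of being typed out.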

You can use networkx to find the connected_components:

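import networkx as nx
import pandas as pd

# Each row of df becomes an edge between Col1 and Col2
G = nx.from_pandas_edgelist(df, source='Col1', target='Col2')

# Map every node to the alphabetically first node of its component
d = {}
for g in nx.connected_components(G):
    g = sorted(g)
    for x in g:
        d[x] = g[0]

out = pd.Series(d)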

Output:

A1    A1
B1    A1
D1    A1
E1    A1
C1    C1
dtype: object
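The result above is a Series, while the question asks for a dataframe with Col1 and Group columns. One possible way to reshape it is sketched below. To keep the snippet self-contained, the Series is rebuilt by hand from the output shown, and the name `result` is an assumption made for this example.

```python
import pandas as pd

# Mapping copied from the output above (value -> group label).
out = pd.Series({"A1": "A1", "B1": "A1", "D1": "A1", "E1": "A1", "C1": "C1"})

# Name the index "Col1", then turn it into a regular column next to "Group".
result = out.rename_axis("Col1").reset_index(name="Group")

print(result)
```

Note that the mapping covers every node in the graph, including values that appear only in Col2. If only the values of Col1 are wanted, something like `df[['Col1']].drop_duplicates().assign(Group=lambda x: x['Col1'].map(out))` should work instead.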
