

In Pandas, how to create a unique ID based on the common interrelation of other columns?

I have a dataframe with two ID columns. I need to assign a unique common ID (ID_3) that links interrelated rows, under the following condition: if rows share a value in either ID_1 or ID_2, they must get the same common ID (ID_3).

The dataframe looks like this:

import pandas as pd

df = pd.DataFrame({'ID_1': ['111', '111', '222', '333', '333', '444', '555', '666', '666', '777'],
                   'ID_2': ['AAA', 'BBB', 'AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'DDD', 'FFF', 'CCC']})

The desired output should be as follows:

ID_1  ID_2  ID_3
111   AAA   1
111   BBB   1
222   AAA   1
333   BBB   1
333   CCC   1
444   DDD   2
555   EEE   3
666   DDD   2
666   FFF   2
777   CCC   1

df_output = pd.DataFrame({'ID_1': ['111', '111', '222', '333', '333', '444', '555', '666', '666', '777'],
                      'ID_2': ['AAA', 'BBB', 'AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'DDD', 'FFF', 'CCC'],
                      'ID_3': ['1', '1', '1', '1', '1', '2', '3', '2', '2', '1']})

To clarify the conditions:

In the 1st and 2nd rows ID_1 is the same, so they must have the same ID_3.

The 3rd row has the same ID_2 as the 1st row, so its ID_3 must be the same as the 1st row's, i.e. 1.

The 4th row has the same ID_2 as the 2nd row, which is why it must get the same ID_3 as the 2nd row, i.e. 1.

The 5th row has the same ID_1 as the 4th, so ID_3 = 1.

The 6th row has a combination of ID_1 and ID_2 that is unique at this point, so it is marked as ID_3 = 2.

Then the 7th row gets ID_3 = 3.

But the 8th row has the same ID_2 as the 6th, so ID_3 = 2.

And so on.
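The rule above is transitive: rows end up in one group whenever they are linked through a chain of shared ID_1 or ID_2 values, so this is a connected-components problem. A minimal dependency-free sketch uses a disjoint-set (union-find) structure. The function name `common_ids` is hypothetical, and tagging each value with its column name is an assumption about the intended semantics (so that an ID_1 value that happens to equal an ID_2 value is not merged with it).

```python
def common_ids(id1, id2):
    """Hypothetical helper: number rows linked through shared ID_1 or ID_2 values."""
    parent = {}

    def find(x):
        # Follow parent links to the root, halving the path as we go
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Each row links its ID_1 value with its ID_2 value.
    # Nodes are tagged with the column name (an assumption, see above).
    for a, b in zip(id1, id2):
        union(('ID_1', a), ('ID_2', b))

    # Number the groups 1, 2, 3, ... in order of first appearance in the rows
    numbers = {}
    result = []
    for a in id1:
        root = find(('ID_1', a))
        numbers.setdefault(root, len(numbers) + 1)
        result.append(numbers[root])
    return result


id1 = ['111', '111', '222', '333', '333', '444', '555', '666', '666', '777']
id2 = ['AAA', 'BBB', 'AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'DDD', 'FFF', 'CCC']
print(common_ids(id1, id2))  # [1, 1, 1, 1, 1, 2, 3, 2, 2, 1]
```

Since iterating a pandas Series yields its values, the result can be attached directly with `df['ID_3'] = common_ids(df['ID_1'], df['ID_2'])`.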

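I think we can use networkx to solve this. Treat every ID value as a node and every row as an edge between its ID_1 and ID_2; each connected component of that graph is then one group of interrelated IDs: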

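import networkx as nx

# Each row is an edge linking its ID_1 value to its ID_2 value
G = nx.Graph()
G.add_edges_from(df[['ID_1', 'ID_2']].to_numpy().tolist())
# Each connected component is one group of interrelated IDs
cc = list(nx.connected_components(G))
# Map every ID value to the 1-based number of its component
d = {node: i for i, comp in enumerate(cc, 1) for node in comp}
# A row's ID_1 and ID_2 are in the same component, so mapping ID_2 suffices
out = df.assign(ID_3=df['ID_2'].map(d))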

print(out)

  ID_1 ID_2  ID_3
0  111  AAA     1
1  111  BBB     1
2  222  AAA     1
3  333  BBB     1
4  333  CCC     1
5  444  DDD     2
6  555  EEE     3
7  666  DDD     2
8  666  FFF     2
9  777  CCC     1
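If adding networkx as a dependency is undesirable, the same grouping can be computed with pandas alone by label propagation: start every row with its own label, then repeatedly replace each label with the minimum label among rows sharing the same ID_1, then the same ID_2, until nothing changes; finally renumber with `pd.factorize` so groups are numbered by first appearance. This is a sketch; the helper name `propagate_ids` is hypothetical, and it assumes neither column contains missing values (groupby drops NaN keys by default).

```python
import pandas as pd


def propagate_ids(df, cols=('ID_1', 'ID_2')):
    """Hypothetical helper: label rows linked through shared values in `cols`."""
    # Every row starts in its own group, labelled by its position
    label = pd.Series(range(len(df)), index=df.index)
    while True:
        new = label
        # Give each row the smallest label among rows sharing its value in col
        for col in cols:
            new = new.groupby(df[col]).transform('min')
        # Fixed point: labels are now constant within each connected group
        if new.equals(label):
            break
        label = new
    # Renumber the groups 1, 2, 3, ... in order of first appearance
    return pd.factorize(label)[0] + 1


df = pd.DataFrame({'ID_1': ['111', '111', '222', '333', '333', '444', '555', '666', '666', '777'],
                   'ID_2': ['AAA', 'BBB', 'AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'DDD', 'FFF', 'CCC']})
out = df.assign(ID_3=propagate_ids(df))
print(out)
```

Each pass is only a couple of groupby operations, but the number of passes grows with the longest chain of linked rows, so on large data with long chains the graph-based approach is likely to be faster.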

To see the connected components:

print(cc)
[{'111', '777', '222', 'AAA', '333', 'BBB', 'CCC'}, 
 {'DDD', 'FFF', '666', '444'}, {'555', 'EEE'}]
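The numbers 1, 2, 3 follow the order in which `nx.connected_components` yields the components (in practice, the order in which nodes were first added to the graph). If the IDs should be guaranteed to follow the order in which groups first appear in the rows, one option is to renumber with `pd.factorize`. In this sketch `cc` is typed in by hand in a deliberately shuffled order, standing in for the networkx result so the block runs without networkx.

```python
import pandas as pd

df = pd.DataFrame({'ID_1': ['111', '111', '222', '333', '333', '444', '555', '666', '666', '777'],
                   'ID_2': ['AAA', 'BBB', 'AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'DDD', 'FFF', 'CCC']})

# Stand-in for list(nx.connected_components(G)), in an arbitrary order
cc = [{'555', 'EEE'},
      {'DDD', 'FFF', '666', '444'},
      {'111', '777', '222', 'AAA', '333', 'BBB', 'CCC'}]

# Map every ID value to the index of the component that contains it
d = {node: i for i, comp in enumerate(cc, 1) for node in comp}

# Each row's ID_1 and ID_2 share a component, so mapping either column works
raw = df['ID_2'].map(d)
assert raw.tolist() == df['ID_1'].map(d).tolist()

# Renumber 1, 2, 3, ... by first appearance in row order
out = df.assign(ID_3=pd.factorize(raw)[0] + 1)
print(out)
```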


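Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please cite this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.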

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM