Grouping all connected nodes of a dataset

Question

This is not a duplicate of:

Note: pandas ver 0.23.4

Assumption: the data can be in any order.

I have a list:

L = ['A', 'B', 'C', 'D', 'L', 'M', 'N', 'O']

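I also have a dataframe. Col1 and Col2 each have several associated columns, and I want to keep that associated information. The information is arbitrary, so I have not filled it in.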

Col1  Col2  Col1Info  Col2Info  Col1moreInfo  Col2moreInfo
 A     B       x         x            x             x
 B     C
 D     C
 L     M
 M     N
 N     O

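I am trying to perform a "search and group" for each element of the list. For example, if we search for the element 'D' of the list, the following group would be returned.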

To    From  Col1Info  Col2Info  Col1moreInfo  Col2moreInfo
 A     B       x         x            x             x
 B     C
 D     C

I have been using networkx, but it is a very complex package.

Answer 1

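You can define a graph using the values from the two columns as edges, and look for its connected_components. Here is one way to do it with NetworkX:

import networkx as nx

# Build an undirected graph whose edges are the (Col1, Col2) pairs.
# Only the two node columns are passed, so extra info columns are ignored.
G = nx.Graph()
G.add_edges_from(df[['Col1', 'Col2']].values.tolist())
# Each connected component is a set of node labels.
cc = list(nx.connected_components(G))
# [{'A', 'B', 'C', 'D'}, {'L', 'M', 'N', 'O'}]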
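
For readers who find NetworkX heavy, the idea behind connected components can be sketched in plain Python with a breadth-first search. This is only an illustration of the concept, not how NetworkX implements it; the helper name find_components and the hard-coded edge list (copied from the Col1/Col2 values in the question) are assumptions made for this example.

```python
from collections import defaultdict, deque

# Edge list copied from the Col1/Col2 columns of the example data.
edges = [("A", "B"), ("B", "C"), ("D", "C"),
         ("L", "M"), ("M", "N"), ("N", "O")]


def find_components(edges):
    """Hypothetical helper: return one set of nodes per connected component."""
    # Undirected adjacency map: every edge is stored in both directions.
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        components.append(component)
    return components


cc = find_components(edges)
print(cc)
```

Because the search only follows edges, the row order of the data does not matter, which fits the assumption stated in the question.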

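Now say, for instance, that you want to filter by D. You could then do:

# Pick the component that contains the node of interest.
component = next(i for i in cc if 'D' in i)
# {'A', 'B', 'C', 'D'}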

And index the dataframe on the rows where the values of both columns are in component:

df[df[['Col1', 'Col2']].isin(component).all(axis=1)]

   Col1 Col2
0    A    B
1    B    C
2    D    C
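
The question's frame also carries info columns, so it may help to see the row filter applied to such a frame. The following is a minimal sketch, assuming a single placeholder column Col1Info with the 'x' value from the question and empty strings elsewhere; the membership test is restricted to the two node columns so that the info values do not take part in it.

```python
import pandas as pd

# Example frame: node columns from the question plus one placeholder info column.
df = pd.DataFrame({
    "Col1": ["A", "B", "D", "L", "M", "N"],
    "Col2": ["B", "C", "C", "M", "N", "O"],
    "Col1Info": ["x", "", "", "", "", ""],  # placeholder values, assumed
})

component = {"A", "B", "C", "D"}

# Test membership on the node columns only, then keep the full rows.
mask = df[["Col1", "Col2"]].isin(component).all(axis=1)
group = df[mask]
print(group)

# Testing every column instead would reject all rows here, because the
# Col1Info values are not node labels.
all_columns_mask = df.isin(component).all(axis=1)
```

The selected rows keep every column of the original frame, so the associated information travels with the group.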

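The above can be extended to all items in the list by generating a list of dataframes. Then we only need to index that list with the position of a given item in L:

L = ['A', 'B', 'C', 'D', 'L', 'M', 'N', 'O']

# One dataframe per item of L, in the same order as L.
# This assumes every item of L appears in some component.
dfs = [df[df[['Col1', 'Col2']].isin(i).all(axis=1)] for j in L for i in cc if j in i]
print(dfs[L.index('D')])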

   Col1 Col2
0    A    B
1    B    C
2    D    C

print(dfs[L.index('L')])

   Col1 Col2
3    L    M
4    M    N
5    N    O
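
Positional lookup through L.index only lines up when every item of L belongs to some component. As an alternative sketch (not part of the original answer), the groups can be stored in a dict keyed by item, which also slices the frame once per component rather than once per item. The cc value below is hard-coded from the output shown earlier, and the name groups is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["A", "B", "D", "L", "M", "N"],
    "Col2": ["B", "C", "C", "M", "N", "O"],
})
# Components as reported by nx.connected_components earlier in the answer.
cc = [{"A", "B", "C", "D"}, {"L", "M", "N", "O"}]
L = ["A", "B", "C", "D", "L", "M", "N", "O"]

# Slice the frame once per component, then point every node at its slice.
groups = {}
for component in cc:
    subset = df[df[["Col1", "Col2"]].isin(component).all(axis=1)]
    for node in component:
        groups[node] = subset

# Items of L that appear in no component are simply left out.
dfs = {item: groups[item] for item in L if item in groups}
print(dfs["D"])
print(dfs["L"])
```

A lookup such as dfs['D'] then returns the same rows as dfs[L.index('D')] in the list version.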

Answer 1
3 Accepted 2019-06-20 12:46:35
