How to create an edge list from a pandas DataFrame?

Question

I have a pandas DataFrame (df) of the following form:

    Col1
A  [Green, Red, Purple]
B  [Red, Yellow, Blue]
C  [Brown, Green, Yellow, Blue]

I need to convert it into an edge list, i.e. a DataFrame of the form:

Source    Target    Weight
  A         B         1
  A         C         1
  B         C         2

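Edit: note that the number of rows in the new DataFrame equals the total number of possible pairwise combinations. Also, to compute the "Weight" column, we simply find the intersection of the two lists. For example, B and C share two colors, Blue and Yellow, so the "Weight" for the corresponding row is 2.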
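As a quick check of the weight definition, here is a minimal sketch using plain Python sets on the B and C rows from the example (the variable names are illustrative):

```python
# Color lists for rows B and C from the example above
b = ['Red', 'Yellow', 'Blue']
c = ['Brown', 'Green', 'Yellow', 'Blue']

# Weight = size of the intersection of the two lists, treated as sets
shared = set(b) & set(c)
weight = len(shared)
print(sorted(shared), weight)  # ['Blue', 'Yellow'] 2
```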

What is the fastest way to do this? The original DataFrame contains about 28,000 elements.
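For scale, assuming the ~28,000 elements are rows of df: the edge list has one row per unordered pair, so with n rows it has n(n-1)/2 rows. For the 3-row example that gives 3 rows, and for n = 28,000 it gives:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{28000}{2} = \frac{28000 \times 27999}{2} = 391{,}986{,}000
```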

Answer 1

Try this. It's not very tidy. PS: you can adjust the final output yourself; I didn't drop the helper columns or rename the columns.

import pandas as pd
from itertools import combinations

df = pd.DataFrame({"Col1": [['Green', 'Red', 'Purple'], ['Red', 'Yellow', 'Blue'], ['Brown', 'Green', 'Yellow', 'Blue']], "two": ['A', 'B', 'C']})
df = df.set_index('two')
df.index.name = None  # clear the index name ('del df.index.name' fails on newer pandas)

# One row per unordered pair of index labels: (A, B), (A, C), (B, C)
DF = pd.DataFrame(data=[x for x in combinations(df.index, 2)])
# Look up the color list for each side of the pair
DF['0_0'] = DF[0].map(df['Col1'])
DF['1_1'] = DF[1].map(df['Col1'])
# Weight = number of colors the two lists share
DF['Weight'] = DF.apply(lambda x: len(set(x['0_0']).intersection(x['1_1'])), axis=1)



DF
Out[174]: 
   0  1                   0_0                           1_1  Weight
0  A  B  [Green, Red, Purple]           [Red, Yellow, Blue]       1
1  A  C  [Green, Red, Purple]  [Brown, Green, Yellow, Blue]       1
2  B  C   [Red, Yellow, Blue]  [Brown, Green, Yellow, Blue]       2
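As the PS says, this output still carries the helper columns and numeric column names. A minimal sketch of the cleanup, rebuilding Answer 1's `DF` so the block runs on its own; the final names `Source`/`Target`/`Weight` are taken from the question:

```python
import pandas as pd
from itertools import combinations

df = pd.DataFrame({'Col1': [['Green', 'Red', 'Purple'],
                            ['Red', 'Yellow', 'Blue'],
                            ['Brown', 'Green', 'Yellow', 'Blue']]},
                  index=['A', 'B', 'C'])

# Same steps as Answer 1
DF = pd.DataFrame(data=list(combinations(df.index, 2)))
DF['0_0'] = DF[0].map(df['Col1'])
DF['1_1'] = DF[1].map(df['Col1'])
DF['Weight'] = DF.apply(lambda x: len(set(x['0_0']).intersection(x['1_1'])), axis=1)

# Drop the helper columns and rename 0/1 to match the requested layout
edges = DF.drop(columns=['0_0', '1_1']).rename(columns={0: 'Source', 1: 'Target'})
print(edges)
```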

Answer 2

First, start with the DataFrame:

import pandas as pd
from itertools import combinations

df = pd.DataFrame({
        'Col1': [['Green','Red','Purple'],
                 ['Red', 'Yellow', 'Blue'],
                 ['Brown', 'Green', 'Yellow', 'Blue']]
     }, index=['A', 'B', 'C'])

df['Col1'] = df['Col1'].apply(set)
df

                           Col1
A          {Purple, Red, Green}
B           {Red, Blue, Yellow}
C  {Green, Yellow, Blue, Brown}

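Each list in Col1 has been converted to a set so that intersections can be computed efficiently. Next, we use itertools.combinations to create all pairwise combinations of the rows in df: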

df1 = pd.DataFrame(
    data=list(combinations(df.index.tolist(), 2)), 
    columns=['Src', 'Dst'])

df1

  Src Dst
0   A   B
1   A   C
2   B   C

Now apply a function that takes the intersection of the two sets and finds its length. The Src and Dst columns act as lookup keys into df.

df1['Weights'] = df1.apply(lambda x: len(
    df.loc[x['Src']]['Col1'].intersection(df.loc[x['Dst']]['Col1'])), axis=1)
df1

  Src Dst  Weights
0   A   B        1
1   A   C        1
2   B   C        2

I recommend doing the set conversion once, at the start. Converting the lists to sets on the fly every time is expensive and wasteful.

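For even more speed, you may also want to copy these sets into two columns of the new DataFrame, since repeatedly calling df.loc slows things down noticeably.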
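A minimal sketch of that suggestion: each side's set is mapped into its own column of `df1` once, and the intersection sizes are then computed with a list comprehension over the two columns instead of a `df.loc` lookup per row. The column names `Src_set` and `Dst_set` are hypothetical, not from the answer.

```python
import pandas as pd
from itertools import combinations

df = pd.DataFrame({'Col1': [['Green', 'Red', 'Purple'],
                            ['Red', 'Yellow', 'Blue'],
                            ['Brown', 'Green', 'Yellow', 'Blue']]},
                  index=['A', 'B', 'C'])
# Convert every list to a set once, up front
df['Col1'] = df['Col1'].apply(set)

df1 = pd.DataFrame(list(combinations(df.index.tolist(), 2)), columns=['Src', 'Dst'])

# Copy each side's set into its own column (hypothetical column names)
df1['Src_set'] = df1['Src'].map(df['Col1'])
df1['Dst_set'] = df1['Dst'].map(df['Col1'])

# Row-wise intersection sizes without df.loc lookups or DataFrame.apply
df1['Weights'] = [len(s & d) for s, d in zip(df1['Src_set'], df1['Dst_set'])]
print(df1[['Src', 'Dst', 'Weights']])
```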

Answer 3

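Get an array of sets
Use np.triu_indices to get the pairwise indices covering all combinations
Use the & operator to get the pairwise intersections, and a comprehension to get their lengths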

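import numpy as np
import pandas as pd

# Array of sets, one per row (df as defined in the question)
c = df.Col1.apply(set).values

# Upper-triangle indices with k=1 (diagonal skipped): every pair exactly once
i, j = np.triu_indices(c.size, 1)

# & on object arrays intersects the sets element-wise
pd.DataFrame(dict(
        Source=df.index[i],
        Target=df.index[j],
        Weight=[len(s) for s in c[i] & c[j]]
    ))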

  Source Target  Weight
0      A      B       1
1      A      C       1
2      B      C       2
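A different technique, not taken from any of the answers, that may be worth trying for speed: encode each row's colors as a 0/1 indicator matrix X (one row per label, one column per color). The dot product of two indicator rows counts the colors they share, so X @ X.T holds every pairwise weight at once, and np.triu_indices then picks out each pair once. This sketch assumes no color contains the `|` character (used as the join separator) and no list repeats a color. It builds a dense n × n matrix, which for 28,000 rows would be about 6.3 GB at int64, so at that size a sparse matrix (e.g. from scipy.sparse) or processing in row blocks would likely be needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [['Green', 'Red', 'Purple'],
                            ['Red', 'Yellow', 'Blue'],
                            ['Brown', 'Green', 'Yellow', 'Blue']]},
                  index=['A', 'B', 'C'])

# 0/1 indicator matrix: one row per label, one column per distinct color
# (join each list with '|', then split it back out with str.get_dummies)
X = df['Col1'].str.join('|').str.get_dummies(sep='|').to_numpy()

# W[a, b] = number of colors shared by rows a and b
W = X @ X.T

# Upper triangle without the diagonal = every unordered pair once
i, j = np.triu_indices(len(df), 1)
edges = pd.DataFrame({'Source': df.index[i], 'Target': df.index[j], 'Weight': W[i, j]})
print(edges)
```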

3 answers

Answer 1: score 5, 2017-07-09 02:16:51

Answer 2: score 5, accepted, 2017-07-09 02:40:40

Answer 3: score 2, 2017-07-09 05:51:12
