Grouping related items in a Python list of lists

Question

I have a list of the following form:

list = [ [['item1'], ['property1','property2','property3']], [['item2'],['property1', 'property4']], ..., [['itemN'],['property5']]]

I want to build another list of lists that groups each of the items above with the items it shares at least one property with. For example:

new_list = [['item1','item2'], .., ['itemN']]

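Note that items should be grouped together even if they share properties only indirectly. For example, if item1 and item2 have a property in common, and item2 and item3 have a property in common, but item1 and item3 share none, all three should still end up in one group.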

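My attempt was to add a boolean to each entry of the original list (so that an entry is not visited again once it has been grouped) and use the following function: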

list = [ [['item1'], ['property1','property2','property3'], True], [['item2'],['property1', 'property4'], True], [['itemN'],['property5'], True]]

def group_correlates(list):
    result = []
    for i, entry in enumerate(list):
        correlates = []
        items = entry[0]
        properties = entry[1]
        if entry[2]: # if not already grouped (True)
            correlates.append(items)
        for j, other_entry in enumerate(list):
            flag = other_entry[2]
            if not i == j:
                if flag:
                    other_properties = other_entry[1]
                    other_items = other_entry[0]
                    for property in properties:
                        if property in other_properties:
                            other_entry[2] = False # do not visit again
                            correlates.append(other_items)
                            result.append(correlates)
    return result

But I get:

[[['item1'], ['item2']], [['item1']]]

Even if I could get this to work, I'm sure there is a more elegant way to do the same thing.

Answer 1

Why not use a dict and then groupby from the itertools module?

Here is an example of how to do this:

from itertools import groupby

data = [[['item1'], ['property1','property2','property3']], [['item2'],['property1', 'property4']], [['itemN'],['property5']]]

aa = {}
for k, v in data:
    for j in v:
        try:
            aa[j] += k
        except KeyError:
            aa[j] = k


new_list = [k for k,_ in groupby(sorted(aa.values()), lambda x: x)]
print(new_list)

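Alternatively, you can use defaultdict from the collections module:

from collections import defaultdict
from itertools import groupby

data = [[['item1'], ['property1','property2','property3']], [['item2'],['property1', 'property4']], [['itemN'],['property5']]]

bb = defaultdict(list)  # missing keys start out as an empty list

for k, v in data:
    for j in v:
        bb[j] += k  # collect every item that has property j


new_list = [k for k,_ in groupby(sorted(bb.values()), lambda x: x)]
print(new_list)

In the dict version, aa[j] = k stores the same list object for every property of an item, so the later += updates all of those entries at once, and it prints:

[['item1', 'item2'], ['item2'], ['itemN']]

The defaultdict version builds a separate list per property, so it prints:

[['item1'], ['item1', 'item2'], ['item2'], ['itemN']]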

Answer 2

First convert your list into a dictionary, as described above.

list1 = [ [['item1'], ['property1','property2','property3']], 
          [['item2'], ['property1', 'property4']],
          [['item3'], ['property5', 'property6']]
        ]

dict1 = {item[0][0]: item[1] for item in list1}

Then:

new_list = []

for key in dict1:
    target = dict1[key]
    for k, v in dict1.items():
        if k != key and len(set(target).intersection(set(v))) != 0:
            new_list.append([key, k])
    new_list = [sorted(i) for i in new_list] # sort sublists
    new_list = [list(t) for t in set(map(tuple, new_list))] # remove dupes

flat = [item for sublist in new_list for item in sublist] # flatten list
unique = list(set(dict1.keys()).difference(set(flat)))
new_list.append(unique) # add unique keys

new_list
Out[76]: [['item1', 'item2'], ['item3']]

Answer 3

"Bipartite" is mostly just terminology. The point is to find the connected subgraphs.

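Put all the nested lists into an "open" list: everything in this list still needs to be processed, and when it is empty you are done. Start a new list of subgraphs; this is the "list of lists" you mentioned.

Initialize an item list and a property list as empty lists.

Pick an item from the "open" list to start a subgraph. Now alternate between properties and items until nothing more is added: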

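1. For each item you just added (the first time, only the initial item), add all of its properties to the property list. Keep track of which properties are new.
2. Remove those items (and their properties) from the "open" list.
3. For each property you just added, add every item that has that property to the item list. Keep track of the items you just added.
4. Repeat steps 1-3 until nothing new is added.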

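At this point the item list and the property list describe one closed subgraph. Add that pair to the main list of subgraphs.

Go back, reset the item and property lists to empty, and start again from a new initial item. Keep doing this until all items are used up. The "open" list is now empty, and every item and property appears in the list of subgraphs.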
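
The procedure described above could be sketched roughly as follows. This is only an illustration, not the answerer's own code: the name group_connected and the helper tables props_of and items_with are made up here, each entry is assumed to have the [[item], [properties]] shape from the question with a single item name, and visited items are tracked in a set rather than physically deleted from the "open" list.

```python
from collections import defaultdict


def group_connected(entries):
    """Group items linked, directly or indirectly, by shared properties."""
    # Lookup tables: item -> its properties, property -> items having it.
    props_of = {entry[0][0]: set(entry[1]) for entry in entries}
    items_with = defaultdict(set)
    for item, props in props_of.items():
        for prop in props:
            items_with[prop].add(item)

    seen_items = set()  # stands in for removal from the "open" list
    groups = []         # the list of subgraphs (items only)

    for start in props_of:
        if start in seen_items:
            continue
        seen_items.add(start)
        group_items, group_props = [start], set()
        new_items = [start]
        while new_items:
            # Step 1: properties of the newly added items that are new.
            new_props = set()
            for item in new_items:
                new_props |= props_of[item] - group_props
            group_props |= new_props
            # Step 3: unseen items that have any of the new properties.
            new_items = []
            for prop in new_props:
                for item in items_with[prop]:
                    if item not in seen_items:
                        seen_items.add(item)
                        new_items.append(item)
            group_items.extend(new_items)
        groups.append(sorted(group_items))
    return groups


data = [
    [['item1'], ['property1', 'property2', 'property3']],
    [['item2'], ['property1', 'property4']],
    [['item3'], ['property4', 'property6']],
    [['itemN'], ['property5']],
]
print(group_connected(data))
# [['item1', 'item2', 'item3'], ['itemN']]
```

Here item1 and item3 share no property, but both are linked through item2, so all three land in the same group, which is the indirect case described in the question.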


Answer 1
1 2017-05-30 23:42:30

Answer 2
1 Accepted 2017-05-31 00:09:51

Answer 3
0 2017-05-30 23:40:41
