Find all the key clusters in a list
I have a "combinatorial" problem of finding clusters of distinct keys, and I am looking for an optimized solution.
I have this list "l":
l = [[1, 5],
[5, 7],
[4, 9],
[7, 9],
[50, 90],
[100, 200],
[90, 100],
[2, 90],
[7, 50],
[9, 21],
[5, 10],
[8, 17],
[11, 15],
[3, 11]]
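Each id is linked to another id, and possibly, through that id, to further ids (see the figure below). The goal is to find, in an optimized way, all the keys that belong to the same cluster.
The desired result is: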
[{1, 2, 4, 5, 7, 9, 10, 21, 50, 90, 100, 200}, {8, 17}, {3, 11, 15}]
My current code is:
out = []
while len(l)>0:
    first, *rest = l
    first = set(first)
    lf = -1
    # Keep rescanning the remaining pairs until the cluster stops growing
    while len(first)>lf:
        lf = len(first)
        print(lf)
        rest2 = []
        for r in rest:
            if len(first.intersection(set(r)))>0:
                first |= set(r)
            else:
                rest2.append(r)
        rest = rest2
    out.append(first)
    l = rest
This gives the result shown above. The problem arises when I run it on 2 million rows.
Is there another way to solve this in an optimized manner?
You can treat this as the problem of finding the connected components of a graph:
l = [[1, 5], [5, 7], [4, 9], [7, 9], [50, 90], [100, 200], [90, 100],
[2, 90], [7, 50], [9, 21], [5, 10], [8, 17], [11, 15], [3, 11]]
# Make graph-like dict
graph = {}
for i1, i2 in l:
    graph.setdefault(i1, set()).add(i2)
    graph.setdefault(i2, set()).add(i1)
# Find clusters
clusters = []
for start, ends in graph.items():
    # If vertex is already in a cluster skip
    if any(start in cluster for cluster in clusters):
        continue
    # Cluster set
    cluster = {start}
    # Process neighbors transitively
    queue = list(ends)
    while queue:
        v = queue.pop()
        # If vertex is new
        if v not in cluster:
            # Add it to cluster and put neighbors in queue
            cluster.add(v)
            queue.extend(graph[v])
    # Save cluster
    clusters.append(cluster)
print(*clusters)
# {1, 2, 100, 5, 4, 7, 200, 9, 10, 50, 21, 90} {8, 17} {3, 11, 15}
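One caveat for large inputs: the any(start in cluster for cluster in clusters) check scans every cluster found so far, which may become slow when the data contains many small clusters. Below is a minimal sketch of the same traversal that tracks visited vertices in a single set instead. The function name connected_components and the variable seen are hypothetical names chosen for illustration, not part of the answer above.

```python
from collections import defaultdict

def connected_components(pairs):
    """Return a list of sets, one per connected component of the pairs."""
    # Build an undirected adjacency map
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()      # every vertex already assigned to some cluster
    clusters = []
    for start in graph:
        if start in seen:
            continue
        # Iterative depth-first traversal from an unvisited vertex
        cluster = {start}
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    cluster.add(w)
                    stack.append(w)
        clusters.append(cluster)
    return clusters

l = [[1, 5], [5, 7], [4, 9], [7, 9], [50, 90], [100, 200], [90, 100],
     [2, 90], [7, 50], [9, 21], [5, 10], [8, 17], [11, 15], [3, 11]]
print(*connected_components(l))
```

Each vertex and each edge is handled a constant number of times, so the work grows roughly linearly with the number of pairs.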
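This is a typical use case for the union-find algorithm / disjoint-set data structure. There is no implementation in the Python standard library AFAIK, but I always tend to keep one nearby, as it is so useful...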
l = [[1, 5], [5, 7], [4, 9], [7, 9], [50, 90], [100, 200], [90, 100],
[2, 90], [7, 50], [9, 21], [5, 10], [8, 17], [11, 15], [3, 11]]
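from collections import defaultdict
leaders = defaultdict(lambda: None)

def find(x):
    l = leaders[x]
    if l is not None:
        # path compression: point x directly at its group's leader
        leaders[x] = find(l)
        return leaders[x]
    return x

# union all elements that transitively belong together
for a, b in l:
    leader_a, leader_b = find(a), find(b)
    # skip pairs already in the same group: linking a leader to itself
    # would make find() recurse forever
    if leader_a != leader_b:
        leaders[leader_a] = leader_b

# get groups of elements with the same leader
groups = defaultdict(set)
for x in leaders:
    groups[find(x)].add(x)
print(*groups.values())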
# {1, 2, 4, 5, 100, 7, 200, 9, 10, 50, 21, 90} {8, 17} {3, 11, 15}
For n nodes, the running time should be about O(n log n), since each lookup takes about log n steps to reach (and update) the leader.
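For inputs in the millions of rows, a recursive find may run into Python's recursion limit if a long chain of leaders builds up before it gets compressed. As a hedged sketch (not the answer's own code), here is an iterative variant with path compression and union by size. The names find_clusters, parent and size are hypothetical, chosen for illustration.

```python
def find_clusters(pairs):
    """Group elements connected by the given pairs, using union-find."""
    parent = {}   # element -> parent element (a root is its own parent)
    size = {}     # root -> number of elements in its tree

    def find(x):
        # Register unseen elements as their own root
        if x not in parent:
            parent[x] = x
            size[x] = 1
            return x
        # Walk up to the root without recursion
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # already in the same cluster (the pair closes a cycle)
        # Union by size: attach the smaller tree under the larger one
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    # Collect elements by their root
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())

l = [[1, 5], [5, 7], [4, 9], [7, 9], [50, 90], [100, 200], [90, 100],
     [2, 90], [7, 50], [9, 21], [5, 10], [8, 17], [11, 15], [3, 11]]
print(*find_clusters(l))
```

With both path compression and union by size, the amortized cost per operation is nearly constant (bounded by the inverse Ackermann function), so the total time is close to linear in the number of pairs.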