Creating unique sets from a list of tuples

Question

I am trying to create unique clusters of universities that are within 50 miles of each other.

I have a dictionary whose keys are tuples of two university names and whose values are the distances between them:

{('University A', 'University B'): 2546,
 ('University A', 'University C'): 2449,
 ('University A', 'University D'): 5,
 ('University A', 'University E'): 1005,
 ('University B', 'University C'): 32,
 ('University B', 'University D'): 132,
 ('University B', 'University E'): 42,
 ('University C', 'University D'): 532,
 ('University C', 'University E'): 1362}

I am able to filter these to get the pairs of universities that are within 50 miles of each other:

('University A', 'University D')
('University B', 'University C')
('University B', 'University E')

How can I iterate through these pairs and create sets of clusters? What I should end up with is one set with Universities A and D and another set with Universities B, C, and E.

In reality I am looking at hundreds of universities, so the list of pairs is much longer. I am struggling to create a new set within the iteration each time a new university cluster appears.

Answer 1

This is an incomplete answer, but I hope it shows an idea that can be tested and optimized.

Filter the keys into a list of sets, then iterate over them and take the union whenever either member of a pair is already in the lookup list, which must be updated while iterating. It is easier to show some code:

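# u is the distance dictionary from the question
filtered = [set(k) for k, v in u.items() if v <= 50]
print(filtered) #=> [{'University A', 'University D'}, {'University B', 'University C'}, {'University B', 'University E'}]

# Seed the lookup with one pair, then absorb every pair that shares a member with it.
# This grows a single cluster in one pass; other clusters need the same treatment.
lookup_list = filtered[1]
for pair in filtered:
  if any(e in lookup_list for e in pair):
    lookup_list = lookup_list.union(pair)

print(lookup_list)
#=> {'University B', 'University C', 'University E'}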
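
The snippet above grows only one cluster. As a possible way to finish the idea, here is a minimal sketch that groups every pair in a single pass using a union-find (disjoint-set) structure. This is a swapped-in technique, not the answerer's method, and the function name `cluster_pairs` is hypothetical. It assumes the input is the already filtered list of pairs.

```python
def cluster_pairs(pairs):
    """Group names so that two names share a cluster if a chain of pairs links them."""
    parent = {}  # maps each name to its parent in the union-find forest

    def find(x):
        # Register unseen names as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    # Collect the members of each root into a set.
    groups = {}
    for name in parent:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


pairs = [
    ('University A', 'University D'),
    ('University B', 'University C'),
    ('University B', 'University E'),
]
result = cluster_pairs(pairs)
print(result)
```

Note that clustering this way is transitive: Universities C and E land in the same set because both are close to University B, even though the question's data puts C and E 1362 miles apart.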

Answer 2

With helpful guidance from @Daniel Mesejo and @Jon Clements, and from this post, I ended up using networkx to solve the problem.

Starting from a list of tuples, clusters, that looks like [('University A', 'University B'), ('University A', 'University C'), ...], I created the graph with:

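import networkx as nx
import matplotlib.pyplot as plt

g = nx.Graph()
for c in clusters:
    g.add_edge(*c)  # each tuple becomes one edge between two universities
nx.draw(g)
plt.show()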

Then I extracted the clusters and gave each a unique identifier, using a dictionary in which each key is the cluster's number and each value is a list of the nodes (school names) in that cluster:

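# nx.connected_component_subgraphs was removed in networkx 2.4;
# nx.connected_components yields the node set of each component instead.
components = list(nx.connected_components(g))
n = len(components)
clusters = {}
for i in range(n):
    clusters[i+1] = list(components[i])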

And finally I mapped them back onto the original dataframe:

def map_cluster(x):
    # Return the number of the cluster containing school x (None if it is in no cluster).
    for k, v in clusters.items():
        if x in v:
            return k

df['Cluster'] = df['School Name'].apply(map_cluster)

I am certain there is a more efficient way to do this and would welcome comments on this approach!
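
One possible way to make the final lookup cheaper is to invert the clusters dictionary once, so that each school name maps directly to its cluster number, instead of scanning every cluster for every row. The sketch below uses hypothetical data shaped like the dictionary built above; the name `school_to_cluster` is an assumption, not part of the original answer.

```python
# Hypothetical data shaped like the clusters dictionary built above
clusters = {
    1: ['University A', 'University D'],
    2: ['University B', 'University C', 'University E'],
}

# Invert once: school name -> cluster number
school_to_cluster = {
    school: number
    for number, schools in clusters.items()
    for school in schools
}

print(school_to_cluster.get('University C'))  # cluster number for a known school
print(school_to_cluster.get('University Z'))  # None for a school in no cluster
```

With pandas, the same dictionary could then be passed to `Series.map`, for example `df['Cluster'] = df['School Name'].map(school_to_cluster)`, which leaves schools without a cluster as NaN.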

Answer 1
0 2019-01-05 23:21:48

Answer 2
0 2019-01-07 04:45:30
