
Getting Unique Values and their Following Pairs from a List of Tuples

I have a list of tuples like this:

[
    ('a', 'AA'), # pair 1

    ('d', 'AA'), # pair 2
    ('d', 'a'),  # pair 3
    ('d', 'EE'), # pair 4

    ('b', 'BB'), # pair 5
    ('b', 'CC'), # pair 6
    ('b', 'DD'), # pair 7

    ('c', 'BB'), # pair 8
    ('c', 'CC'), # pair 9
    ('c', 'DD'), # pair 10

    ('c', 'b'),  # pair 11

    ('d', 'FF'), # pair 12

]

Each tuple in the list above represents a pair of similar (or duplicate) items. I need to create a dictionary in which each key is one of the unique items from the tuples and each value is a list of all the other items that the key is linked to, either directly or through other pairs. For example, 'a' is similar to 'AA' (pair 1), which in turn is similar to 'd' (pair 2), and 'd' is similar to 'EE' and 'FF' (pairs 4 and 12). The same is the case with the other items.

My expected output is:

{'a':['AA', 'd', 'EE', 'FF'], 'b':['BB', 'CC', 'DD', 'c']}

According to the tuples, ['a', 'AA', 'd', 'EE', 'FF'] are similar, so any one of them can be the key, while the remaining items will be its values. So the output can also be: {'AA':['a', 'd', 'EE', 'FF'], 'c':['BB', 'CC', 'DD', 'b']}. The key of each entry in the output dict can be any item from its group of similar items.
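Since any member of a group may serve as its key, two valid outputs can look different while describing the same grouping. A small helper can make that comparison explicit. This is only an illustrative sketch; the name as_groups is made up here and is not part of the question.

```python
def as_groups(result):
    """Turn {key: [members...]} into a set of frozensets, one per group."""
    return {frozenset([key, *members]) for key, members in result.items()}

expected = {'a': ['AA', 'd', 'EE', 'FF'], 'b': ['BB', 'CC', 'DD', 'c']}
alternative = {'AA': ['a', 'd', 'EE', 'FF'], 'c': ['BB', 'CC', 'DD', 'b']}

# Both dictionaries describe the same two groups, only with different keys.
print(as_groups(expected) == as_groups(alternative))  # True
```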

How do I do this for a list with thousands of such tuples?

You can use 3 dictionaries: one for the output (out), one to track the seen values (seen), and one to map the equal keys (mapper). Here l is the input list of tuples:

out = {}
seen = {}
mapper = {}

for a, b in l:
    if b in seen:
        out.setdefault(seen[b], []).append(a)
        mapper[a] = seen[b]
    else:
        out.setdefault(mapper.setdefault(a, a), []).append(b)
        seen[b] = a

# remove duplicates
out = {k: list(dict.fromkeys(x for x in v if x!=k))
       for k, v in out.items()}

Output:

{'a': ['AA', 'd', 'EE', 'FF'],
 'b': ['BB', 'CC', 'DD', 'c']}
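One caveat: the loop above assigns each pair to a group as it is read, so two groups that only turn out to be linked by a later pair are not merged. For example, with [('a', 'X'), ('b', 'Y'), ('a', 'Y')] it leaves 'X' in a group of its own. If your data can look like that, a union-find (disjoint-set) structure is a common alternative. The following is a minimal sketch, not part of the original answer; the names group_pairs and find are made up for illustration.

```python
def group_pairs(pairs):
    """Group items linked by pairs; returns {first_seen_member: [other members]}."""
    parent = {}  # item -> parent item; a root is its own parent

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path straight at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge b's group into a's group

    # Dicts keep insertion order, so members are listed in first-seen order.
    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    # Use the first-seen member of each group as its key.
    return {members[0]: members[1:] for members in groups.values()}


print(group_pairs([('a', 'X'), ('b', 'Y'), ('a', 'Y')]))
# {'a': ['X', 'b', 'Y']}
```

Picking the first-seen member as the key is an arbitrary choice made for this sketch; any member of a group would be an equally valid key.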

Graph approach

Or you might want to approach this using a graph:

import networkx as nx

G = nx.from_edgelist(l)

out = {(x:=list(g))[0]: x[1:] for g in nx.connected_components(G)}

Output:

{'EE': ['d', 'a', 'FF', 'AA'],
 'CC': ['b', 'c', 'BB', 'DD']}
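Because connected_components yields sets, which element ends up as the key is arbitrary here. If you would rather avoid the networkx dependency, the same connected-components idea can be sketched with the standard library alone. The following is an illustrative sketch using a breadth-first search; the function name connected_groups is made up for this example.

```python
from collections import defaultdict, deque

def connected_groups(pairs):
    """Return {first_seen_member: [other members]} for each connected component."""
    # Build an undirected adjacency list; lists keep neighbours in input order.
    adjacency = defaultdict(list)
    for a, b in pairs:
        adjacency[a].append(b)
        adjacency[b].append(a)

    seen = set()
    out = {}
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        out[start] = members[1:]  # start itself is the key
    return out
```

Called on the list from the question, this returns the dictionary given there as the expected output, with 'a' and 'b' as keys because they are the first members seen in each component.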


As said in the comment, you seem to want to find the equivalence classes of the given relation.

ecs = []
for a, b in data:
    a_ec = next((ec for ec in ecs if a in ec), None)
    b_ec = next((ec for ec in ecs if b in ec), None)
    if a_ec:
        if b_ec:
            # Found equivalence classes for both elements, everything is okay
            if a_ec is not b_ec:
                # We only need one of them though
                ecs.remove(b_ec)
                a_ec.update(b_ec)
        else:
            # Add the new element to the found equivalence class
            a_ec.add(b)
    else:
        if b_ec:
            # Add the new element to the found equivalence class
            b_ec.add(a)
        else:
            # First time we see either of these: make a new equivalence class
            ecs.append({a, b})

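# Extract a representative element from each class and construct a dictionary.
# pop() removes that element from the set, so each value is the set of the
# remaining members (values are sets, not lists).
out = {
    ec.pop(): ec
    for ec in ecs
}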
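The loop above scans every existing class for each pair, which can get slow with thousands of tuples and many classes. One possible variation, sketched here under the assumption that all items are hashable, keeps a dictionary from each item to its class so that lookups are direct. The names equivalence_classes and class_of are illustrative and do not come from the answer.

```python
def equivalence_classes(pairs):
    """Return a list of sets, one per equivalence class."""
    class_of = {}  # item -> the set (equivalence class) that contains it
    for a, b in pairs:
        a_ec = class_of.get(a)
        b_ec = class_of.get(b)
        if a_ec is None and b_ec is None:
            # First time we see either item: start a new class.
            ec = {a, b}
            class_of[a] = class_of[b] = ec
        elif a_ec is None:
            b_ec.add(a)
            class_of[a] = b_ec
        elif b_ec is None:
            a_ec.add(b)
            class_of[b] = a_ec
        elif a_ec is not b_ec:
            # Both items already have classes: merge the smaller into the larger.
            if len(a_ec) < len(b_ec):
                a_ec, b_ec = b_ec, a_ec
            a_ec.update(b_ec)
            for item in b_ec:
                class_of[item] = a_ec
    # Many items share one set object, so deduplicate by identity.
    unique = {id(ec): ec for ec in class_of.values()}
    return list(unique.values())
```

The result can be turned into a dictionary in the same way as above, by popping one representative from each set.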
