How can I compare two lists of lists in Python and find matching values

I'm trying to compare two lists based on the index number of each list:

list1 = [
    ['1', ['a']],
    ['2', ['b', 'c', 'd']],
    ['3', ['e']],
    ['4', ['f', 'g']],
    ['5', ['h']]
]

list2 = [
    ['1', ['e']],
    ['2', ['f', 'c']],
    ['3', ['h', 'g', 'a', 'd']],
    ['4', ['b']],
    ['5', ['b']],
]

What I would like to do is compare each row of list1 with all the rows in list2 and return the matching values. For instance, in this example the desired outcome would be:

1(list1) - 3(list2),
2-2,
2-3,
2-4,
2-5,
3-1, 
4-2, 
4-3

In total 8. And then delete the similar ones, like: 2-4 and 4-2, 1-3 and 3-1.

You are looking for the set intersections across the product of your 'labels', where each pair is itself a set too (order doesn't matter if 2-4 and 4-2 are considered the same).

Intersections are most efficiently tested with the Python set type, so when building the lookup dictionaries, let's convert the value lists to sets up front.
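As a quick illustration of why sets are convenient here (a minimal sketch using two rows from the question's data; the names `row_a`, `row_b` and `common` are only for illustration), the `&` operator returns the shared elements, and an empty result is falsy:

```python
# Row '2' of list1 and row '2' of list2 from the question, converted to sets.
row_a = set(['b', 'c', 'd'])
row_b = set(['f', 'c'])

common = row_a & row_b          # set intersection
print(common)                   # {'c'}
print(bool(common))             # True: at least one shared value

# An empty intersection is falsy, so it can be used directly in an `if`.
print(bool(row_a & {'e'}))      # False

# isdisjoint() answers the same question without building the intersection.
print(row_a.isdisjoint(row_b))  # False
```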

So we need the unique labels, and a way to look up the associated list for each label. That's a job for dictionaries, so convert your lists to dictionaries first, and get the union of their keys. Then turn each pairing into a set as well, so that {'2', '4'} and {'4', '2'} are seen as the same, storing the results in another set. Note that 2-2 becomes 2 in this scenario, as a set would store '2' just once.
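A small sketch of the pairing idea (the names `pair_a`, `pair_b` and `same_label` are illustrative, not part of the answer). A `frozenset` is used rather than a plain `set` because only hashable objects can be stored inside another set:

```python
pair_a = frozenset(('2', '4'))
pair_b = frozenset(('4', '2'))

print(pair_a == pair_b)        # True: order does not matter
print(len({pair_a, pair_b}))   # 1: the outer set keeps a single copy

same_label = frozenset(('2', '2'))
print(len(same_label))         # 1: the pairing 2-2 collapses to just {'2'}
```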

Then all we have to do is test whether there is an intersection between the two sets associated with the picked combination of keys, and include that combination if there is:

from itertools import product

dict1 = {k: set(l) for k, l in list1}
dict2 = {k: set(l) for k, l in list2}
keys = dict1.keys() | dict2.keys()  # all unique keys in both

found = {
    frozenset((k1, k2))
    for k1, k2 in product(keys, repeat=2)
    if dict1.get(k1, set()) & dict2.get(k2, set())
}

Demo:

>>> from itertools import product
>>> dict1 = {k: set(l) for k, l in list1}
>>> dict2 = {k: set(l) for k, l in list2}
>>> keys = dict1.keys() | dict2.keys()  # all unique keys in both
>>> {
...     frozenset((k1, k2))
...     for k1, k2 in product(keys, repeat=2)
...     if dict1.get(k1, set()) & dict2.get(k2, set())
... }
{frozenset({'3', '4'}), frozenset({'2'}), frozenset({'3', '5'}), frozenset({'2', '5'}), frozenset({'2', '3'}), frozenset({'2', '4'}), frozenset({'1', '3'})}
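If the direction of each match matters (list1 label first, list2 label second), as in the first listing in the question, a variant that keeps ordered tuples might look like the sketch below. The name `ordered` is hypothetical, and the plain string sort assumes single-character labels like the ones in the question.

```python
list1 = [
    ['1', ['a']],
    ['2', ['b', 'c', 'd']],
    ['3', ['e']],
    ['4', ['f', 'g']],
    ['5', ['h']],
]
list2 = [
    ['1', ['e']],
    ['2', ['f', 'c']],
    ['3', ['h', 'g', 'a', 'd']],
    ['4', ['b']],
    ['5', ['b']],
]

dict1 = {k: set(v) for k, v in list1}
dict2 = {k: set(v) for k, v in list2}

# Keep (list1 label, list2 label) tuples so the direction is preserved.
ordered = sorted(
    (k1, k2)
    for k1, values1 in dict1.items()
    for k2, values2 in dict2.items()
    if values1 & values2
)
print(', '.join(f'{a}-{b}' for a, b in ordered))
# 1-3, 2-2, 2-3, 2-4, 2-5, 3-1, 4-2, 4-3, 5-3

# Collapsing each tuple into a frozenset removes mirrored pairs such as 2-4 / 4-2.
print(len({frozenset(pair) for pair in ordered}))  # 7
```

With the question's data this reports nine ordered pairs, including 5-3 ('h' appears in row 5 of list1 and row 3 of list2), which corresponds to frozenset({'3', '5'}) in the demo output.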

If you must have doubled-up references, you can post-process the result:

for combo in found:
    try:
        a, b = combo
    except ValueError:  # doesn't contain 2 values, assume 1
        a, = b, = combo
    print(f'{a}-{b}')

The order will vary depending on the current random hash seed, so you may want to sort the results. I get this output:

3-4
2-2
3-5
2-5
2-3
2-4
1-3
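One possible way to make that output deterministic is sketched below. The helper `as_pair` is hypothetical, `found` is copied from the demo output so the snippet runs on its own, and the `int` sort keys assume every label is a numeric string.

```python
# Copied from the demo output above.
found = {
    frozenset({'3', '4'}), frozenset({'2'}), frozenset({'3', '5'}),
    frozenset({'2', '5'}), frozenset({'2', '3'}), frozenset({'2', '4'}),
    frozenset({'1', '3'}),
}

def as_pair(combo):
    """Return (low, high) labels; a single-label combo yields the label twice."""
    labels = sorted(combo, key=int)   # assumes numeric-string labels
    return labels[0], labels[-1]

pairs = sorted(
    (as_pair(combo) for combo in found),
    key=lambda pair: (int(pair[0]), int(pair[1])),
)
for a, b in pairs:
    print(f'{a}-{b}')
# 1-3
# 2-2
# 2-3
# 2-4
# 2-5
# 3-4
# 3-5
```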
