
Intersect two lists and count how many times each element overlaps

I am intersecting two lists with the following code:

def interlist(lst1, lst2):
    # Keep each element of lst2 that also appears in lst1 (duplicates in lst2 are preserved)
    lst3 = list(filter(lambda x: x in lst1, lst2))
    return lst3

The thing is, I want to count every intersection between lst1 and lst2. The result should be a dictionary mapping elements to the number of times they overlap in both lists.

Here's a simple solution using collections.Counter and set intersection. The idea is to first count the occurrences of each element in each list separately; then, for each element, the number of overlaps is the min of the two counts. This matches each occurrence in one list with a single occurrence in the other, so the min gives the number of matches that can be made. We only need to count elements which occur in both lists, so we take the intersection of the two key-sets.

If you want to count all matching pairs instead (i.e. each occurrence in lst1 gets matched with every occurrence in lst2), replace min(c1[k], c2[k]) with c1[k] * c2[k]. This counts the number of ways of choosing a pair with one occurrence from lst1 and one from lst2.

from collections import Counter

def count_intersections(lst1, lst2):
    c1 = Counter(lst1)
    c2 = Counter(lst2)
    return { k: min(c1[k], c2[k]) for k in c1.keys() & c2.keys() }

Example:

>>> lst1 = ['a', 'a', 'a', 'b', 'b', 'c', 'e']
>>> lst2 = ['a', 'b', 'b', 'b', 'd', 'e']
>>> count_intersections(lst1, lst2)
{'b': 2, 'a': 1, 'e': 1}
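
For comparison, here is a minimal sketch of the all-pairs variant described earlier, where min(c1[k], c2[k]) is replaced with c1[k] * c2[k]. The function name count_matching_pairs is made up for this illustration and is not part of the original answer.

```python
from collections import Counter

def count_matching_pairs(lst1, lst2):
    # Hypothetical helper name, chosen for this sketch only.
    c1 = Counter(lst1)
    c2 = Counter(lst2)
    # Each occurrence in lst1 pairs with every occurrence in lst2,
    # so the number of pairs for an element is the product of its counts.
    return {k: c1[k] * c2[k] for k in c1.keys() & c2.keys()}

lst1 = ['a', 'a', 'a', 'b', 'b', 'c', 'e']
lst2 = ['a', 'b', 'b', 'b', 'd', 'e']
pairs = count_matching_pairs(lst1, lst2)
print(pairs)
```

With the same inputs as the example, 'a' gets 3 * 1 = 3, 'b' gets 2 * 3 = 6, and 'e' gets 1 * 1 = 1. The key order of the printed dictionary may vary, because the keys come from a set intersection.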

This solution runs in O(m + n) time and uses at most O(m + n) auxiliary space, where m and n are the sizes of the two lists.
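
As a side note, Counter also supports the & operator, which computes a multiset intersection by taking the minimum of corresponding counts. The sketch below is an alternative spelling of the same min-based idea, not the code from the answer; it should give the same mapping for list inputs like these.

```python
from collections import Counter

lst1 = ['a', 'a', 'a', 'b', 'b', 'c', 'e']
lst2 = ['a', 'b', 'b', 'b', 'd', 'e']

# Counter & Counter keeps only elements present in both,
# each with the minimum of its two counts.
overlap = Counter(lst1) & Counter(lst2)

# Convert to a plain dict if a Counter is not wanted.
result = dict(overlap)
print(result)
```

The explicit dictionary comprehension is still the easier form to adapt, for example when switching from min to the product of the counts.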

Per your clarification:

If lst1 = ["a", "b", "c"] and lst2 = ["a", "a", "a", "b", "b"], then output = {"a": 3, "b": 2}

you can simply do:

output = {}
for x in set(lst1):
    cnt = lst2.count(x)
    if cnt > 0:
        output[x] = cnt
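
Note that lst2.count(x) scans all of lst2 once per distinct element of lst1, so the loop above can be slow for large inputs. A minimal sketch of the same result using collections.Counter, which counts lst2 only once, might look like this (the function name count_in_second is an assumption made for the example):

```python
from collections import Counter

def count_in_second(lst1, lst2):
    # Hypothetical helper name, chosen for this sketch only.
    # Count every element of lst2 in a single pass.
    c2 = Counter(lst2)
    # Keep only the distinct elements of lst1 that actually occur in lst2.
    return {x: c2[x] for x in set(lst1) if x in c2}

lst1 = ["a", "b", "c"]
lst2 = ["a", "a", "a", "b", "b"]
output = count_in_second(lst1, lst2)
print(output)
```

For the lists from the clarification this yields a count of 3 for "a" and 2 for "b", and "c" is left out because it never appears in lst2.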
