Fast way to find the number of elements in a list intersection (Python)

Question

Is there a faster way to calculate this value in Python:

len([x for x in my_list if x in other_list])

I tried using sets, since the elements of the lists are unique, but I noticed no difference.

len(set(my_list).intersection(set(other_list)))

I'm working with big lists, so even the slightest improvement counts. Thanks

Answer 1

The simple way is to find the shorter of the two lists and then use it with set.intersection, e.g.:

a = range(100)
b = range(50)

fst, snd = (a, b) if len(a) < len(b) else (b, a)
len(set(fst).intersection(snd))

Answer 2

I think a generator expression like this would be fast:

sum(1 for i in my_list if i in other_list)

Otherwise, set intersection is about as fast as it is going to get:

len(set(my_list).intersection(other_list))
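
Note that the generator expression still scans other_list once per element as long as other_list is a list. A minimal sketch, assuming the elements are hashable, that builds a set first so each membership test takes constant time on average (count_common is a hypothetical helper name, not something from the question):

```python
def count_common(my_list, other_list):
    """Count the elements of my_list that also occur in other_list."""
    # Build the lookup set once; `x in other_set` is O(1) on average,
    # whereas `x in other_list` is a linear scan of the list.
    other_set = set(other_list)
    return sum(1 for x in my_list if x in other_set)


print(count_common([1, 2, 3, 4], [3, 4, 5]))  # 2
```

With unique elements, as in the question, this gives the same count as the set intersection.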

Answer 3

According to https://wiki.python.org/moin/TimeComplexity, the intersection of two sets s and t has time complexity:

Average - O(min(len(s), len(t)))

Worst - O(len(s) * len(t))

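len([x for x in my_list if x in other_list]) has complexity O(len(my_list) * len(other_list)), i.e. O(n^2) for lists of similar size, which is equivalent to the worst case for set.intersection().

If you use set.intersection(), you only need to convert one of the lists to a set first, so len(set(my_list).intersection(other_list)) should on average be faster than the list comprehension.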
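
To check this on your own data, here is a rough timing sketch using the standard timeit module. The list sizes and the names nested_count and set_count are arbitrary choices for illustration, and the timings will vary by machine:

```python
import timeit

# Arbitrary test data: two lists of unique ints sharing 1000 elements.
my_list = list(range(2000))
other_list = list(range(1000, 3000))


def nested_count():
    # Linear scan of other_list for every element of my_list.
    return len([x for x in my_list if x in other_list])


def set_count():
    # Only my_list is converted to a set; other_list is passed as-is.
    return len(set(my_list).intersection(other_list))


# Both approaches must agree before their timings are compared.
assert nested_count() == set_count() == 1000

print("list comprehension:", timeit.timeit(nested_count, number=5))
print("set intersection:  ", timeit.timeit(set_count, number=5))
```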

Answer 4

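You could try using the filter function. Since you mentioned you're working with huge lists, a lazy filter would be a good option. In Python 2 that is ifilter from the itertools module; in Python 3 the built-in filter is already lazy:

my_set = set(range(100))
other_set = set(range(50))
# Python 3: filter() returns a lazy iterator (itertools.ifilter in Python 2)
for item in filter(lambda x: x in other_set, my_set):
    print(item)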
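
The loop above prints the common items, while the question asks for their number. A small sketch, assuming Python 3, that consumes the lazy filter to count the matches instead (common_count is just an illustrative variable name):

```python
my_list = list(range(100))
other_set = set(range(50))

# other_set.__contains__ is the bound method behind `x in other_set`,
# so no lambda is needed; filter() yields matches one at a time.
common_count = sum(1 for _ in filter(other_set.__contains__, my_list))
print(common_count)  # 50
```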

Answer 5

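The idea is to sort the two lists first and then traverse them as if we were merging them, in order to find the elements of the first list that also belong to the second list. This way we have an O(n log n) algorithm.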

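def mycount(l, m):
    # Note: this sorts both input lists in place.
    l.sort()
    m.sort()
    i, j, counter = 0, 0, 0
    # Walk both sorted lists in step, as in a merge.
    while i < len(l) and j < len(m):
        if l[i] == m[j]:
            # Common element; with unique elements, j advances on the next pass.
            counter += 1
            i += 1
        elif l[i] < m[j]:
            i += 1
        else:
            j += 1
    return counter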

In local tests, it is about 100 times faster than len([x for x in a if x in b]) when working with lists of 10000 elements.
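
One caveat is that mycount sorts its arguments in place. If the original order matters, a variant that works on sorted copies is a possible sketch (count_common_sorted is a hypothetical name; unique, mutually comparable elements are assumed, as in the question):

```python
def count_common_sorted(l, m):
    """Count common elements with a merge-style walk over sorted copies."""
    a, b = sorted(l), sorted(m)  # copies, so the callers' lists are untouched
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


first = [5, 1, 9, 3]
second = [3, 7, 5, 2]
print(count_common_sorted(first, second))  # 2
print(first)  # [5, 1, 9, 3], order preserved
```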

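EDIT:

Considering that the list elements are unique, the common elements will have a frequency of two in the concatenation of the two lists. They will also be adjacent once we sort this combined list. So the following is also valid: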

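def mycount(l, m):
    s = sorted(l + m)
    # Equal neighbours in the sorted concatenation are the common elements
    # (range here; xrange in Python 2).
    return sum(s[i] == s[i + 1] for i in range(len(s) - 1))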

Similarly, we can use a Counter:

from collections import Counter
def mycount(l, m):
    c = Counter(l)
    c.update(m)
    # An element seen exactly twice appears in both lists
    # (values() here; itervalues() in Python 2).
    return sum(v == 2 for v in c.values())
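
Both of these variants depend on the uniqueness assumption. A short sketch, written for Python 3 with the hypothetical name count_common_counter, shows the Counter approach on unique input and how a duplicate within one list would be miscounted:

```python
from collections import Counter


def count_common_counter(l, m):
    """Count elements whose combined frequency across both lists is 2."""
    c = Counter(l)
    c.update(m)
    return sum(v == 2 for v in c.values())


# Unique elements: 2 and 3 are shared.
print(count_common_counter([1, 2, 3], [2, 3, 4]))  # 2

# A duplicate inside one list also reaches frequency 2,
# so it is counted even though the lists share nothing.
print(count_common_counter([1, 1, 2], [3]))  # 1
```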

5 solutions

Solution 1
5 Accepted 2015-03-23 00:41:01

Solution 2
1 2015-03-22 23:47:24

Solution 3
1 2015-03-23 00:28:47

Solution 4
1 2015-03-23 00:51:11

Solution 5
0 2015-03-23 00:18:10
