Python: comparing a list of lists

Question

I have a list that contains 10**7 lists in the format:

big_list = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7], [2, 3, 4, 26, 33, 40], [10, 23, 33, 45, 46, 47]]

Every list contains 6 unique ints.

I need to compare every list to another list:

lst = [1, 3, 4, 10, 23, 46]

and return those whose intersection with it has fewer than 3 elements. So new_list would be:

new_list = [[2, 3, 4, 5, 6, 7], [2, 3, 4, 26, 33, 40]]

At the moment I'm using set intersection, but it takes about 30 seconds to run.

Answer 1

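import numpy as np
biglist = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7], [2, 3, 4, 26, 33, 40], [10, 23, 33, 45, 46, 47]]
oldlist = [1, 3, 4, 10, 23, 46]

b = np.array(biglist)  # 2-D array, one row per sublist
# For each x in oldlist, mark the rows that contain x, then count the marks per row:
# that count is the size of the row's intersection with oldlist.
b[np.array([(b == x).any(axis=1) for x in oldlist]).sum(axis=0) < 3]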

returns

array([[ 2,  3,  4,  5,  6,  7],
       [ 2,  3,  4, 26, 33, 40]])

Creating the NumPy array takes some time, but the last line runs about twice as fast as the list comprehension with set intersections (for 1e6 lists).

EDIT: The following line is even faster than my code above and needs less memory:

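from functools import reduce  # reduce is not a builtin in Python 3
b[reduce(np.add, ((b == x).any(axis=1).astype(int) for x in oldlist)) < 3]  # int, since np.int was removed in NumPy 1.24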
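The EDIT's version accumulates the per-value counts as it goes instead of first stacking one boolean mask per element of `oldlist` into a single array. Below is a minimal sketch that makes that accumulation explicit with one small counter array updated in place. The function name `filter_low_overlap` and the `int8` counter are illustrative choices, not part of the original answer; the sketch assumes the reference values are unique (as in the question) and that there are fewer than 128 of them.

```python
import numpy as np


def filter_low_overlap(b, ref, limit=3):
    """Return the rows of the 2-D array b that share fewer than `limit` values with ref.

    Hypothetical helper. Assumes the values in ref are unique and len(ref) < 128,
    so an int8 counter per row cannot overflow.
    """
    # One small counter per row, updated in place rather than rebuilt each step.
    counts = np.zeros(len(b), dtype=np.int8)
    for x in ref:
        # True for every row that contains x; the bool mask is added as 0/1.
        counts += (b == x).any(axis=1)
    return b[counts < limit]


b = np.array([[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7],
              [2, 3, 4, 26, 33, 40], [10, 23, 33, 45, 46, 47]])
print(filter_low_overlap(b, [1, 3, 4, 10, 23, 46]))
```

The counter dtype is the main knob here: a narrow integer type keeps the running total small, at the cost of the stated limit on the reference list's length.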

Answer 2

>>> big_list = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7], [2, 3, 4, 26, 33, 40], [10, 23, 33, 45, 46, 47]]
>>> normal = set([1, 3, 4, 10, 23, 46])
>>> [x for x in big_list if len(set(x).intersection(normal)) < 3]
[[2, 3, 4, 5, 6, 7], [2, 3, 4, 26, 33, 40]]
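A small variation on this answer, not something the answerer proposed: `set.intersection` accepts any iterable, so each sublist can be passed to it directly instead of first being converted with `set(x)`. This avoids one temporary set per sublist; whether that is measurably faster on 10**7 lists is an assumption worth checking with `timeit` on real data.

```python
big_list = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7], [2, 3, 4, 26, 33, 40], [10, 23, 33, 45, 46, 47]]
normal = {1, 3, 4, 10, 23, 46}

# normal.intersection(x) iterates over the list x directly; no set(x) is built first.
new_list = [x for x in big_list if len(normal.intersection(x)) < 3]
print(new_list)  # [[2, 3, 4, 5, 6, 7], [2, 3, 4, 26, 33, 40]]
```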

Answer 3

I think a "fast" solution should depend on the specification of your problem. For example, if your reference list is as short as [1, 3, 4, 10, 23, 46], then by sorting every list we can immediately see that any list starting with a number bigger than 10, e.g. [11, x, x, ...], can share at most two elements (23 and 46) with the reference, so it has fewer than 3 common elements and can be kept without a full comparison. That alone could save a lot of comparisons.
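A minimal sketch of that pruning idea, slightly generalised: any element a sublist shares with the reference must lie between the sublist's smallest and largest values, so the number of reference values in that range is an upper bound on the overlap. If the bound is already below 3, the sublist is kept without computing the intersection. The function name `filter_with_pruning` is hypothetical, and `min`/`max` stand in for the first and last elements of a sorted sublist; if the sublists are already sorted, as in the question's example, `sub[0]` and `sub[-1]` would do.

```python
from bisect import bisect_left, bisect_right


def filter_with_pruning(big_list, ref, limit=3):
    """Keep sublists sharing fewer than `limit` values with ref (hypothetical helper)."""
    ref_sorted = sorted(ref)
    ref_set = set(ref)
    result = []
    for sub in big_list:
        lo, hi = min(sub), max(sub)
        # Number of reference values inside [lo, hi]: an upper bound on the overlap.
        bound = bisect_right(ref_sorted, hi) - bisect_left(ref_sorted, lo)
        # Cheap bound first; compute the real intersection only when it is inconclusive.
        if bound < limit or len(ref_set.intersection(sub)) < limit:
            result.append(sub)
    return result


big_list = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7], [2, 3, 4, 26, 33, 40],
            [10, 23, 33, 45, 46, 47], [11, 12, 13, 14, 15, 16]]
print(filter_with_pruning(big_list, [1, 3, 4, 10, 23, 46]))
```

Here [11, 12, 13, 14, 15, 16] is accepted by the bound alone, matching the [11, x, x, ...] case described above. How much the pruning helps depends on how many sublists the cheap bound settles, which is exactly why the fastest approach depends on the data.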

3 answers

Answer 1: score 4, 2012-06-19 08:21:45
Answer 2: score 1, accepted, 2012-06-19 08:22:33
Answer 3: score 0, 2012-06-19 13:14:54