Find the intersection/difference between Python lists

Question

I have two Python lists:

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]

b = ['the', 'when', 'send', 'we', 'us']

I need to filter out all the tuples in a whose first element appears in b. In this case, I should get:

c = [('why', 4), ('throw', 9), ('you', 1)]

What is the most efficient way to do this?

Answer 1

A list comprehension will work.

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
filtered = [i for i in a if i[0] not in b]

>>> print(filtered)
[('why', 4), ('throw', 9), ('you', 1)]
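The question title also mentions intersection. As a minimal sketch (the names b_set, difference and intersection are only illustrative), flipping the condition in the same comprehension gives the tuples whose first element does appear in b, and converting b to a set once makes each lookup a hash lookup instead of a list scan:

```python
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

# Build the lookup set once; membership tests on a set are O(1) on average.
b_set = set(b)

# "Difference": tuples whose first element does NOT appear in b.
difference = [pair for pair in a if pair[0] not in b_set]
# "Intersection": tuples whose first element DOES appear in b.
intersection = [pair for pair in a if pair[0] in b_set]

print(difference)    # [('why', 4), ('throw', 9), ('you', 1)]
print(intersection)  # [('when', 3), ('send', 15)]
```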

Answer 2

A list comprehension should work:

c = [item for item in a if item[0] not in b]

Or with a dictionary comprehension (note that the result is a dict, not a list):

d = dict(a)
c = {key: value for key, value in d.items() if key not in b}
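A minimal sketch of the dictionary approach on Python 3, assuming the first elements in a are unique. It shows that the result is a dict, how to get a list of tuples back, and why duplicate keys are a problem:

```python
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

d = dict(a)  # first elements become keys, second elements become values
c = {key: value for key, value in d.items() if key not in b}
print(c)  # {'why': 4, 'throw': 9, 'you': 1}

# If a list of tuples is needed again, items() gives the pairs back
# (dicts preserve insertion order on Python 3.7+).
c_list = list(c.items())
print(c_list)  # [('why', 4), ('throw', 9), ('you', 1)]

# Caveat: dict() keeps only the LAST value for a repeated key.
print(dict([('when', 3), ('when', 7)]))  # {'when': 7}
```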

Answer 3

in is fine, but you should use a set, at least for b, since a membership test on a set is a hash lookup rather than a scan of the whole list. If you have numpy, you could also try np.in1d (np.isin in current NumPy), but whether it is actually faster is something you should measure.

import numpy as np

# same comprehension as in Answer 1, but with b converted to a set
b = set(b)
filtered = [i for i in a if i[0] not in b]

# with numpy (note: if you create the array like this, you must give the
# maximum string length up front, here 10); otherwise just use an object array,
# which is slower (likely not worth it) but safe.
a = np.array(a, dtype=[('key', 'U10'), ('val', int)])
b = np.asarray(list(b))  # list() needed: np.asarray(set) gives a 0-d object array

# np.isin is the current name for np.in1d; ~ negates the boolean mask
mask = ~np.isin(a['key'], b)
filtered = a[mask]
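To check the advice about sets on your own data, here is a rough timeit sketch. The repeated data and the padding words added to b are hypothetical, chosen only so that the list lookup has something to scan; no timings are claimed here, since they depend on the machine and Python version:

```python
import timeit

# Assumed workload: the question's data repeated 200 times (1000 tuples).
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)] * 200
# Hypothetical padding so that b is long enough for a linear scan to matter.
b_list = ['the', 'when', 'send', 'we', 'us'] + [f'word{n}' for n in range(1000)]
b_set = set(b_list)


def with_list():
    # Each "not in" scans b_list element by element.
    return [i for i in a if i[0] not in b_list]


def with_set():
    # Each "not in" is a hash lookup, O(1) on average.
    return [i for i in a if i[0] not in b_set]


# Both versions must produce the same result before their speed is compared.
assert with_list() == with_set()

for name, fn in [('list', with_list), ('set', with_set)]:
    per_call = timeit.timeit(fn, number=5) / 5
    print(f'{name}: {per_call * 1e3:.3f} ms per call')
```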

Sets also have methods such as difference, which are probably not too useful here but often are in general.
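As a minimal sketch of how the set methods could still be used here: apply difference and intersection to the keys only, then filter a by the resulting set to recover the original tuples in their original order (keys_a, keep and common are illustrative names):

```python
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

keys_a = {key for key, _ in a}

# Keys that appear in a but not in b; difference() accepts any iterable.
keep = keys_a.difference(b)
# Keys that appear in both lists (the "intersection" from the title).
common = keys_a.intersection(b)

# Sets are unordered, so sort them for stable display.
print(sorted(keep))    # ['throw', 'why', 'you']
print(sorted(common))  # ['send', 'when']

# To get the original tuples back in their original order, filter a by the set.
c = [pair for pair in a if pair[0] in keep]
print(c)  # [('why', 4), ('throw', 9), ('you', 1)]
```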

Answer 4

As this is tagged with numpy, here is a numpy solution using numpy.in1d, benchmarked against the list comprehension:

In [1]: a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]

In [2]: b = ['the', 'when', 'send', 'we', 'us']

In [3]: a_ar = np.array(a, dtype=[('string','|S5'), ('number',float)])

In [4]: b_ar = np.array(b)

In [5]: %timeit filtered = [i for i in a if not i[0] in b]
1000000 loops, best of 3: 778 ns per loop

In [6]: %timeit filtered = a_ar[~np.in1d(a_ar['string'], b_ar)]
10000 loops, best of 3: 31.4 us per loop

So for 5 records the list comprehension is faster.

However, for large data sets the numpy solution is about twice as fast as the list comprehension:

In [7]: a = a * 1000

In [8]: a_ar = np.array(a, dtype=[('string','|S5'), ('number',float)])

In [9]: %timeit filtered = [i for i in a if not i[0] in b]
1000 loops, best of 3: 647 us per loop

In [10]: %timeit filtered = a_ar[~np.in1d(a_ar['string'], b_ar)]
1000 loops, best of 3: 302 us per loop
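The session above dates from Python 2 and older NumPy; recent NumPy rejects unary minus on boolean arrays, so ~ is used for negation. For re-running the comparison on Python 3, a sketch under these assumptions: a unicode 'U5' field so the keys have the same string type as b_ar, np.isin (which current NumPy recommends over np.in1d), the question's data repeated 1000 times as in In [7], and a set for b in the list version. No timings are claimed here:

```python
import timeit

import numpy as np

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)] * 1000
b = ['the', 'when', 'send', 'we', 'us']

# 'U5' holds Python 3 str keys of up to 5 characters ('throw' is the longest).
a_ar = np.array(a, dtype=[('string', 'U5'), ('number', float)])
b_ar = np.array(b)


def with_list():
    b_set = set(b)
    return [i for i in a if i[0] not in b_set]


def with_numpy():
    # np.isin returns a boolean mask; ~ negates it elementwise.
    return a_ar[~np.isin(a_ar['string'], b_ar)]


# Sanity check: both approaches keep the same number of records.
assert len(with_list()) == len(with_numpy()) == 3000

for name, fn in [('list + set', with_list), ('numpy isin', with_numpy)]:
    per_call = timeit.timeit(fn, number=100) / 100
    print(f'{name}: {per_call * 1e6:.1f} us per call')
```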

Answer 5

Try this:

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]

b = ['the', 'when', 'send', 'we', 'us']

c=[]

for x in a:
    if x[0] not in b:
        c.append(x)
print(c)

Demo: http://ideone.com/zW7mzY

Answer 6

Easy way:

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
c = []  # a list to store the required tuples
# keep each tuple whose first element does not appear in b
for i in a:
    if i[0] not in b:
        c.append(i)
print(c)

Answer 7

Use filter:

# Python 3: no tuple unpacking in lambda parameters, and filter() is lazy
c = list(filter(lambda item: item[0] not in b, a))
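A small variation on the filter approach, offered as a sketch rather than as the answerer's code: operator.itemgetter(0) extracts the first element, b is held as a set for fast lookups, and itertools.filterfalse keeps the items for which the predicate is False, which avoids writing the negation:

```python
from itertools import filterfalse
from operator import itemgetter

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = {'the', 'when', 'send', 'we', 'us'}  # a set, for O(1) average lookups

first = itemgetter(0)  # first(('when', 3)) returns 'when'

# filter() is lazy on Python 3, so wrap it in list() to get a list back.
c = list(filter(lambda pair: first(pair) not in b, a))

# filterfalse() keeps the items for which the predicate is False.
c2 = list(filterfalse(lambda pair: first(pair) in b, a))

print(c)        # [('why', 4), ('throw', 9), ('you', 1)]
print(c2 == c)  # True
```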


Answer summary (7 answers):

Answer 1: score 11, accepted, 2013-02-23 09:30:49
Answer 2: score 5, 2013-02-23 09:28:40
Answer 3: score 2, 2013-02-23 11:11:48
Answer 4: score 2, 2013-02-24 10:37:49
Answer 5: score 0, 2013-02-23 09:30:42
Answer 6: score 0
Answer 7: score -1, 2013-02-23 09:33:23
