Python: Intersection of two lists of lists

Question

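I have a list of lists A and a list of lists B, where A and B have many sublists in common.

What is the best way to add the sublists that are unique to B into A?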

A = [['foo', 123], ['bar', np.array(range(10))], ['baz', 345]]
B = [['foo', 123], ['bar', np.array(range(10))], ['meow', 456]]

=> A = [['foo', 123], ['bar', np.array(range(10))], ['baz', 345], ['meow', 456]]

I tried:

A += [b for b in B if b not in A]

But this gives me a ValueError telling me to use any() or all(). Do I really have to compare each sublist in B against each sublist in A, element by element?

ERROR: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Answer 1

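In general, there are several ways to remove duplicates from one list or several lists, either preserving order or not.

Here is a way to merge two lists into their unique elements without maintaining order: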

>>> A=[1,3,5,'a','c',7]
>>> B=[1,2,3,'c','b','a',6]
>>> set(A+B)
set(['a', 1, 'c', 3, 5, 6, 7, 2, 'b'])

Here is a way that maintains order:

>>> seen=set()
>>> [e for e in A+B if e not in seen and (seen.add(e) or True)]
[1, 3, 5, 'a', 'c', 7, 2, 'b', 6]
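
On Python 3.7 and later, where dicts preserve insertion order, the same order-preserving result can be sketched with dict.fromkeys. This is an alternative to the list comprehension above rather than part of the original answer, and it shares the same limitation: every element must be hashable.

```python
A = [1, 3, 5, 'a', 'c', 7]
B = [1, 2, 3, 'c', 'b', 'a', 6]

# dict keys are unique and keep first-insertion order (Python 3.7+)
merged = list(dict.fromkeys(A + B))
print(merged)  # [1, 3, 5, 'a', 'c', 7, 2, 'b', 6]
```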

The problem is that all elements must be hashable for these methods to work:

>>> set([np.array(range(10)), 22])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'

One way around this is to use the repr of each element:

>>> set([repr(e) for e in [np.array(range(10)), 22]])
set(['22', 'array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])'])

Or use a frozenset:

>>> set(frozenset(e) for e in [np.array(range(10)), np.array(range(2))])
set([frozenset([0, 1]), frozenset([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])])

In your case, the frozenset approach does not work directly on a list of lists:

>>> set(frozenset(e) for e in [[np.array(range(10)), np.array(range(2))], [np.array(range(5))]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
TypeError: unhashable type: 'numpy.ndarray'

So you would need to use a flattened list.

If the repr of a sublist is unambiguous proof of its uniqueness, you can do this:

from collections import OrderedDict
import numpy as np

A = [['foo', 123], ['bar', np.array(range(10))], ['baz', 345]]
B = [['foo', 123], ['bar', np.array(range(10))], ['meow', 456]]

# Map each sublist's repr to a flag: 0 = not yet emitted, 1 = already emitted
seen = OrderedDict((repr(e), 0) for e in B)

newA = []
for e in A + B:
    key = repr(e)
    if key in seen:
        # Known repr: keep only its first occurrence
        if seen[key] == 0:
            newA.append(e)
            seen[key] = 1
    else:
        # New repr: record it and keep the sublist
        seen[key] = 1
        newA.append(e)

print(newA)
# [['foo', 123], ['bar', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])], ['baz', 345], ['meow', 456]]

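Since repr returns a string that eval can often use to recreate the list, this is a fairly reliable test, but I cannot guarantee it absolutely. It depends on what is in your lists.

For example, the repr of a lambda cannot recreate the lambda: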

>>> repr(lambda x:x)
'<function <lambda> at 0x10710ec08>'

But the string '<function <lambda> at 0x10710ec08>' is still unique for as long as the lambda is alive, since the 0x10710ec08 part is the lambda's address in memory (in CPython, anyway).
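
As an illustration of how a repr-based signature can fail, here is a minimal sketch. It assumes NumPy's default print options, which abbreviate the repr of arrays with more than 1000 elements; the arrays a and b are made up for the example. Two arrays that differ only in the abbreviated middle end up with the same repr string.

```python
import numpy as np

a = np.arange(2000)
b = a.copy()
b[1000] = 5  # change one element in the middle of the array

# With default print options the middle of a long array is replaced
# by "...", so the changed element never appears in the string.
same_repr = repr(a) == repr(b)
same_data = np.array_equal(a, b)
print(same_repr, same_data)
```

For short arrays like the ones in the question this does not come up, but it is worth checking before relying on repr for larger data.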

You could also do what I mentioned above: use a frozenset of the flattened sublist as a signature of what you have or have not seen:

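from collections.abc import Iterable

def flatten(LoL):
    # Recursively yield the leaf elements of a nested iterable;
    # strings are treated as leaves, not iterated character by character
    for el in LoL:
        if isinstance(el, Iterable) and not isinstance(el, str):
            for sub in flatten(el):
                yield sub
        else:
            yield el

newA = []
seen = set()
for e in A + B:
    # The frozenset of flattened elements serves as the sublist's signature
    fset = frozenset(flatten(e))
    if fset not in seen:
        newA.append(e)
        seen.add(fset)

print(newA)
# [['foo', 123], ['bar', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])], ['baz', 345], ['meow', 456]]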
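
One caveat with the frozenset signature is that a set ignores both order and repetition. The sketch below uses two made-up sublists, x and y, to show that sublists with the same values in a different order, or with duplicates, get the same signature and would be treated as one.

```python
from collections.abc import Iterable
import numpy as np

def flatten(lol):
    """Yield the leaf elements of an arbitrarily nested iterable."""
    for el in lol:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

x = ['bar', np.array([1, 2, 3])]
y = ['bar', np.array([3, 2, 1, 1])]  # different order, one duplicate

sig_x = frozenset(flatten(x))
sig_y = frozenset(flatten(y))
print(sig_x == sig_y)  # True: x and y would count as duplicates
```

If order matters for your data, building the signature with tuple(flatten(e)) instead of frozenset(flatten(e)) may be a better fit.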

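So if you have some odd objects in A and B that are both unhashable and have strange, non-unique repr strings, you are out of luck. For your example, either of these methods should work.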
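
If neither signature can be trusted, a direct element-by-element comparison is still possible, at the cost of a scan over the merged list for every sublist of B. The following is a minimal sketch, not part of the original answer; same_item and same_sublist are hypothetical helper names, and np.array_equal is used wherever either side is an array.

```python
import numpy as np

def same_item(x, y):
    """Compare two items, using np.array_equal when either is an ndarray."""
    if isinstance(x, np.ndarray) or isinstance(y, np.ndarray):
        return np.array_equal(x, y)
    return x == y

def same_sublist(a, b):
    """True if both sublists have equal length and pairwise-equal items."""
    return len(a) == len(b) and all(same_item(x, y) for x, y in zip(a, b))

A = [['foo', 123], ['bar', np.array(range(10))], ['baz', 345]]
B = [['foo', 123], ['bar', np.array(range(10))], ['meow', 456]]

merged = list(A)
for b in B:
    # Explicit scan instead of `b not in merged`, which raises the ValueError
    if not any(same_sublist(b, m) for m in merged):
        merged.append(b)

print([m[0] for m in merged])  # ['foo', 'bar', 'baz', 'meow']
```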

Answer 2

You can do

import numpy as np

A = [['foo', 123], ['bar', np.array(range(10))], ['baz', 345]]
B = [['foo', 123], ['bar', np.array(range(10))], ['meow', 456]]

# set.update() returns None, so the calls cannot be chained
res = set()
res.update(tuple(x) for x in A)
res.update(tuple(x) for x in B)

This works except for the np.array items, which are unhashable... I'm not sure how to handle those.
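
One possible way to handle the arrays, sketched here under the assumption that they have a plain numeric dtype, is to replace each array with a hashable stand-in built from its shape, dtype and raw bytes. The helper name hashable is hypothetical and not from the original answer.

```python
import numpy as np

def hashable(item):
    """Return a hashable stand-in for item (hypothetical helper)."""
    if isinstance(item, np.ndarray):
        # Shape, dtype and raw bytes together identify a numeric array
        return (item.shape, item.dtype.str, item.tobytes())
    return item

A = [['foo', 123], ['bar', np.array(range(10))], ['baz', 345]]
B = [['foo', 123], ['bar', np.array(range(10))], ['meow', 456]]

seen = set()
res = []
for sub in A + B:
    key = tuple(hashable(x) for x in sub)
    if key not in seen:
        seen.add(key)
        res.append(sub)  # keep the original sublist, not the key

print([sub[0] for sub in res])  # ['foo', 'bar', 'baz', 'meow']
```

Unlike a plain set of tuples, this keeps the original sublists and their order.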

2 answers

Answer 1
1 (accepted) 2014-02-20 22:37:36

Answer 2
0 2014-02-20 22:09:38
