Fastest way to remove list subsets from a list of lists in Python

Question

Suppose I have a list like the following (the real list is much longer):

fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]

In this case, every item in the lists ['banana', 'pear'], ['apple', 'pear'] and ['apple', 'pear', 'banana'] is also contained in the list ['apple', 'pear', 'banana', 'watermelon'] (the order of the items does not matter), so I would like to remove ['banana', 'pear'], ['apple', 'pear'] and ['apple', 'pear', 'banana'] because they are subsets of ['apple', 'pear', 'banana', 'watermelon'].
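To make the subset relationship described above concrete, here is a minimal sketch using Python's built-in set type. The variable names are only illustrative and are not part of the original question.

```python
# Order does not matter, so the lists are compared as sets.
small = ['banana', 'pear']
big = ['apple', 'pear', 'banana', 'watermelon']
other = ['pear', 'pineapple']

# True when every item of the first list also appears in the second.
is_small_contained = set(small).issubset(big)
# 'pineapple' is missing from big, so this list is not a subset.
is_other_contained = set(other).issubset(big)

print(is_small_contained, is_other_contained)
```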

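My current solution is shown below. I first use ifilter and imap to build a generator of the supersets that each list might have. Then, for the lists that do have a superset, I use compress and imap to drop them.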

from itertools import imap, ifilter, compress

supersets = imap(lambda a: list(ifilter(lambda x: len(a) < len(x) and set(a).issubset(x), fruits)), fruits)


new_list = list(compress(fruits, imap(lambda x: 0 if x else 1, supersets)))
new_list
#[['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]

Is there a more efficient way to do this?

Answer 1

filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits)
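As a hedged variation on the one-liner above, the sketch below builds each set once instead of rebuilding set(g) inside every comparison, and uses a list comprehension so the result is a list on both Python 2 and Python 3. The function name remove_subsets is my own and not from the original answer.

```python
fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]


def remove_subsets(lists):
    """Return the lists that are not a proper subset of any other list."""
    as_sets = [set(item) for item in lists]  # convert each list only once
    return [
        item
        for item, s in zip(lists, as_sets)
        # '<' on sets tests for a proper subset, so a list never
        # eliminates itself (or an exact duplicate of itself).
        if not any(s < other for other in as_sets)
    ]


result = remove_subsets(fruits)
print(result)
```

The number of comparisons is still quadratic in the number of lists; only the repeated set construction is avoided.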

Answer 2

I don't know whether it is faster, but it is easier to read (to me, anyway):

sets = {frozenset(e) for e in fruits}
us = set()
while sets:
    e = sets.pop()
    # Skip e if it is contained in a set still waiting or in one already kept.
    if any(e.issubset(s) for s in sets) or any(e.issubset(s) for s in us):
        continue
    us.add(e)
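The loop above leaves its result as a set of frozensets. As a sketch of how that could be wrapped and turned back into lists, the function below follows the same pop-and-check idea; the name maximal_sets and the sorting step are my own additions, used only to make the output deterministic.

```python
fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]


def maximal_sets(lists):
    """Keep only the sets that are not contained in any other set."""
    remaining = {frozenset(item) for item in lists}  # also drops duplicates
    kept = set()
    while remaining:
        candidate = remaining.pop()
        # Drop the candidate if a set still waiting, or one already kept,
        # contains all of its items.
        if any(candidate.issubset(s) for s in remaining):
            continue
        if any(candidate.issubset(s) for s in kept):
            continue
        kept.add(candidate)
    return kept


# Sets are unordered, so sort both levels to get a stable result.
result = sorted(sorted(s) for s in maximal_sets(fruits))
print(result)
```

Note that this approach does not preserve the original order of the lists or of the items inside them, and exact duplicates collapse into one entry.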

Update

It is fast. Using a for loop is faster still. Check the timings:

fruits = [['apple', 'pear'],
        ['apple', 'pear', 'banana'],
        ['banana', 'pear'],
        ['pear', 'pineapple'],
        ['apple', 'pear', 'banana', 'watermelon']]

from itertools import imap, ifilter, compress    

def f1():              
    sets={frozenset(e) for e in fruits}  
    us=[]
    while sets:
        e=sets.pop()
        if any(e.issubset(s) for s in sets) or any(e.issubset(s) for s in us):
            continue
        else:
            us.append(list(e))   
    return us           

def f2():
    supersets = imap(lambda a: list(ifilter(lambda x: len(a) < len(x) and set(a).issubset(x), fruits)), fruits)
    new_list = list(compress(fruits, imap(lambda x: 0 if x else 1, supersets)))
    return new_list

def f3():
    return filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits)

def f4():              
    sets={frozenset(e) for e in fruits}  
    us=[]
    for e in sets:
        if any(e < s for s in sets):
            continue
        else:
            us.append(list(e))   
    return us              

if __name__=='__main__':
    import timeit     
    for f in (f1, f2, f3, f4):
        print f.__name__, timeit.timeit("f()", setup="from __main__ import f, fruits"), f()

On my machine, with Python 2.7:

f1 8.09958791733 [['watermelon', 'pear', 'apple', 'banana'], ['pear', 'pineapple']]
f2 15.5085151196 [['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
f3 11.9473619461 [['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
f4 5.87942910194 [['watermelon', 'pear', 'apple', 'banana'], ['pear', 'pineapple']]
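The benchmark above is written for Python 2. For readers on Python 3, a comparable harness might look like the sketch below. It ports only the filter version and the for-loop version, the function names are mine, and the timings it prints will differ by machine and Python version, so no numbers are claimed here.

```python
import timeit

fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]


def with_filter():
    # filter is lazy on Python 3, so wrap it in list() to do the work.
    return list(filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits))


def with_for_loop():
    sets = {frozenset(e) for e in fruits}
    kept = []
    for e in sets:
        # '<' is a proper-subset test, so e is never compared away by itself.
        if any(e < s for s in sets):
            continue
        kept.append(list(e))
    return kept


if __name__ == '__main__':
    for func in (with_filter, with_for_loop):
        # timeit.timeit accepts a callable directly; number is kept small here.
        seconds = timeit.timeit(func, number=10000)
        print(func.__name__, seconds, func())
```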

Answer 3

The answer posted by @lukaszzenko is correct and works on Python 2.

In Python 3, filter returns a lazy filter object instead of a list. The code below works on Python 3.

list(filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits))
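To show the difference in practice, here is a small sketch, assuming Python 3, that inspects what filter returns before and after wrapping it in list(). The variable names lazy and result are illustrative.

```python
fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]

# On Python 3 this is a lazy iterator; nothing has been evaluated yet.
lazy = filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits)
print(type(lazy).__name__)

# list() consumes the iterator and materializes the surviving lists.
result = list(lazy)
print(result)
```

Because the iterator is consumed by list(), calling list(lazy) a second time would give an empty list.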

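Related post on Stack Overflow: Python list filtering: remove subsets from list of lists

You can also find other approaches in the following post: Remove sublists that are present in another sublist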


3 answers

Answer 1
Score 6, posted 2016-02-04 19:56:30

Answer 2
Score 2, accepted, posted 2016-02-04 19:06:46

Answer 3
Score 0, posted 2022-08-09 23:34:16
