Fastest way to check if a list of sets has any containment relationship

I have a list of 10,000 random sets of different lengths:

import random

random.seed(99)
lst = [set(random.sample(range(1, 10000), random.randint(1, 1000))) for _ in range(10000)]

I want to know the fastest way to check whether any set is a subset of another set (or, equivalently, whether any set is a superset of another set). Right now I am using the following very basic code:

def any_containment(lst):
    checked_sets = []
    for st in lst:
        if any(st.issubset(s) for s in checked_sets):
            return True
        else:
            checked_sets.append(st)
    return False

%timeit any_containment(lst)
# 12.3 ms ± 230 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Clearly, my code does not reuse information from earlier iterations when checking containment. Can anyone suggest the fastest way to do this?

It seems to be faster to sort by length and then try small sets as the subset first (and for each of them, try large sets as the superset first). Times in ms from ten cases, with data generated like yours but without seeding:

agree yours   mine  ratio  result
True   2.24   2.98   0.75  True
True 146.25   3.10  47.19  True
True 121.66   2.90  41.91  True
True   0.21   2.73   0.08  True
True  37.01   2.82  13.10  True
True   5.86   3.13   1.87  True
True  54.61   3.14  17.40  True
True   0.86   2.81   0.30  True
True 182.51   3.06  59.60  True
True 192.93   2.73  70.65  True

Code:

import random
from timeit import default_timer as time

def original(lst):
    checked_sets = []
    for st in lst:
        if any(st.issubset(s) for s in checked_sets):
            return True
        else:
            checked_sets.append(st)
    return False

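def any_containment(lst):
    # Sort largest first, so that pop() removes the smallest remaining set
    # and the generator below tries the largest candidates first.
    remaining = sorted(lst, key=len, reverse=True)
    while remaining:
        s = remaining.pop()
        # s can only be a subset of a set that is at least as large.
        if any(s <= t for t in remaining):
            return True
    return False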

for _ in range(10):
    lst = [set(random.sample(range(1, 10000), random.randint(1, 1000))) for _ in range(10000)]
    t0 = time()
    expect = original(lst)
    t1 = time()
    result = any_containment(lst)
    t2 = time()
    te = t1 - t0
    tr = t2 - t1
    print(result == expect, '%6.2f ' * 3 % (te*1e3, tr*1e3, te/tr), expect)

Improvement

The following seems to be another ~20% faster. Instead of comparing the smallest set with potentially all larger sets before even the second-smallest gets a chance, it gives other small sets an early chance.

def any_containment(lst):
    sets = sorted(lst, key=len)
    for i in range(1, len(sets)):
        for s, t in zip(sets, sets[-i:]):
            if s <= t:
                return True
    return False
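
To make the comparison order of that loop concrete, here is a small sketch that lists the index pairs (position of the candidate subset, position of the candidate superset, both in the length-sorted list) in the order they are tested. The helper name `pair_order` is made up for this illustration and is not part of the answer.

```python
def pair_order(n):
    """Index pairs (small, large) in the order the zip-based loop tests them.

    Hypothetical helper for illustration only: it mirrors the loop structure
    of the improved any_containment, but on positions instead of sets.
    """
    idx = list(range(n))
    return [(s, t) for i in range(1, n) for s, t in zip(idx, idx[-i:])]

print(pair_order(4))
# [(0, 3), (0, 2), (1, 3), (0, 1), (1, 2), (2, 3)]
```

Each round `i` pairs the `i` smallest sets with the `i` largest ones, so the second-smallest set is already tested against the largest set in the second round, and every pair of positions is still visited exactly once.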

Comparison with my old solution:

agree   old    new  ratio  result
True   3.13   2.46   1.27  True
True   3.36   3.31   1.02  True
True   3.10   2.49   1.24  True
True   2.72   2.43   1.12  True
True   2.86   2.35   1.21  True
True   2.65   2.47   1.07  True
True   5.24   4.29   1.22  True
True   3.01   2.35   1.28  True
True   2.72   2.28   1.19  True
True   2.80   2.45   1.14  True

Yet another idea

A shortcut could be to first collect the union of all single-element sets and check whether it intersects any other set (either without sorting them, or again from largest to smallest after sorting). That likely suffices. If not, proceed as before, but without the single-element sets.
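
A minimal sketch of how this shortcut might look, untimed and not from the answer itself. The function name `any_containment_singletons` is made up here. Two details are my own assumptions: duplicate single-element sets are treated as containing each other (matching the `<=` test used above), and empty sets, which cannot occur in the question's data, are handled separately because an empty set is a subset of everything.

```python
def any_containment_singletons(lst):
    singles = set()      # union of all single-element sets
    rest = []            # sets with two or more elements
    has_empty = False
    for s in lst:
        if len(s) == 1:
            (x,) = s
            if x in singles:
                return True      # two equal singletons (assumption, see above)
            singles.add(x)
        elif s:
            rest.append(s)
        else:
            has_empty = True
    # Assumption: an empty set is a subset of any other set in the list.
    if has_empty and len(lst) > 1:
        return True
    # A singleton {x} is a subset of every set that contains x.
    if any(not singles.isdisjoint(t) for t in rest):
        return True
    # Otherwise fall back to the pairwise check, without the singletons.
    sets = sorted(rest, key=len)
    for i in range(1, len(sets)):
        for s, t in zip(sets, sets[-i:]):
            if s <= t:
                return True
    return False
```

This version scans the larger sets unsorted; the other option mentioned above would be to sort `rest` first and scan it from largest to smallest.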
