
Find intersection sets between lists of sets

The following question is about Python 3.6. Suppose I have lists of sets, for example:

L1 = [{2,7},{2,7,8},{2,3,6,7},{1,2,4,5,7}]      
L2 = [{3,6},{1,3,4,6,7},{2,3,5,6,8}]      
L3 = [{2,5,7,8},{1,2,3,5,7,8}, {2,4,5,6,7,8}] 

I need to find all the intersection sets between each element of L1, L2, and L3. For example:

    {2,7}.intersection({3,6}).intersection({2,5,7,8})= empty  
    {2,7}.intersection({3,6}).intersection({1,2,3,5,7,8})= empty  
    {2,7}.intersection({3,6}).intersection({2,4,5,6,7,8})= empty  
    {2,7}.intersection({1,3,4,6,7}).intersection({2,5,7,8})= {7}  
    {2,7}.intersection({1,3,4,6,7}).intersection({1,2,3,5,7,8})= {7}  
    {2,7}.intersection({1,3,4,6,7}).intersection({2,4,5,6,7,8})= {7}

...............................

If we keep going like this, we end up with the following set:

{{empty},{2},{3},{6},{7},{2,3},{2,5},{2,6},{2,8},{3,7},{4,7},{6,7}}

Suppose:
- I have many lists L1, L2, L3, ..., Ln, and I do not know how many lists I have.
- Each of the lists L1, L2, L3, ..., Ln is big, so I cannot load all of them into memory.

My question is: is there any way to calculate that set sequentially, e.g., calculate between L1 and L2, then use the result to calculate with L3, and so on?

You can first calculate all possible intersections between L1 and L2, then calculate the intersections between that set and L3, and so on.

list_generator = iter([  # some generator that produces your lists 
    [{2,7}, {2,7,8}, {2,3,6,7}, {1,2,4,5,7}],      
    [{3,6}, {1,3,4,6,7}, {2,3,5,6,8}],      
    [{2,5,7,8}, {1,2,3,5,7,8}, {2,4,5,6,7,8}], 
])
# for example, you can read from a file:
# (adapt the format to your needs)
def list_generator_from_file(filename):
    with open(filename) as f:
        for line in f:
            yield list(map(lambda x: set(x.split(',')), line.strip().split('|')))
# list_generator would be then list_generator_from_file('myfile.dat')

intersections = next(list_generator)  # get first list
new_intersections = set()

for list_ in list_generator:
    for old in intersections:
        for new in list_:
            new_intersections.add(frozenset(old.intersection(new)))
    # at this point we don't need the current list any more
    intersections, new_intersections = new_intersections, set()

print(intersections)

The output looks like {frozenset({7}), frozenset({3, 7}), frozenset({3}), frozenset({6}), frozenset({2, 6}), frozenset({6, 7}), frozenset(), frozenset({8, 2}), frozenset({2, 3}), frozenset({1, 7}), frozenset({4, 7}), frozenset({2, 5}), frozenset({2})}, which matches what you have except for the {1,7} set you missed.
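The same pairwise fold can be wrapped in a function that accepts any iterable of lists, so a generator that reads one list at a time works as input. This is a minimal sketch, not a definitive implementation; the name `sequential_intersections` is hypothetical, and it assumes the running set of distinct intersections fits in memory even when the full input does not.

```python
def sequential_intersections(lists):
    """Fold an iterable of lists of sets into the set of all cross-list intersections.

    Only the running result and the list currently being processed are held in memory.
    (Hypothetical helper name; assumes the running result itself fits in memory.)
    """
    lists = iter(lists)
    try:
        first = next(lists)
    except StopIteration:
        return set()  # no lists were supplied
    # Convert to frozensets so the running result can be stored in a set
    current = {frozenset(s) for s in first}
    for list_ in lists:
        # Intersect every result so far with every set of the next list;
        # the set comprehension removes duplicates as it goes
        current = {old.intersection(new) for old in current for new in list_}
    return current


example_lists = iter([
    [{2, 7}, {2, 7, 8}, {2, 3, 6, 7}, {1, 2, 4, 5, 7}],
    [{3, 6}, {1, 3, 4, 6, 7}, {2, 3, 5, 6, 8}],
    [{2, 5, 7, 8}, {1, 2, 3, 5, 7, 8}, {2, 4, 5, 6, 7, 8}],
])
result = sequential_intersections(example_lists)
print(len(result))  # 13 distinct intersections for the sample data
```

Because duplicates are dropped after every list, the running result tends to stay much smaller than the number of raw combinations.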

You can use functools.reduce(set.intersection, sets) to handle a variable number of input sets, and itertools.product(*nested_list_of_sets) to generate every combination that takes one element from each of several sequences.

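Because generator functions (yield) and lazy iterators such as itertools.product produce one combination at a time, the full list of combinations never has to be held in memory. Note that itertools.product still reads each input list completely before it starts, so the lists themselves must fit in memory.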

import itertools
import functools

nested_list_of_sets = [
    [{2,7}, {2,7,8}, {2,3,6,7}, {1,2,4,5,7}], 
    [{3,6}, {1,3,4,6,7}, {2,3,5,6,8}],
    [{2,5,7,8}, {1,2,3,5,7,8}, {2,4,5,6,7,8}],
]

def find_intersections(sets):
    """Take a nested sequence of sets and generate intersections"""
    for combo in itertools.product(*sets):
        yield (combo, functools.reduce(set.intersection, combo))

for input_sets, output_set in find_intersections(nested_list_of_sets):
    print('{:<55}  ->   {}'.format(repr(input_sets), output_set))

Output:

({2, 7}, {3, 6}, {8, 2, 5, 7})                           ->   set()
({2, 7}, {3, 6}, {1, 2, 3, 5, 7, 8})                     ->   set()
({2, 7}, {3, 6}, {2, 4, 5, 6, 7, 8})                     ->   set()
({2, 7}, {1, 3, 4, 6, 7}, {8, 2, 5, 7})                  ->   {7}
({2, 7}, {1, 3, 4, 6, 7}, {1, 2, 3, 5, 7, 8})            ->   {7}
({2, 7}, {1, 3, 4, 6, 7}, {2, 4, 5, 6, 7, 8})            ->   {7}
({2, 7}, {2, 3, 5, 6, 8}, {8, 2, 5, 7})                  ->   {2}
({2, 7}, {2, 3, 5, 6, 8}, {1, 2, 3, 5, 7, 8})            ->   {2}
# ... etc
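If only the distinct intersections are needed rather than every combination, the generator can skip results it has already produced. The following is a sketch under that assumption; `unique_intersections` is a hypothetical name, and the `seen` set still has to fit in memory.

```python
import functools
import itertools


def unique_intersections(sets):
    """Yield each distinct intersection once (hypothetical helper)."""
    seen = set()
    for combo in itertools.product(*sets):
        # frozenset makes the result hashable so it can be remembered in `seen`
        result = frozenset(functools.reduce(set.intersection, combo))
        if result not in seen:
            seen.add(result)
            yield result


nested_list_of_sets = [
    [{2, 7}, {2, 7, 8}, {2, 3, 6, 7}, {1, 2, 4, 5, 7}],
    [{3, 6}, {1, 3, 4, 6, 7}, {2, 3, 5, 6, 8}],
    [{2, 5, 7, 8}, {1, 2, 3, 5, 7, 8}, {2, 4, 5, 6, 7, 8}],
]

unique = list(unique_intersections(nested_list_of_sets))
print(len(unique))  # 13 distinct intersections out of 36 combinations
```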


This may be what you are looking for:

res = {frozenset(frozenset(x) for x in (i, j, k)): i & j & k
       for i in L1 for j in L2 for k in L3}

Explanation

  • frozenset is required because set is not hashable, and dictionary keys must be hashable.
  • Cycle through every length-3 combination of items, one each from L1, L2, and L3.
  • Calculate the intersection via the & operator, which is equivalent to set.intersection.
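The points above can be checked with a small sketch. The variable names `a`, `b`, `c`, `key`, and `lookup` are illustrative only, and the three sets are taken from the sample lists.

```python
a, b, c = {2, 3, 6, 7}, {1, 3, 4, 6, 7}, {1, 2, 3, 5, 7, 8}

# The & operator and set.intersection give the same result
same = (a & b & c) == a.intersection(b, c)

# A plain set cannot be a dictionary key because it is not hashable
try:
    {a: "value"}
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

# A frozenset of frozensets is hashable, so it can serve as the key
key = frozenset(frozenset(x) for x in (a, b, c))
lookup = {key: a & b & c}

print(same, set_is_hashable, lookup[key])
```

Note that a frozenset key does not record which list each set came from. That is harmless here because intersection does not depend on order.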
