Counting intersections for all combinations in a list of sets

I have a collection of sets. I want to find the number of items that are found only in the intersection for each combination of sets. I basically want to do the same thing as creating the numbers in a Venn diagram.

A basic example might make it clearer.

a = set([1,2,5,10,12])
b = set([1,2,6,9,12,15])
c = set([1,2,7,8,15])

I should end up with a count of items found only in:

  • a
  • b
  • c
  • the intersection of a and b
  • the intersection of a and c
  • the intersection of b and c
  • the intersection of a, b and c

A non-extensible way of doing this is

num_a = len(a - b - c)  # len(set([5,10])) -> 2
num_b = len(b - a - c)  # len(set([6,9])) -> 2
num_c = len(c - a - b)  # len(set([7,8])) -> 2

num_ab = len((a & b) - c)  # 1
num_ac = len((a & c) - b)  # 0
num_bc = len((b & c) - a)  # 1

num_abc = len(a & b & c)  # 2

While this works for 3 sets, my collection of sets is not static.

IIUC, something like this should work:

from itertools import combinations

def venn_count(named_sets):
    names = set(named_sets)
    for i in range(1, len(named_sets)+1):
        for to_intersect in combinations(sorted(named_sets), i):
            others = names.difference(to_intersect)
            intersected = set.intersection(*(named_sets[k] for k in to_intersect))
            unioned = set.union(*(named_sets[k] for k in others)) if others else set()
            yield to_intersect, others, len(intersected - unioned)


ns = {"a": {1,2,5,10,12}, "b": {1,2,6,9,12,15}, "c": {1,2,7,8,15}}
for intersected, unioned, count in venn_count(ns):
    print('len({}{}) = {}'.format(' & '.join(sorted(intersected)),
                                  ' - ' + ' - '.join(sorted(unioned)) if unioned else '',
                                  count))

which gives

len(a - b - c) = 2
len(b - a - c) = 2
len(c - a - b) = 2
len(a & b - c) = 1
len(a & c - b) = 0
len(b & c - a) = 1
len(a & b & c) = 2

You can use itertools.combinations to get all the possible combinations. http://docs.python.org/2/library/itertools.html
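As a rough illustration of that suggestion, the sketch below lists every non-empty combination of set names. The dictionary mirrors the example in the question, and `all_combinations` is just a helper name made up for this sketch.

```python
from itertools import combinations

# Example data taken from the question; the names "a", "b", "c" are labels only.
named_sets = {
    "a": {1, 2, 5, 10, 12},
    "b": {1, 2, 6, 9, 12, 15},
    "c": {1, 2, 7, 8, 15},
}

def all_combinations(names):
    """Return every non-empty combination of names, smallest combinations first."""
    names = sorted(names)
    return [combo
            for size in range(1, len(names) + 1)
            for combo in combinations(names, size)]

combos = all_combinations(named_sets)
for combo in combos:
    print(" & ".join(combo))
```

For n sets this yields 2**n - 1 combinations, so the loop grows quickly as sets are added.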

I'd try using bit masks:

sets = [
    set([1,2,5,10,12]),
    set([1,2,6,9,12,15]),
    set([1,2,7,8,15]),
]

d = {}

for n, s in enumerate(sets):
    for i in s:
        d[i] = d.get(i, 0) | (1 << n)

for mask in range(1, 2**len(sets)):
    cnt = sum(1 for x in d.values() if x & mask == mask)
    num = ','.join(str(j) for j in range(len(sets)) if mask & (1 << j))
    print('number of items in set(s) %s = %d' % (num, cnt))

Results for your input:

number of items in set(s) 0 = 5
number of items in set(s) 1 = 6
number of items in set(s) 0,1 = 3
number of items in set(s) 2 = 5
number of items in set(s) 0,2 = 2
number of items in set(s) 1,2 = 3
number of items in set(s) 0,1,2 = 2
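Note that the counts above include every item in each intersection, whether or not it also appears in other sets. For the Venn-diagram numbers the question asks for, one possible variation (a sketch only, not part of the answer above; `exclusive_counts` is a name invented here) is to count the items whose mask equals the combination's mask exactly:

```python
from collections import Counter

def exclusive_counts(sets):
    """Map each bit mask to the number of items found in exactly those sets."""
    membership = {}
    for n, s in enumerate(sets):
        for item in s:
            # Bit n is set when the item belongs to sets[n].
            membership[item] = membership.get(item, 0) | (1 << n)
    # Items sharing the same mask belong to exactly the same sets.
    return Counter(membership.values())

sets = [
    set([1, 2, 5, 10, 12]),
    set([1, 2, 6, 9, 12, 15]),
    set([1, 2, 7, 8, 15]),
]

counts = exclusive_counts(sets)
for mask in range(1, 2 ** len(sets)):
    num = ",".join(str(j) for j in range(len(sets)) if mask & (1 << j))
    # Counter returns 0 for masks that no item has.
    print("items only in set(s) %s = %d" % (num, counts[mask]))
```

Because each item contributes to exactly one mask, this takes a single pass over the items and the counts add up to the size of the union.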
