
Fastest way to find a minimum number of sets from a set of sets that has the largest union?

Given a set of unique sets, I want to find a minimum number of sets whose union is the largest possible, i.e., the universe. As an example, let's say we have a set of 20 random sets of integers with sizes ranging from 1 to 10:

import random

random.seed(99)
length = 20
ss = {frozenset(random.sample(range(100), random.randint(1,10))) for _ in range(length)}
assert len(ss) == 20 # This might be smaller than 20 if frozensets are not all unique

The largest union (the universe) is given by

universe = frozenset().union(*ss)
print(universe)

# frozenset({0, 6, 7, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 
#            26, 27, 29, 31, 32, 34, 37, 39, 40, 42, 43, 45, 46, 47, 48, 49, 
#            51, 52, 53, 54, 56, 59, 60, 62, 63, 64, 66, 67, 68, 69, 75, 76, 
#            77, 78, 79, 80, 81, 84, 86, 87, 88, 89, 91, 92, 93, 95, 97, 98, 99})

Right now I am using a brute-force method that searches the unions of 1 to 20 subsets using itertools.combinations. As shown below, the code finds a minimum of 17 subsets after 2.95 s.

from itertools import combinations
from time import time

t0 = time()
n = 1
res = []
found = False
while not found:
    # Get all combinations of n subsets
    all_n_ss = list(combinations(ss, n))
    for n_ss in all_n_ss:
        u = frozenset().union(*n_ss)
        if u == universe:
            res = n_ss
            found = True
            break
    # Add one more subset
    n += 1

print(len(res))
print(res)
print(time()-t0)

# 17
# (frozenset({0, 66, 7, 42, 48, 17, 81, 51, 25, 27}), 
#  frozenset({49, 27, 87, 47}), 
#  frozenset({76, 48, 17, 22, 25, 29, 31}), 
#  frozenset({14}), 
#  frozenset({0, 66, 68, 10, 46, 54, 25, 26, 59}), 
#  frozenset({75, 92, 53, 78}), 
#  frozenset({67, 68, 11, 79, 87, 89, 62}), 
#  frozenset({67, 99, 40, 10, 43, 11, 51, 86, 91, 60}), 
#  frozenset({6, 59, 91, 76, 45, 16, 20, 56, 27, 95}), 
#  frozenset({32, 98, 40, 46, 15, 86, 23, 29, 63}), 
#  frozenset({99, 37, 12, 77, 15, 18, 19, 52, 22, 95}), 
#  frozenset({39, 10, 11, 80, 18, 53, 54, 87}), 
#  frozenset({32, 93}), 
#  frozenset({34}), 
#  frozenset({64, 84, 22}), 
#  frozenset({32, 97, 69, 45, 16, 51, 88, 60}), 
#  frozenset({21}))
# 2.9506494998931885

However, in reality I have a set of 200 sets, for which brute-force enumeration is infeasible. I want a fast algorithm to find just one optimal solution.
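One reduction that keeps the optimum intact is to drop every set that is a proper subset of another set in the collection: any cover that uses such a set can swap it for a superset without getting larger. The sketch below is only an illustration under that assumption; the names `drop_dominated` and `smallest_cover` are hypothetical and not part of the original post, and the exhaustive search is still exponential in the worst case, so it mainly helps on small inputs.

```python
from itertools import combinations


def drop_dominated(sets):
    """Keep only the sets that are not proper subsets of another set."""
    sets = set(sets)
    # For frozensets, `s < t` tests whether s is a proper subset of t.
    return {s for s in sets if not any(s < t for t in sets)}


def smallest_cover(sets):
    """Exhaustively search for a smallest sub-collection covering the universe."""
    sets = list(sets)
    universe = frozenset().union(*sets)
    for n in range(1, len(sets) + 1):
        # Iterate lazily instead of materialising every combination in a list.
        for combo in combinations(sets, n):
            if frozenset().union(*combo) == universe:
                return combo
    return ()


# Tiny hand-made example (not the data from the question).
example = {
    frozenset({1}),
    frozenset({1, 2}),
    frozenset({3}),
    frozenset({2, 3, 4}),
}
reduced = drop_dominated(example)
cover = smallest_cover(reduced)
print(len(reduced), len(cover))
```

Because the reduced collection has the same universe and an optimal cover of it is also optimal for the original collection, the same preprocessing could in principle be applied before any exact method.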

Integer program solvers are pretty good at this. Sample code in OR-Tools (pip install ortools):

import collections
from ortools.linear_solver import pywraplp


def set_cover(ss):
    solver = pywraplp.Solver.CreateSolver("SCIP")
    solver.Objective().SetMinimization()
    constraints = collections.defaultdict(
        lambda: solver.Constraint(1, solver.infinity())
    )
    variables = []
    for s in ss:
        x = solver.BoolVar(str(s))
        solver.Objective().SetCoefficient(x, 1)
        for e in s:
            constraints[e].SetCoefficient(x, 1)
        variables.append((s, x))
    status = solver.Solve()
    assert status == pywraplp.Solver.OPTIMAL
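    # solution_value() returns a float, so compare against 0.5 rather than
    # relying on truthiness (a value like 1e-9 would otherwise count as chosen).
    return {s for (s, x) in variables if x.solution_value() > 0.5}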


import random


def main():
    random.seed(99)
    length = 200
    ss = {
        frozenset(random.sample(range(100), random.randint(1, 10)))
        for _ in range(length)
    }
    print(set_cover(ss))


if __name__ == "__main__":
    main()
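For reference, the model that the code above appears to build is the standard set cover integer program. The notation here is an assumption introduced for illustration: $\mathcal{S}$ stands for the collection `ss`, $U$ for its union, and $x_S$ for the Boolean variable created for each set $S$.

```latex
\begin{aligned}
\min \quad & \sum_{S \in \mathcal{S}} x_S \\
\text{s.t.} \quad & \sum_{S \in \mathcal{S} \,:\, e \in S} x_S \ge 1 && \forall e \in U \\
& x_S \in \{0, 1\} && \forall S \in \mathcal{S}
\end{aligned}
```

Each `solver.BoolVar` corresponds to one $x_S$ with objective coefficient 1, and each entry of `constraints` corresponds to one element $e$, with lower bound 1 and no upper bound.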


 