Getting all subsets from the subset sum problem in Python using dynamic programming

I am trying to extract all subsets of a list of elements that add up to a certain value.

Example:

  • List = [1,3,4,5,6]
  • Sum = 9
  • Expected output = [[3,6],[5,4],[1,3,5]]

I have tried different approaches and get the expected output, but on a huge list of elements it takes a significant amount of time. Can this be optimized using dynamic programming or any other technique?

Approach 1

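def subset(array, num):
    result = []
    def find(arr, num, path=()):
        if not arr:
            return
        if arr[0] == num:
            # arr[0] completes the sum: record this subset
            result.append(path + (arr[0],))
        else:
            # include arr[0] and look for the rest of the sum
            find(arr[1:], num - arr[0], path + (arr[0],))
        # skip arr[0]; this must also run after a match, otherwise
        # subsets that don't use arr[0] are missed (e.g. subset([3, 3], 3))
        find(arr[1:], num, path)
    find(array, num)
    return result

numbers = [2, 2, 1, 12, 15, 2, 3]
x = 7
print(subset(numbers, x))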

Approach 2

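flag = 0  # set to 1 once at least one subset has been printed

def isSubsetSum(arr, subset, N, subsetSize, subsetSum, index, target):
    global flag
    if subsetSum == target:
        flag = 1
        # print the subset collected so far
        print(*subset[:subsetSize])
    else:
        # try each remaining element as the next member of the subset
        for i in range(index, N):
            subset[subsetSize] = arr[i]
            isSubsetSum(arr, subset, N, subsetSize + 1,
                        subsetSum + arr[i], i + 1, target)

arr = [1, 3, 4, 5, 6]
isSubsetSum(arr, [0] * len(arr), len(arr), 0, 0, 0, 9)
if not flag:
    print("No subset found")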

If you want to output all subsets, you can't do better than exponential (roughly O(2^n)) time in the worst case, because the output itself can contain exponentially many subsets and running time is lower-bounded by output size. (The decision version, "is there any subset with this sum?", is also a known NP-complete problem.) But if, rather than returning a list of all subsets, you just want a boolean indicating whether the target sum is achievable, or just one subset summing to the target (if it exists), you can use dynamic programming for a pseudo-polynomial O(nK) time solution, where n is the number of elements and K is the target integer.
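To see why the output alone can be exponential, here is a brute-force sketch (`all_subsets_brute` is a hypothetical helper) that tries every combination with `itertools.combinations`. With n copies of 1 and target n // 2, every choice of n // 2 positions qualifies, so there are C(n, n // 2) subsets to report, which grows roughly like 2^n / sqrt(n).

```python
from itertools import combinations
from math import comb

def all_subsets_brute(nums: list[int], target: int) -> list[tuple[int, ...]]:
    """Try every combination of positions: O(2^n) work, for reference only."""
    return [
        c
        for r in range(1, len(nums) + 1)
        for c in combinations(nums, r)
        if sum(c) == target
    ]

# The question's example: every subset of [1, 3, 4, 5, 6] summing to 9.
print(all_subsets_brute([1, 3, 4, 5, 6], 9))  # [(3, 6), (4, 5), (1, 3, 5)]

# Worst case: n ones with target n // 2 yields C(n, n // 2) subsets.
for n in (4, 8, 12, 16):
    count = len(all_subsets_brute([1] * n, n // 2))
    print(n, count, comb(n, n // 2))
```

Because `combinations` works on positions, equal values at different positions count as different subsets, which is what makes the all-ones case blow up.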

The DP approach involves filling in an (n+1) x (K+1) table, where the sub-problem corresponding to each entry of the table is:

DP[i][k] = subset(A[i:], k) for 0 <= i <= n, 0 <= k <= K

That is, subset(A[i:], k) asks, "Can I sum to (little) k using the suffix of A starting at index i?" Once you fill in the whole table, the answer to the overall problem, subset(A[0:], K), will be at DP[0][K].

The base cases are for i = n: they say that you can't sum to anything except 0 if you're working with the empty suffix of your array:

subset(A[n:], k>0) = False, subset(A[n:], k=0) = True

The recursive cases to fill in the table are:

subset(A[i:], k) = subset(A[i+1:], k) OR (A[i] <= k AND subset(A[i+1:], k-A[i]))

This simply says that you can use the current array suffix to sum to k either by skipping the first element of that suffix and reusing the answer you already have in the previous row (computed when that first element wasn't in your suffix), or by using A[i] in your sum and checking whether you could make the reduced sum k - A[i] in the previous row. Of course, you can only use the new element if it doesn't itself exceed your target sum.
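The base cases and recurrence translate almost line for line into a top-down version with memoization, which computes the same DP[i][k] values lazily. This is a sketch assuming non-negative integers; `can_sum` is a hypothetical name, and `functools.lru_cache` plays the role of the table.

```python
from functools import lru_cache

def can_sum(A: list[int], K: int) -> bool:
    n = len(A)

    @lru_cache(maxsize=None)
    def subset(i: int, k: int) -> bool:
        # Is there a subset of the suffix A[i:] that sums to k?
        if i == n:
            # Base case: the empty suffix can only make 0.
            return k == 0
        # Skip A[i], or use it if it does not overshoot k.
        return subset(i + 1, k) or (A[i] <= k and subset(i + 1, k - A[i]))

    return subset(0, K)

print(can_sum([3, 4, 1, 6], 8))     # True  (3 + 4 + 1)
print(can_sum([1, 3, 4, 5, 6], 2))  # False
```

The recursion goes n calls deep, so for long lists it can hit Python's default recursion limit; the bottom-up table avoids that.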

For example, subset(A[i:] = [3,4,1,6], k = 8) would check: could I already sum to 8 with the previous suffix (A[i+1:] = [4,1,6])? No. Or, could I use the 3 that is now available to me to sum to 8? That is, could I sum to k = 8 - 3 = 5 with [4,1,6]? Yes (4 + 1). Because at least one of the conditions was true, I set DP[i][8] = True.
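The worked example can be checked by listing every sum the previous suffix [4,1,6] can reach. The sketch below (`reachable_sums` is a hypothetical helper, assuming non-negative integers) applies the same skip-or-use recurrence, but stores the True entries of a row as a set instead of a list of booleans.

```python
def reachable_sums(nums: list[int], limit: int) -> set[int]:
    """All subset sums of nums that are <= limit (assumes non-negative ints)."""
    sums = {0}  # the empty subset
    for x in nums:
        # Each existing sum either skips x or adds x to it.
        sums |= {s + x for s in sums if s + x <= limit}
    return sums

prev = reachable_sums([4, 1, 6], 8)
print(sorted(prev))   # [0, 1, 4, 5, 6, 7]
print(8 in prev)      # False: [4, 1, 6] alone cannot make 8
print(8 - 3 in prev)  # True: 3 + (4 + 1) makes 8
```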

Because all the base cases are for i = n, and the recurrence for subset(A[i:], k) relies on the answers to the smaller sub-problems subset(A[i+1:], ...), you start at the bottom of the table, where i = n, fill in every k value from 0 to K for each row, and work your way up to row i = 0. This ensures you have the answers to the smaller sub-problems when you need them.

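def subsetSum(A: list[int], K: int) -> bool:
    # assumes the elements of A are non-negative integers
    N = len(A)
    # DP[i][k] is True if some subset of the suffix A[i:] sums to k
    DP = [[None] * (K+1) for x in range(N+1)]
    # base case: the empty suffix A[N:] can only make the sum 0
    DP[N] = [True if x == 0 else False for x in range(K+1)]

    # fill rows bottom-up, so row i+1 is complete before row i reads it
    for i in range(N-1, -1, -1):
        Ai = A[i]
        # skip A[i], or use it (if it fits) and make k - Ai from A[i+1:]
        DP[i] = [DP[i+1][k] or (Ai <= k and DP[i+1][k-Ai]) for k in range(0, K+1)]

    # print result
    print(f"A = {A}, K = {K}")
    print('Ai,k:', *range(0, K+1), sep='\t')
    for (i, row) in enumerate(DP):
        print(A[i] if i < N else None, *row, sep='\t')
    print(f"DP[0][K] = {DP[0][K]}")
    return DP[0][K]

subsetSum([1,4,3,5,6], 9)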
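Since row i only reads row i+1, the table can be collapsed into a single row of K+1 booleans, cutting memory from O(nK) to O(K) while keeping O(nK) time. The sketch below assumes non-negative integers and uses a hypothetical name, `subset_sum_1d`; iterating k from K down to the current element is what keeps each element from being used twice within one update.

```python
def subset_sum_1d(A: list[int], K: int) -> bool:
    # reach[k] is True if some subset of the elements seen so far sums to k
    reach = [False] * (K + 1)
    reach[0] = True  # the empty subset
    for a in A:
        # Go downward so reach[k - a] still holds its value from before a was added.
        for k in range(K, a - 1, -1):
            if reach[k - a]:
                reach[k] = True
    return reach[K]

print(subset_sum_1d([1, 4, 3, 5, 6], 9))  # True
print(subset_sum_1d([2, 4, 6], 5))        # False
```

The trade-off is that the per-row history is gone, so this version can only answer yes/no; recovering a subset needs the full table.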

If you want to return an actual subset alongside the boolean indicating whether one exists, then for every True entry in your DP table you should also store the k index of the previous row that got you there (it will be either the current k or k - A[i], depending on which table lookup returned True, which tells you whether A[i] was used). Then, once the table is filled, you trace those stored indices starting from DP[0][K] to recover a subset. This makes the code messier, but it's definitely doable. You can't get all subsets this way, though (at least not without increasing your time complexity again), because the DP table compresses information.
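Here is a sketch of the walk-back, assuming non-negative integers. Instead of storing an extra index per cell, it re-reads the boolean table: at row i, if DP[i+1][k] is True the element can be skipped, otherwise A[i] must have been used. The same table can also steer a depth-first search for all subsets by never entering a branch whose remaining target is unreachable, so the work is roughly O(nK) for the table plus O(n) per subset reported; that is still exponential in the worst case. `build_table`, `one_subset` and `all_subsets` are hypothetical names.

```python
from typing import Optional

def build_table(A: list[int], K: int) -> list[list[bool]]:
    """DP[i][k] is True if some subset of A[i:] sums to k (non-negative ints)."""
    N = len(A)
    DP = [[False] * (K + 1) for _ in range(N + 1)]
    DP[N][0] = True  # the empty suffix only makes 0
    for i in range(N - 1, -1, -1):
        for k in range(K + 1):
            DP[i][k] = DP[i + 1][k] or (A[i] <= k and DP[i + 1][k - A[i]])
    return DP

def one_subset(A: list[int], K: int) -> Optional[list[int]]:
    DP = build_table(A, K)
    if not DP[0][K]:
        return None
    chosen, k = [], K
    for i in range(len(A)):
        if DP[i + 1][k]:
            continue            # k is still reachable without A[i]: skip it
        chosen.append(A[i])     # otherwise A[i] must be part of the sum
        k -= A[i]
    return chosen

def all_subsets(A: list[int], K: int) -> list[list[int]]:
    DP = build_table(A, K)
    out: list[list[int]] = []

    def dfs(i: int, k: int, path: list[int]) -> None:
        if i == len(A):
            out.append(path[:])  # DP[i][k] was True, so k == 0 here
            return
        if DP[i + 1][k]:         # some solution skips A[i]
            dfs(i + 1, k, path)
        if A[i] <= k and DP[i + 1][k - A[i]]:  # some solution uses A[i]
            path.append(A[i])
            dfs(i + 1, k - A[i], path)
            path.pop()

    if DP[0][K]:
        dfs(0, K, [])
    return out

print(one_subset([1, 3, 4, 5, 6], 9))   # [4, 5]
print(all_subsets([1, 3, 4, 5, 6], 9))  # [[4, 5], [3, 6], [1, 3, 5]]
```

Every branch the search enters is guaranteed by the table to lead to at least one answer, which is why no time is wasted on dead ends.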

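Here is another approach, which records, for each partial subset, the difference still needed to reach the target. It has two limitations, though. It is not O(n^2) in general, because the number of stored differences can grow with the target value as well as with n. It also does not return all subsets: each difference keeps only one partial subset, so a later partial subset with the same difference overwrites an earlier one. For example, with [1,3,4,5,6] and target 9 it returns [[4, 5], [3, 6]] and misses [1, 3, 5].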

def get_subsets(data: list, target: int):
    # initialize final result which is a list of all subsets summing up to target
    subsets = []

    # records the difference between the target value and a group of numbers
    differences = {}

    for number in data:
        prospects = []

        # iterate through every record in differences
        for diff in differences:

            # the number complements a record in differences, i.e. a desired subset is found
            if number - diff == 0:
                new_subset = [number] + differences[diff]
                new_subset.sort()
                if new_subset not in subsets:
                    subsets.append(new_subset)

            # the number fell short of the target; add to prospects instead
            elif number - diff < 0:
                prospects.append((number, diff))

        # update the differences record
        for prospect in prospects:
            new_diff = target - sum(differences[prospect[1]]) - prospect[0]
            differences[new_diff] = differences[prospect[1]] + [prospect[0]]
        differences[target - number] = [number]

    return subsets
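In the code above, `differences[new_diff] = ...` replaces whatever partial subset was already stored under that difference. One way to keep the differences idea without losing subsets is to let each difference hold a list of partial subsets. This is a hypothetical repair (`get_subsets_all` is not part of the answer above), assuming positive integers; because it keeps every partial subset, it is exponential in the worst case, like any method that returns all subsets.

```python
def get_subsets_all(data: list[int], target: int) -> list[list[int]]:
    # Hypothetical variant: differences maps "amount still needed" to a
    # list of partial subsets, so nothing is overwritten.
    subsets: list[list[int]] = []
    differences: dict[int, list[list[int]]] = {}
    for number in data:
        additions = []  # (new_diff, partial) pairs created by this number
        for diff, partials in differences.items():
            if number == diff:
                # number completes every partial subset stored under diff
                subsets.extend(p + [number] for p in partials)
            elif number < diff:
                # number fits: extend each partial subset, still short of target
                additions.extend((diff - number, p + [number]) for p in partials)
        # number on its own
        if number == target:
            subsets.append([number])
        elif number < target:
            additions.append((target - number, [number]))
        # apply updates after the loop so number is used at most once per subset
        for new_diff, partial in additions:
            differences.setdefault(new_diff, []).append(partial)
    return subsets

print(get_subsets_all([1, 3, 4, 5, 6], 9))  # [[1, 3, 5], [4, 5], [3, 6]]
```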

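Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM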