Trying to understand the time complexity of this dynamic recursive subset sum
```python
# Returns true if there exists a subsequence of `A[0…n]` with the given sum
def subsetSum(A, n, k, lookup):
    # return true if the sum becomes 0 (subset found)
    if k == 0:
        return True
    # base case: no items left, or sum becomes negative
    if n < 0 or k < 0:
        return False
    # construct a unique key from dynamic elements of the input
    key = (n, k)
    # if the subproblem is seen for the first time, solve it and
    # store its result in a dictionary
    if key not in lookup:
        # Case 1. Include the current item `A[n]` in the subset and recur
        # for the remaining items `n-1` with the decreased total `k-A[n]`
        include = subsetSum(A, n - 1, k - A[n], lookup)
        # Case 2. Exclude the current item `A[n]` from the subset and recur for
        # the remaining items `n-1`
        exclude = subsetSum(A, n - 1, k, lookup)
        # assign true if we get a subset by including or excluding the current item
        lookup[key] = include or exclude
    # return solution to the current subproblem
    return lookup[key]


if __name__ == '__main__':
    # Input: a set of items and a sum
    A = [7, 3, 2, 5, 8]
    k = 14
    # create a dictionary to store solutions to subproblems
    lookup = {}
    if subsetSum(A, len(A) - 1, k, lookup):
        print('Subsequence with the given sum exists')
    else:
        print('Subsequence with the given sum does not exist')
```
It is said that the complexity of this algorithm is O(n * sum), but I can't understand how or why. Can someone help me? It could be a wordy explanation or a recurrence relation; anything is fine.
The simplest explanation I can give is to realize that when `lookup[(n, k)]` has a value, it is True or False and indicates whether some subset of `A[:n+1]` sums to `k`.
Imagine a naive algorithm that just fills in all the elements of `lookup` row by row. `lookup[(0, i)]` (for 0 ≤ i ≤ total) has just two true elements, i = A[0] and i = 0, and all the other elements are false.
`lookup[(1, i)]` (for 0 ≤ i ≤ total) is true if `lookup[(0, i)]` is true, or if i ≥ A[1] and `lookup[(0, i - A[1])]` is true. I can reach the sum i either by using A[1] or not, and I've already calculated both of those cases.
... `lookup[(r, i)]` (for 0 ≤ i ≤ total) is true if `lookup[(r - 1, i)]` is true, or if i ≥ A[r] and `lookup[(r - 1, i - A[r])]` is true.
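The row-by-row filling described above can be sketched as a small bottom-up function (this helper is not part of the original code; it is here only to make the table picture concrete):

```python
# Bottom-up sketch of the naive table-filling described above.
# table[r][i] is True when some subset of A[:r+1] sums to i.
def fill_table(A, total):
    n = len(A)
    table = [[False] * (total + 1) for _ in range(n)]
    # Row 0: only the sums 0 and A[0] are reachable.
    table[0][0] = True
    if A[0] <= total:
        table[0][A[0]] = True
    for r in range(1, n):
        for i in range(total + 1):
            # Reach sum i either without A[r], or by using A[r].
            table[r][i] = table[r - 1][i] or (i >= A[r] and table[r - 1][i - A[r]])
    return table
```

Each cell is computed from two already-filled cells of the previous row, which is the constant-time step the counting argument below relies on.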
Filling in the table this way, it is clear that we can completely fill the lookup table for rows 0 ≤ row < len(A) in time len(A) * total, since filling in each element takes constant time. And our final answer is just checking whether `(len(A) - 1, sum)` is True in the table.
Your program is doing the exact same thing, but calculating the values of the entries of `lookup` as they are needed.
Sorry for submitting two answers. I think I came up with a slightly simpler explanation.
Take your code and imagine putting the three lines inside `if key not in lookup:` into a separate function, `calculateLookup(A, n, k, lookup)`. I'm going to define "the cost of calling `calculateLookup` for a specific value of `n` and `k`" to be the total time spent in the call to `calculateLookup(A, n, k, lookup)`, excluding any recursive calls to `calculateLookup`.
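A sketch of that refactoring might look like this (the function name `calculateLookup` is the answer's own hypothetical name; the body is just the three lines moved out):

```python
# The three lines from inside `if key not in lookup:` moved into
# their own function, as the answer describes.
def calculateLookup(A, n, k, lookup):
    # "Own" work here is a couple of recursive calls plus one dict store:
    # excluding the time spent inside the recursion, this is O(1).
    include = subsetSum(A, n - 1, k - A[n], lookup)
    exclude = subsetSum(A, n - 1, k, lookup)
    lookup[(n, k)] = include or exclude

def subsetSum(A, n, k, lookup):
    if k == 0:
        return True
    if n < 0 or k < 0:
        return False
    if (n, k) not in lookup:
        calculateLookup(A, n, k, lookup)
    return lookup[(n, k)]
```

This behaves identically to the original; it only makes the unit of work we are counting explicit.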
The key insight is that, as defined above, the cost of calling `calculateLookup()` for any `n` and `k` is O(1). Since we are excluding recursive calls from the cost, and there are no for loops, the cost of `calculateLookup` is just the cost of executing a few tests.
The entire algorithm does a fixed amount of work, calls `calculateLookup`, and then does a small amount of work. Hence the amount of time spent in our code is the same as asking: how many times do we call `calculateLookup`?
Now we're back to the previous answer. Because of the lookup table, every call to `calculateLookup` is made with a different value of `(n, k)`. We also know that we check the bounds of `n` and `k` before each call to `calculateLookup`, so 1 ≤ k ≤ sum and 0 ≤ n ≤ len(A). So `calculateLookup` is called at most `(len(A) * sum)` times.
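One way to convince yourself of this bound is to instrument the original function with a counter that increments each time the memoized body (the `calculateLookup` part) actually runs (the `counter` parameter is added here purely for illustration):

```python
# Instrumented copy of the original function: counter[0] counts how
# many times the "calculateLookup" body executes.
def subsetSum(A, n, k, lookup, counter):
    if k == 0:
        return True
    if n < 0 or k < 0:
        return False
    key = (n, k)
    if key not in lookup:
        counter[0] += 1  # one execution of the memoized body
        include = subsetSum(A, n - 1, k - A[n], lookup, counter)
        exclude = subsetSum(A, n - 1, k, lookup, counter)
        lookup[key] = include or exclude
    return lookup[key]

A, k = [7, 3, 2, 5, 8], 14
counter = [0]
subsetSum(A, len(A) - 1, k, {}, counter)
# counter[0] can never exceed len(A) * k, since each (n, k) pair
# triggers the body at most once.
```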
In general, for these algorithms that use memoization/caching, the easiest thing to do is to separately calculate, and then sum, the cost of filling the cache and the cost of the cache lookups.
The algorithm you presented is just filling up the `lookup` cache. It's doing it in an unusual order, and it's not filling every entry in the table, but that's all it's doing.
The code would be slightly faster with `lookup[key] = subsetSum(A, n - 1, k - A[n], lookup) or subsetSum(A, n - 1, k, lookup)`. This doesn't change the O() of the code in the worst case, but it can avoid some unnecessary calculations.
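Spelled out, the short-circuit variant looks like this: because Python's `or` evaluates its right operand only when the left one is falsy, the "exclude" recursion is skipped entirely whenever the "include" branch already succeeds.

```python
# Short-circuit variant of the questioner's function: `or` skips the
# second recursive call whenever the first branch returns True.
def subsetSum(A, n, k, lookup):
    if k == 0:
        return True
    if n < 0 or k < 0:
        return False
    key = (n, k)
    if key not in lookup:
        lookup[key] = (subsetSum(A, n - 1, k - A[n], lookup)
                       or subsetSum(A, n - 1, k, lookup))
    return lookup[key]
```

The worst case (no subset exists) still explores every reachable `(n, k)` pair, which is why the O(n * sum) bound is unchanged.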