Python dynamic programming performance difference

Question

I'm studying dynamic programming by doing Leetcode problems, and I frequently face time limit exceeded errors even though I'm caching my results. Can anyone explain why my version is so much slower than the official version for this problem?

There are obviously differences in the code, eg, I use a class function for recursion while the official answer does not. My recursive function returns numeric values, the official one does not, etc. None of these seem like meaningful differences though, but the performance difference is nonetheless dramatic.

My version. This takes 0.177669 seconds to run, and receives a time limit exceeded error.

import datetime as dt
from typing import List
from functools import lru_cache


class Solution:
    def canPartition(self, nums: List[int]) -> bool:
        self.nums = nums
        total = sum(self.nums)
        if total % 2 == 1:
            return False
        half_total = total // 2
        return self.traverse(half_total, 0) == 0

    @lru_cache(maxsize=None)
    def traverse(self, subset_sum, index):
        if subset_sum < 0:
            return float('inf')
        elif index == len(self.nums):
            return subset_sum
        else:
            include = self.traverse(subset_sum - self.nums[index], index + 1)
            exclude = self.traverse(subset_sum, index + 1)
            best = min(include, exclude)
            return best


test_case = [20,68,68,11,48,18,50,5,3,51,52,11,13,11,38,100,30,87,1,56,85,63,14,96,7,17,54,11,32,61,94,13,85,10,78,57,69,92,66,28,70,20,3,29,10,73,89,86,28,48,69,54,87,11,91,32,59,4,88,20,81,100,29,75,79,82,6,74,66,30,9,6,83,54,54,53,80,94,64,77,22,7,22,26,12,31,23,26,65,65,35,36,34,1,12,44,22,73,59,99]
solution = Solution()
start = dt.datetime.now()
print(solution.canPartition(test_case))
end = dt.datetime.now()
print((end-start).total_seconds())

This is the official answer. It takes only 0.000165 seconds!

import datetime as dt
from typing import List, Tuple
from functools import lru_cache


class Solution:
    def canPartition(self, nums: List[int]) -> bool:
        @lru_cache(maxsize=None)
        def dfs(nums: Tuple[int], n: int, subset_sum: int) -> bool:
            # Base cases
            if subset_sum == 0:
                return True
            if n == 0 or subset_sum < 0:
                return False
            result = (dfs(nums, n - 1, subset_sum - nums[n - 1])
                    or dfs(nums, n - 1, subset_sum))
            return result

        # find sum of array elements
        total_sum = sum(nums)

        # if total_sum is odd, it cannot be partitioned into equal sum subsets
        if total_sum % 2 != 0:
            return False

        subset_sum = total_sum // 2
        n = len(nums)
        return dfs(tuple(nums), n - 1, subset_sum)


test_case = [20,68,68,11,48,18,50,5,3,51,52,11,13,11,38,100,30,87,1,56,85,63,14,96,7,17,54,11,32,61,94,13,85,10,78,57,69,92,66,28,70,20,3,29,10,73,89,86,28,48,69,54,87,11,91,32,59,4,88,20,81,100,29,75,79,82,6,74,66,30,9,6,83,54,54,53,80,94,64,77,22,7,22,26,12,31,23,26,65,65,35,36,34,1,12,44,22,73,59,99]
solution = Solution()
start = dt.datetime.now()
print(solution.canPartition(test_case))
end = dt.datetime.now()
print((end-start).total_seconds())

Answer 1

It you want to know about performance, you need to profile your code. Profiling lets you see where your code spends its time.

CPython comes with built-in profiling module called cProfile . But you might want to look at eg line_profiler .

Answer 2

In the former version, all possible cases are searched. While in the latter, the algorithm stops when a feasible solution has been found.

In the first version:

include = self.traverse(subset_sum - self.nums[index], index + 1)
# Suppose {include} is zero, the answer is already obtained, 
# but the algorithm still try to compute {exclude}, which is not neccessary.
exclude = self.traverse(subset_sum, index + 1)

In the second version:

result = (dfs(nums, n - 1, subset_sum - nums[n - 1])
                    or dfs(nums, n - 1, subset_sum))
# Because of the short-circuit behavior of logical operator,
# if the first branch has already obtained the solution, 
# the second branch will not be executed.

Just adding a if-check will improve the performance:

include = self.traverse(subset_sum - self.nums[index], index + 1)
# Check whether we are already done:
if include == 0:
    return include
exclude = self.traverse(subset_sum, index + 1)

Answer 3

your function returns a number that alters between a float and an int during the execution. python has to cope with it during the whole execution. whereas you need to return an answer of "yes or no" to the question "can it be partitioned?" and simple boolean Ture/False is enough and suggested.
for the same reason above, you are using a comparison function min on two recursively gained result where you have to run two recursion to their deepest levels. by using booleans, you can shortcut this process and this other program uses that shortcut.

Python dynamic programming performance difference

Question

3 answers

solution1
0 2022-01-16 15:48:42

solution2
0 2022-01-16 15:52:41

solution3
0 2022-01-16 16:33:28

Python dynamic programming performance difference

Question

3 answers

solution1 0 2022-01-16 15:48:42

solution2 0 2022-01-16 15:52:41

solution3 0 2022-01-16 16:33:28

solution1
0 2022-01-16 15:48:42

solution2
0 2022-01-16 15:52:41

solution3
0 2022-01-16 16:33:28