
Maximum path sum of 2 lists

My question is about this kata on Codewars. The function takes two sorted lists with distinct elements as arguments. These lists may or may not have common items. The task is to find the maximum path sum. While computing the sum, whenever you reach a common item, you can choose to switch your path to the other list.

The given example is:

list1 = [0, 2, 3, 7, 10, 12]
list2 = [1, 5, 7, 8]
0->2->3->7->10->12 => 34
0->2->3->7->8      => 20
1->5->7->8         => 21
1->5->7->10->12    => 35 (maximum path)
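To make the rules concrete, here is a brute-force sketch that enumerates every path by branching at each common value (either stay on the current list or continue on the other one). The name all_path_sums is hypothetical, and because the number of paths doubles with every common value, it is only useful for checking small inputs against a fast solution:

```python
from typing import List


def all_path_sums(l1: List[int], l2: List[int]) -> List[int]:
    """Sum of every possible path (exponential; for checking small cases only)."""
    lists = (l1, l2)
    # index_of[k][v] is the position of value v in lists[k].
    index_of = ({v: i for i, v in enumerate(l1)},
                {v: i for i, v in enumerate(l2)})
    sums = []

    def walk(k: int, i: int, acc: int) -> None:
        lst, other = lists[k], 1 - k
        while i < len(lst):
            v = lst[i]
            acc += v
            if v in index_of[other]:
                # Branch: continue on the other list right after the common value.
                walk(other, index_of[other][v] + 1, acc)
            i += 1  # ...or stay on the current list.
        sums.append(acc)  # reached the end of the current list

    walk(0, 0, 0)  # start on the first list
    walk(1, 0, 0)  # start on the second list
    return sums


list1 = [0, 2, 3, 7, 10, 12]
list2 = [1, 5, 7, 8]
print(sorted(all_path_sums(list1, list2)))  # [20, 21, 34, 35]
print(max(all_path_sums(list1, list2)))     # 35
```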

I solved the kata, but my code doesn't meet the performance criteria, so I get "Execution Timed Out". What can I do about it?

Here is my solution:

import itertools

def max_sum_path(l1:list, l2:list):
    common_items = list(set(l1).intersection(l2))
    if not common_items:
        return max(sum(l1), sum(l2))
    common_items.sort()
    s = 0
    new_start1 = 0
    new_start2 = 0
    s1 = 0
    s2 = 0
    for item in common_items:
        s1 = sum(itertools.islice(l1, new_start1, l1.index(item)))
        s2 = sum(itertools.islice(l2, new_start2, l2.index(item)))
        new_start1 = l1.index(item)
        new_start2 = l2.index(item)
        s += max(s1, s2)
    s1 = sum(itertools.islice(l1, new_start1, len(l1)))
    s2 = sum(itertools.islice(l2, new_start2, len(l2)))
    s += max(s1, s2)
    return s

This can be done in a single pass, in O(n) time and O(1) space. All you need is two pointers to traverse both arrays in parallel, and two path values.

You advance the pointer at the smaller element and add its value to that list's path. When you find a common element, you add it to both paths and then set both paths to the larger of the two.

def max_sum_path(l1, l2):
    path1 = 0
    path2 = 0
    i = 0
    j = 0
    while i < len(l1) and j < len(l2):
        if l1[i] < l2[j]:
            path1 += l1[i]
            i += 1
        elif l2[j] < l1[i]:
            path2 += l2[j]
            j += 1
        else:
            # Same element in both paths
            path1 += l1[i]
            path2 += l1[i]
            path1 = max(path1, path2)
            path2 = path1
            i += 1
            j += 1
    while i < len(l1):
        path1 += l1[i]
        i += 1
    while j < len(l2):
        path2 += l2[j]
        j += 1
    return max(path1, path2)
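To watch the merge step, here is an instrumented copy of the loop above. The name sync_trace is hypothetical, and the two tail loops are replaced by equivalent slice sums. It records both path values at each common element just before they are set to the maximum:

```python
def sync_trace(l1, l2):
    """Two-pointer scan that also records (value, path1, path2) at each common
    element, taken just before the two paths are merged to their maximum."""
    path1 = path2 = 0
    i = j = 0
    trace = []
    while i < len(l1) and j < len(l2):
        if l1[i] < l2[j]:
            path1 += l1[i]
            i += 1
        elif l2[j] < l1[i]:
            path2 += l2[j]
            j += 1
        else:
            # Same element in both lists: add it to both paths, then merge.
            path1 += l1[i]
            path2 += l2[j]
            trace.append((l1[i], path1, path2))
            path1 = path2 = max(path1, path2)
            i += 1
            j += 1
    # Whatever is left over on either list (same as the two tail loops).
    path1 += sum(l1[i:])
    path2 += sum(l2[j:])
    return trace, max(path1, path2)


print(sync_trace([0, 2, 3, 7, 10, 12], [1, 5, 7, 8]))  # ([(7, 12, 13)], 35)
```

On the kata's example, the top path reaches 7 with 0 + 2 + 3 + 7 = 12 and the bottom path with 1 + 5 + 7 = 13, so both continue from 13 and the result is max(13 + 10 + 12, 13 + 8) = 35.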

The problem says "aim for linear time complexity", which is a pretty big hint that things like nested loops won't fly (and yes, sum() and islice() are O(n) loops here, and sort() is O(n log n)).

I started by formulating the problem as a directed acyclic graph, with the idea of searching for the maximum path sum:

       +---> [0, 2, 3] ---+            +---> [10, 12]
[0] ---|                  |---> [7] ---|
       +---> [1, 5] ------+            +---> [8]

We might as well also sum the values of each node for clarity:

     +---> 5 ---+          +---> 22
0 ---|          |---> 7 ---|
     +---> 6 ---+          +---> 8

The diagram above reveals that, given the uniqueness constraints, a greedy solution will be optimal. For example, starting from the root, we can only pick the 5- or the 6-valued path to get to 7. The larger of the two, 6, is guaranteed to be part of the maximum-weight path, so we take it.
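In code, the greedy choice on the summed diagram is just the larger weight of each pair of branches plus every shared node. A minimal sketch with the node weights from the diagram typed in by hand:

```python
# Node weights from the summed diagram: one (top, bottom) pair per chunk,
# separated by the shared values (here only 7).
chunks = [(0 + 2 + 3, 1 + 5), (10 + 12, 8)]
links = [7]

# Greedy choice: take the heavier branch of every chunk, then add every link.
total = sum(max(top, bottom) for top, bottom in chunks) + sum(links)
print(total)  # 6 + 22 + 7 = 35
```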

Now the only question is how to implement this logic. Going back to the lists, here's a more substantial input, with formatting and annotations to help motivate an approach:

[1, 2, 4, 7, 8,    10,         14, 15    ]
[      4,    8, 9,     11, 12,     15, 90]
       ^     ^                      ^
       |     |                      |

This illustrates how the linked indices line up. Our goal is to iterate over each chunk between the links, taking the larger of the two sublist sums:

[1, 2, 4, 7, 8,    10,         14, 15    ]
[      4,    8, 9,     11, 12,     15, 90]
 ^~~^     ^     ^~~~~~~~~~~~~~~~^      ^^
  0       1             2               3  <-- chunk number

The expected result for the above input should be 3 + 4 + 7 + 8 + 32 + 15 + 90 = 159: all of the link values, plus the top list's sublist sums for chunks 0 and 1 and the bottom list's sublist sums for chunks 2 and 3.
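The arithmetic can be checked by slicing each chunk by hand; the slice bounds below are read off the positions of the links (4, 8 and 15) in the annotated lists:

```python
a = [1, 2, 4, 7, 8, 10, 14, 15]
b = [4, 8, 9, 11, 12, 15, 90]

# (top slice, bottom slice) for chunks 0-3, cut at the links 4, 8 and 15.
chunks = [
    (a[0:2], b[0:0]),  # [1, 2]   vs []
    (a[3:4], b[1:1]),  # [7]      vs []
    (a[5:7], b[2:5]),  # [10, 14] vs [9, 11, 12]
    (a[8:], b[6:]),    # []       vs [90]
]
links = [4, 8, 15]

chunk_best = [max(sum(top), sum(bottom)) for top, bottom in chunks]
print(chunk_best)                    # [3, 7, 32, 90]
print(sum(chunk_best) + sum(links))  # 159
```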

Here's a rather verbose, but hopefully easy to understand, implementation; you can visit the thread to see more elegant solutions:

def max_sum_path(a, b):
    b_idxes = {k: i for i, k in enumerate(b)}
    link_to_a = {}
    link_to_b = {}
    
    for i, e in enumerate(a):
        if e in b_idxes:
            link_to_a[e] = i
            link_to_b[e] = b_idxes[e]
    
    total = 0
    start_a = 0
    start_b = 0
    
    for link in link_to_a:  # dicts keep insertion order (guaranteed since Python 3.7), i.e. the sorted order of a
        end_a = link_to_a[link]
        end_b = link_to_b[link]
        total += max(sum(a[start_a:end_a]), sum(b[start_b:end_b])) + link
        start_a = end_a + 1
        start_b = end_b + 1
        
    return total + max(sum(a[start_a:]), sum(b[start_b:]))

Try using map() and a lambda instead of the for loop; map() works way faster than a for loop.

Once you know the items shared between the two lists, you can iterate over each list separately, summing up the items between consecutive shared items, thus constructing a list of partial sums. The two lists of partial sums will have the same length, because both input lists contain the same number of shared items.

The maximum path sum can then be found by taking, for each stretch between shared values, the larger of the two lists' partial sums:

def max_sum_path(l1, l2):
    shared_items = set(l1) & set(l2)
    
    def partial_sums(lst):
        result = []
        partial_sum = 0
        for item in lst:
            partial_sum += item
            if item in shared_items:
                result.append(partial_sum)
                partial_sum = 0
        result.append(partial_sum)
        return result
    
    partial_sums1 = partial_sums(l1)
    partial_sums2 = partial_sums(l2)
            
    return sum(max(sum1, sum2) for sum1, sum2 in 
               zip(partial_sums1, partial_sums2))
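To make the intermediate lists concrete, here is the inner helper lifted out of the function, with the shared set passed as an explicit parameter (a small change from the closure above), applied to the kata's example. Each stretch's sum includes the shared item that closes it, so the shared values need no separate handling:

```python
def partial_sums(lst, shared_items):
    """Sums of the stretches of lst, each ending at (and including) a shared item."""
    result = []
    partial_sum = 0
    for item in lst:
        partial_sum += item
        if item in shared_items:
            result.append(partial_sum)
            partial_sum = 0
    result.append(partial_sum)  # the tail after the last shared item
    return result


l1 = [0, 2, 3, 7, 10, 12]
l2 = [1, 5, 7, 8]
shared = set(l1) & set(l2)      # {7}
ps1 = partial_sums(l1, shared)  # [0 + 2 + 3 + 7, 10 + 12] == [12, 22]
ps2 = partial_sums(l2, shared)  # [1 + 5 + 7, 8] == [13, 8]
print(sum(max(x, y) for x, y in zip(ps1, ps2)))  # 13 + 22 = 35
```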

Time complexity: we iterate only once over each list (the iteration over the shorter lists of partial sums doesn't matter here), so this code runs in time linear in the length of the input lists, i.e. it is pretty fast.

Your algorithm is actually fast; it's just your implementation that is slow.

Two things make it take quadratic time overall:

  • l1.index(item) always searches from the start of the list. It should be l1.index(item, new_start1).
  • itertools.islice(l1, new_start1, ...) creates an iterator over l1 and steps through the first new_start1 elements before it reaches the elements you want. There's no simple local fix for this one.
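Both costs can be observed on a small scale. This sketch wraps a list in a counting generator (counted is a hypothetical helper) to show how many elements islice pulls in order to produce one 100-element chunk, and it shows the start argument of list.index:

```python
import itertools


def counted(lst, counter):
    """Yield the items of lst, counting how many are fetched in counter["n"]."""
    for x in lst:
        counter["n"] += 1
        yield x


data = list(range(1000))

# islice cannot jump to index 900: it pulls and discards items 0..899 first.
counter = {"n": 0}
chunk_sum = sum(itertools.islice(counted(data, counter), 900, 1000))
print(chunk_sum, counter["n"])  # 94950 1000

# list.index accepts a start position, so the search can resume where the
# previous chunk ended instead of rescanning from index 0.
print(data.index(950, 900))  # 950
```

Because each islice(l1, ...) call in the original loop starts over from index 0, summing all chunks this way touches roughly (number of common items) × (list length) elements.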

You could use iterators instead of indexes. Here's your solution rewritten to do that; it got accepted in under six seconds:

def max_sum_path(l1:list, l2:list):
    common_items = sorted(set(l1) & set(l2))
    s = 0
    it1 = iter(l1)
    it2 = iter(l2)
    for item in common_items:
        s1 = sum(iter(it1.__next__, item))
        s2 = sum(iter(it2.__next__, item))
        s += max(s1, s2) + item
    s1 = sum(it1)
    s2 = sum(it2)
    s += max(s1, s2)
    return s

I'd combine the last four lines into one; I just left them as you had them so it's easier to compare.
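The rewrite hinges on the two-argument form of the built-in iter(): iter(callable, sentinel) keeps calling callable and stops, without yielding it, as soon as the sentinel comes back. Here is the same function with the last four lines combined as suggested and the trick commented, followed by a short demonstration on list1 from the kata's example:

```python
def max_sum_path(l1: list, l2: list) -> int:
    common_items = sorted(set(l1) & set(l2))
    s = 0
    it1 = iter(l1)
    it2 = iter(l2)
    for item in common_items:
        # iter(callable, sentinel) calls it1.__next__() until it returns `item`;
        # `item` itself is consumed but not yielded, so each sum covers exactly
        # the stretch before the common item and no element is ever revisited.
        s1 = sum(iter(it1.__next__, item))
        s2 = sum(iter(it2.__next__, item))
        s += max(s1, s2) + item
    # The last four lines combined: sum whatever remains on each iterator.
    return s + max(sum(it1), sum(it2))


# The sentinel form on its own:
it = iter([0, 2, 3, 7, 10, 12])
print(list(iter(it.__next__, 7)))  # [0, 2, 3]  (7 is consumed, not returned)
print(list(it))                    # [10, 12]   (resumes right after 7)
print(max_sum_path([0, 2, 3, 7, 10, 12], [1, 5, 7, 8]))  # 35
```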

Benchmarks

On the Discourse tab you can click "Show Kata Test Cases" (once you've solved the kata) to see their test case generator. I used it to benchmark the solutions posted so far, as well as one of my own, over a few dozen rounds, since the test cases are pretty random and cause big runtime fluctuations. In each round, all generated test cases were given to all solutions (so within a round, every solution got the same test cases).
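The kata's actual generator is only visible after solving it, so the following is a sketch of the same methodology under an assumption: make_case is a hypothetical random generator (two sorted lists of distinct values with random overlap), not Codewars' code, and the two solutions are condensed versions of answers in this thread. Each round generates one batch of cases, checks that all solutions agree on it, and times every solution on that same batch:

```python
import random
import timeit


def two_pointer(l1, l2):
    """Condensed version of the two-pointer answer in this thread."""
    path1 = path2 = 0
    i = j = 0
    while i < len(l1) and j < len(l2):
        if l1[i] < l2[j]:
            path1 += l1[i]
            i += 1
        elif l2[j] < l1[i]:
            path2 += l2[j]
            j += 1
        else:
            # Common element: merge both paths to the max, then add it.
            path1 = path2 = max(path1, path2) + l1[i]
            i += 1
            j += 1
    return max(path1 + sum(l1[i:]), path2 + sum(l2[j:]))


def iterator_based(l1, l2):
    """Condensed version of the iterator rewrite in this thread."""
    s = 0
    it1, it2 = iter(l1), iter(l2)
    for item in sorted(set(l1) & set(l2)):
        s += max(sum(iter(it1.__next__, item)), sum(iter(it2.__next__, item))) + item
    return s + max(sum(it1), sum(it2))


def make_case(n, rng):
    """HYPOTHETICAL generator, not the kata's: two sorted lists of n distinct
    values drawn from range(3 * n), so they overlap at random."""
    l1 = sorted(rng.sample(range(3 * n), n))
    l2 = sorted(rng.sample(range(3 * n), n))
    return l1, l2


def bench(solutions, rounds=5, n=10_000, cases_per_round=3, seed=0):
    """Total time per solution; every solution sees the same cases in a round."""
    rng = random.Random(seed)
    totals = {f.__name__: 0.0 for f in solutions}
    for _ in range(rounds):
        cases = [make_case(n, rng) for _ in range(cases_per_round)]
        expected = [solutions[0](a, b) for a, b in cases]
        for f in solutions:
            # Sanity check before timing: all solutions must agree.
            assert [f(a, b) for a, b in cases] == expected, f.__name__
            totals[f.__name__] += timeit.timeit(
                lambda: [f(a, b) for a, b in cases], number=1)
    return totals


if __name__ == "__main__":
    for name, seconds in bench([two_pointer, iterator_based]).items():
        print(f"{name}: {seconds:.3f} s")
```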

[Benchmark results: test cases from Codewars]

And also Kelly Bundy's worst case for sorting the set of common values:


Code to follow.

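Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM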