Finding the subsequence that starts and ends with the same number with the maximum sum

I have to write a program that takes a list of numbers as input and returns the maximum sum of a subsequence that starts and ends with the same number (the equal numbers at the start and end are included in the sum). It also has to return the positions of the start and end of the subsequence, that is, their index + 1. The problem is that my current code only runs smoothly while the list is fairly short. When the list length reaches 5000, the program does not give an answer.

The input is the following:

6
3 2 4 3 5 6

The first line is the length of the list. The second line is the list itself, with items separated by spaces. The output will be 12, 1, 4, because there is one pair of equal numbers (3): the first and fourth elements. The sum of the elements from the first through the fourth is 3 + 2 + 4 + 3 = 12, and their positions are 1 and 4.

Here is my code.

length = int(input())
mass = raw_input().split()
for i in range(length):
    mass[i]=int(mass[i])
value=-10000000000

b = 1
e = 1
for i in range(len(mass)):
    if mass[i:].count(mass[i])!=1:
      for j in range(i,len(mass)):
        if mass[j]==mass[i]:
            f = mass[i:j+1]
            if sum(f)>value:
                value = sum(f)
                b = i+1
                e = j+1
    else:
        if mass[i]>value:
            value = mass[i]
            b = i+1
            e = i+1
print value
print b,e

This should be faster than your current approach.

Rather than searching through mass for pairs of matching numbers, we pair each number in mass with its index and sort those pairs. We can then use groupby to find groups of equal numbers. If a number occurs more than twice, we use its first and last occurrences, since (for positive numbers) they will have the greatest sum between them.

from operator import itemgetter
from itertools import groupby

raw = '3 5 6 3 5 4'
mass = [int(u) for u in raw.split()]

result = []
a = sorted((u, i) for i, u in enumerate(mass))
for _, g in groupby(a, itemgetter(0)):
    g = list(g)
    if len(g) > 1:
        u, v = g[0][1], g[-1][1]
        result.append((sum(mass[u:v+1]), u+1, v+1))
print(max(result))

Output:

(19, 2, 5)

Note that this code will not necessarily give the maximum sum between equal elements if the list contains negative numbers. It still works correctly with negative numbers as long as no group of equal numbers has more than two members. Otherwise, we need a slower algorithm that tests every pair within a group of equal numbers.
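To make the caveat about negative numbers concrete, here is a small hand-made counterexample. The list and the helper names `first_last_sum` and `best_pair_bruteforce` are my own illustration, not part of the code above; it is only a sketch of why the first and last occurrence can lose once negative values are allowed.

```python
def first_last_sum(seq, value):
    """Sum from the first to the last occurrence of value (inclusive)."""
    first = seq.index(value)
    last = len(seq) - 1 - seq[::-1].index(value)
    return sum(seq[first:last + 1]), first + 1, last + 1

def best_pair_bruteforce(seq, value):
    """Try every pair of occurrences of value and keep the best sum."""
    pos = [i for i, x in enumerate(seq) if x == value]
    return max((sum(seq[i:j + 1]), i + 1, j + 1)
               for a, i in enumerate(pos) for j in pos[a + 1:])

data = [1, -10, 1, 5, 1]
print(first_last_sum(data, 1))        # (-2, 1, 5)
print(best_pair_bruteforce(data, 1))  # (7, 3, 5)
```

The -10 sitting between the first two 1s drags the first-to-last sum down, so the pair at positions 3 and 5 wins.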


Here's a more efficient version. Instead of calling the sum function on each slice, we build a list of the cumulative sums of the whole list once. This doesn't make much difference for small lists, but it's much faster when the list is large. For example, for a list of 10,000 elements this approach is about 10 times faster. To test it, I create a list of random positive integers.

from operator import itemgetter
from itertools import groupby
from random import seed, randrange

seed(42)

def maxsum(seq):
    total = 0
    sums = [0]
    for u in seq:
        total += u
        sums.append(total)
    result = []
    a = sorted((u, i) for i, u in enumerate(seq))
    for _, g in groupby(a, itemgetter(0)):
        g = list(g)
        if len(g) > 1:
            u, v = g[0][1], g[-1][1]
            result.append((sums[v+1] - sums[u], u+1, v+1))
    return max(result)

num = 25000
hi = num // 2
mass = [randrange(1, hi) for _ in range(num)]
print(maxsum(mass))       

Output:

(155821402, 21, 24831)
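The cumulative-sum trick can be checked by hand on the question's sample input. This is just a minimal sketch of the identity that `maxsum` relies on, namely that `sums[v+1] - sums[u]` equals `sum(seq[u:v+1])`; the variable `range_sum` is my own name.

```python
seq = [3, 2, 4, 3, 5, 6]

# sums[k] holds the sum of the first k items, so sums[0] == 0.
sums = [0]
for x in seq:
    sums.append(sums[-1] + x)

u, v = 0, 3  # 0-based positions of the two 3s
range_sum = sums[v + 1] - sums[u]

print(sums)       # [0, 3, 5, 9, 12, 17, 23]
print(range_sum)  # 12
```

Each range sum now costs one subtraction instead of a pass over the slice, which is where the speedup on large lists comes from.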

If you're using a recent version of Python you can use itertools.accumulate to build the list of cumulative sums. This is around 10% faster.

from operator import itemgetter
from itertools import accumulate, groupby

def maxsum(seq):
    sums = [0] + list(accumulate(seq))
    result = []
    a = sorted((u, i) for i, u in enumerate(seq))
    for _, g in groupby(a, itemgetter(0)):
        g = list(g)
        if len(g) > 1:
            u, v = g[0][1], g[-1][1]
            result.append((sums[v+1] - sums[u], u+1, v+1))
    return max(result)

Here's a faster version, derived from code by Stefan Pochmann, which uses a dict instead of sorting and groupby. Thanks, Stefan!

def maxsum(seq):
    total = 0
    sums = [0]
    for u in seq:
        total += u
        sums.append(total)
    where = {}
    for i, x in enumerate(seq, 1):
        where.setdefault(x, [i, i])[1] = i
    return max((sums[j] - sums[i - 1], i, j)
        for i, j in where.values())

If the list contains no duplicate items (and hence no subsequences bounded by duplicate items), it returns the maximum item in the list, together with its position as both start and end.
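The `setdefault` line in that function is quite compact, so here it is in isolation on the question's sample input. This is only an illustrative sketch; nothing in it goes beyond what the function already does.

```python
seq = [3, 2, 4, 3, 5, 6]

where = {}
for i, x in enumerate(seq, 1):  # 1-based positions
    # The first sighting of x stores [i, i]; every sighting (including
    # the first) then overwrites slot 1 with the current position.
    where.setdefault(x, [i, i])[1] = i

print(where)
```

Here 3 maps to [1, 4], its first and last positions, while every value that occurs once maps to its own position twice. That is why a list without duplicates still yields a result: each unique item is treated as a range of length 1.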


Here are two more variations. These handle negative items correctly, and if there are no duplicate items they return None. In Python 3 that could be handled elegantly by passing default=None to max, but that option isn't available in Python 2, so instead I catch the ValueError exception that's raised when you attempt to find the max of an empty iterable.

The first version, maxsum_combo, uses itertools.combinations to generate all pairs of positions within a group of equal numbers and then finds the pair that gives the maximum sum. The second version, maxsum_kadane, uses a variation of Kadane's algorithm to find the maximum subsequence within a group.

If there aren't many duplicates in the original sequence, so the average group size is small, maxsum_combo is generally faster. But if the groups are large, then maxsum_kadane is much faster than maxsum_combo. The code below tests these functions on random sequences of 15000 items, first on sequences with few duplicates (and hence small mean group size) and then on sequences with lots of duplicates. It verifies that both versions give the same results, and then it performs timeit tests.

from __future__ import print_function
from itertools import groupby, combinations
from random import seed, randrange
from timeit import Timer

seed(42)

def maxsum_combo(seq):
    total = 0
    sums = [0]
    for u in seq:
        total += u
        sums.append(total)
    where = {}
    for i, x in enumerate(seq, 1):
        where.setdefault(x, []).append(i)
    try:
        return max((sums[j] - sums[i - 1], i, j)
            for v in where.values() for i, j in combinations(v, 2))
    except ValueError:
        return None

def maxsum_kadane(seq):
    total = 0
    sums = [0]
    for u in seq:
        total += u
        sums.append(total)
    where = {}
    for i, x in enumerate(seq, 1):
        where.setdefault(x, []).append(i)
    try:
        return max(max_sublist([(sums[j] - sums[i-1], i, j)
            for i, j in zip(v, v[1:])], k) 
                for k, v in where.items() if len(v) > 1)
    except ValueError:
        return None

# Kadane's Algorithm to find maximum sublist
# From https://en.wikipedia.org/wiki/Maximum_subarray_problem
def max_sublist(seq, k):
    max_ending_here = max_so_far = seq[0]
    for x in seq[1:]:
        y = max_ending_here[0] + x[0] - k, max_ending_here[1], x[2]
        max_ending_here = max(x, y)
        max_so_far = max(max_so_far, max_ending_here)
    return max_so_far

def test(num, hi, loops):
    print('\nnum = {0}, hi = {1}, loops = {2}'.format(num, hi, loops))
    print('Verifying...')
    for k in range(5):
        mass = [randrange(-hi // 2, hi) for _ in range(num)]
        a = maxsum_combo(mass)
        b = maxsum_kadane(mass)
        print(a, b, a==b)

    print('\nTiming...')
    for func in maxsum_combo, maxsum_kadane:
        t = Timer(lambda: func(mass))
        result = sorted(t.repeat(3, loops))
        result = ', '.join([format(u, '.5f') for u in result])
        print('{0:14} : {1}'.format(func.__name__, result))

loops = 20
num = 15000
hi = num // 4
test(num, hi, loops)

loops = 10
hi = num // 100
test(num, hi, loops)

Output:

num = 15000, hi = 3750, loops = 20
Verifying...
(13983131, 44, 14940) (13983131, 44, 14940) True
(13928837, 27, 14985) (13928837, 27, 14985) True
(14057416, 40, 14995) (14057416, 40, 14995) True
(13997395, 65, 14996) (13997395, 65, 14996) True
(14050007, 12, 14972) (14050007, 12, 14972) True

Timing...
maxsum_combo   : 1.72903, 1.73780, 1.81138
maxsum_kadane  : 2.17738, 2.22108, 2.22394

num = 15000, hi = 150, loops = 10
Verifying...
(553789, 21, 14996) (553789, 21, 14996) True
(550174, 1, 14992) (550174, 1, 14992) True
(551017, 13, 14991) (551017, 13, 14991) True
(554317, 2, 14986) (554317, 2, 14986) True
(558663, 15, 14988) (558663, 15, 14988) True

Timing...
maxsum_combo   : 7.29226, 7.34213, 7.36688
maxsum_kadane  : 1.07532, 1.07695, 1.10525

This code runs on both Python 2 and Python 3. The above results were generated on an old 32-bit 2 GHz machine running Python 2.6.6 on a Debian derivative of Linux. The speeds for Python 3.6.0 are similar.
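If you only need Python 3, the default=None option of max mentioned earlier removes the need for the try/except. The following is a sketch of that idea, not a replacement for the tested code above; the name `maxsum_combo_py3` is my own, and it also swaps the manual loop for itertools.accumulate.

```python
from itertools import accumulate, combinations

def maxsum_combo_py3(seq):
    """Python 3 only: like maxsum_combo, but uses max(..., default=None)."""
    sums = [0] + list(accumulate(seq))
    where = {}
    for i, x in enumerate(seq, 1):
        where.setdefault(x, []).append(i)
    # An empty generator (no duplicates at all) makes max return None.
    return max(
        ((sums[j] - sums[i - 1], i, j)
         for v in where.values() for i, j in combinations(v, 2)),
        default=None,
    )

print(maxsum_combo_py3([3, 2, 4, 3, 5, 6]))  # (12, 1, 4)
print(maxsum_combo_py3([1, -10, 1, 5, 1]))   # (7, 3, 5)
print(maxsum_combo_py3([1, 2, 3]))           # None
```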


If you want to include groups that consist of a single non-repeated number, and also want to count each number that belongs to a group as a "subsequence" of length 1, you can use this version:

def maxsum_kadane(seq):
    if not seq:
        return None
    total = 0
    sums = [0]
    for u in seq:
        total += u
        sums.append(total)
    where = {}
    for i, x in enumerate(seq, 1):
        where.setdefault(x, []).append(i)
    # Find the maximum of the single items
    m_single = max((k, v[0], v[0]) for k, v in where.items())
    # Find the maximum of the subsequences
    try:
        m_subseq = max(max_sublist([(sums[j] - sums[i-1], i, j)
                for i, j in zip(v, v[1:])], k) 
                    for k, v in where.items() if len(v) > 1)
        return max(m_single, m_subseq)
    except ValueError:
        # No subsequences
        return m_single

I haven't tested it extensively, but it should work. ;)
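One way to gain more confidence is to compare against a slow but obviously correct reference on small inputs. The function below is a hypothetical helper of mine (`maxsum_reference` is not from the code above): a quadratic sketch that considers every range whose two ends are equal, including ranges of length 1.

```python
def maxsum_reference(seq):
    """O(n^2) reference: best (sum, start, end) over all i <= j
    with seq[i] == seq[j], using 1-based positions."""
    best = None
    for i in range(len(seq)):
        running = 0
        for j in range(i, len(seq)):
            running += seq[j]
            if seq[j] == seq[i]:
                cand = (running, i + 1, j + 1)
                if best is None or cand > best:
                    best = cand
    return best

print(maxsum_reference([3, 2, 4, 3, 5, 6]))  # (12, 1, 4)
print(maxsum_reference([1, -10, 1, 5, 1]))   # (7, 3, 5)
print(maxsum_reference([-5, -2, -5]))        # (-2, 2, 2)
```

When several ranges tie on the sum, the positions reported by this reference may differ from those of the faster version, so comparing only the first element of each result is the safer check.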
