
Finding a minimal subarray of n integers of sum >= k in linear time

Recently I've been struggling with the following problem:

Given an array of integers, find a minimal (shortest length) subarray that sums to at least k.

Obviously this can easily be done in O(n^2). I was able to write an algorithm that solves it in linear time for natural numbers, but I can't figure it out for integers.
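For reference, the natural-number case is usually handled with a two-pointer sliding window. The asker's own version is not shown, so the following is only a minimal sketch of that common technique; the function name `shortest_subarray_nonneg` is made up for illustration, and it assumes every element is >= 0 and k is positive.

```python
def shortest_subarray_nonneg(arr, k):
    """Shortest inclusive (start, end) with sum >= k, or None.

    Assumption: every element is >= 0, so shrinking the window from the
    left never increases its sum. This is NOT valid for negative values.
    """
    best = None
    window_sum = 0
    start = 0
    for end, value in enumerate(arr):
        window_sum += value
        # While the window still reaches k, record it and drop the leftmost element.
        while start <= end and window_sum >= k:
            if best is None or end - start < best[1] - best[0]:
                best = (start, end)
            window_sum -= arr[start]
            start += 1
    return best
```

With negative values the monotonicity assumption breaks: on `[-12, 2, 2, -12, 2, 0]` with k = 4 the running window sum never reaches 4, so this sketch returns `None` even though elements 1..2 sum to 4.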

My latest attempt was this:

def find_minimal_length_subarr_z(arr, min_sum):
    found = False
    start = end = cur_end = cur_sum = 0
    for cur_start in range(len(arr)):
        if cur_end <= cur_start:
            cur_end, cur_sum = cur_start, arr[cur_start]
        else:
            cur_sum -= arr[cur_start-1]
        # Expand
        while cur_sum < min_sum and cur_end < len(arr)-1:
            cur_end += 1
            cur_sum += arr[cur_end]
        # Contract
        while cur_end > cur_start:
            new_sum = cur_sum - arr[cur_end]
            if new_sum >= min_sum or new_sum >= cur_sum:
                cur_end -= 1
                cur_sum = new_sum
            else:
                break
        if cur_sum >= min_sum and (not found or cur_end-cur_start < end-start):
            start, end, found = cur_start, cur_end, True
    if found:
        return start, end

For example:

[8, -7, 5, 5, 4], 12 => (2, 4)

However, it fails for:

[-12, 2, 2, -12, 2, 0], 4

where the correct result is (1, 2) but the algorithm doesn't find it.

Can this be done at all in linear time (preferably with constant space complexity)?

Here's one that's linear time but also linear space. The extra space comes from a deque that could grow to linear size. (There's also a second array to maintain cumulative sums, but that could be removed pretty easily.)

from collections import deque
def find_minimal_length_subarr(arr, k):
   # assume k is positive
   sumBefore = [0]
   for x in arr: sumBefore.append(sumBefore[-1] + x)
   bestStart = -1
   bestEnd = len(arr)
   startPoints = deque()
   start = 0
   for end in range(len(arr)):
      totalToEnd = sumBefore[end+1]
      while startPoints and totalToEnd - sumBefore[startPoints[0]] >= k: # adjust start
         start = startPoints.popleft()
      if totalToEnd - sumBefore[start] >= k and end-start < bestEnd-bestStart:
         bestStart,bestEnd = start,end
      while startPoints and totalToEnd <= sumBefore[startPoints[-1]]: # remove bad candidates
         startPoints.pop()
      startPoints.append(end+1) # end+1 is a new candidate
   return (bestStart,bestEnd)

The deque holds a sequence of candidate start positions from left to right. The key invariant is that positions in the deque are also sorted by increasing value of "sumBefore".

To see why, consider two positions x and y with x > y, and suppose sumBefore[x] <= sumBefore[y]. Then x is a strictly better start position than y (for segments ending at x or later), so we need never consider y again.
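As a rough illustration of how the cumulative-sum array could be dropped, each deque entry can carry its own prefix sum. This is only a sketch of that idea, not the answer's exact code: the name `shortest_subarray_running_sum` is hypothetical, position 0 is seeded into the deque instead of being tracked in a separate `start` variable, and it returns `None` rather than `(-1, len(arr))` when nothing qualifies. It still assumes k is positive, and the deque can still grow to linear size.

```python
from collections import deque

def shortest_subarray_running_sum(arr, k):
    """Shortest inclusive (start, end) with sum >= k, or None. Assumes k > 0."""
    best = None
    # Each candidate is (start position, sum of the elements before it).
    # Prefix sums in the deque stay strictly increasing from left to right.
    candidates = deque([(0, 0)])
    running = 0
    for end, value in enumerate(arr):
        running += value
        # Front candidates that already reach k: record them, then drop them,
        # since any later end would only give a longer segment.
        while candidates and running - candidates[0][1] >= k:
            start, _ = candidates.popleft()
            if best is None or end - start < best[1] - best[0]:
                best = (start, end)
        # Back candidates whose prefix sum is >= running are dominated by end + 1.
        while candidates and candidates[-1][1] >= running:
            candidates.pop()
        candidates.append((end + 1, running))
    return best
```

On the two inputs from the question this gives `(2, 4)` for `[8, -7, 5, 5, 4]` with k = 12 and `(1, 2)` for `[-12, 2, 2, -12, 2, 0]` with k = 4.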

FURTHER EXPLANATION:

Imagine a naive algorithm that looked like this:

for end in 0..N-1
   for start in 0..end
      check the segment from start to end
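A runnable version of that naive loop might look like the sketch below (the name `shortest_subarray_bruteforce` is just for illustration). It walks the start point leftwards from each end so the segment sum is built incrementally, which keeps it at O(n^2), and it is handy as a reference when testing faster versions.

```python
def shortest_subarray_bruteforce(arr, k):
    """O(n^2) reference: shortest inclusive (start, end) with sum >= k, or None."""
    best = None
    for end in range(len(arr)):
        segment_sum = 0
        # Extend the segment one element to the left at a time.
        for start in range(end, -1, -1):
            segment_sum += arr[start]
            if segment_sum >= k:
                if best is None or end - start < best[1] - best[0]:
                    best = (start, end)
                # Starts further left only give longer segments for this end.
                break
    return best
```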

I'm trying to improve the inner loop to only consider certain start points instead of all possible start points. So when can we eliminate a particular start point from further consideration? In two situations. Consider two start points S0 and S1 with S0 to the left of S1.

First, we can eliminate S0 if we ever find that S1 begins an eligible segment (that is, a segment summing to at least k). That's what the first while loop does, where start is S0 and startPoints[0] is S1. Even if we found some future eligible segment starting at S0, it would be longer than the segment we already found starting at S1.

Second, we can eliminate S0 if the sum of the elements from S0 to S1-1 is <= 0 (or, equivalently, if the sum of the elements before S0 is >= the sum of the elements before S1). This is what the second while loop does, where S0 is startPoints[-1] and S1 is end+1. Trimming off the elements from S0 to S1-1 always makes sense (for end points at S1 or later), because it makes the segment shorter without decreasing its sum.

Actually, there's a third situation where we could eliminate S0: when the distance from S0 to end is greater than the length of the shortest segment found so far. I didn't implement this case because it wasn't needed.
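For completeness, here is one way the third rule could be bolted on. This is a sketch under assumptions rather than the answer's code: the name `shortest_subarray_pruned` is hypothetical, candidates are stored as (position, prefix sum) pairs with position 0 seeded up front, k is assumed positive, and `None` is returned when no segment qualifies. A candidate is dropped as soon as a segment from it to the current end would be at least as long as the best one found, since it can no longer produce a strictly shorter answer.

```python
from collections import deque

def shortest_subarray_pruned(arr, k):
    """Shortest inclusive (start, end) with sum >= k, or None. Assumes k > 0."""
    best = None
    candidates = deque([(0, 0)])  # (start position, sum of the elements before it)
    running = 0
    for end, value in enumerate(arr):
        running += value
        # Rule 3: drop starts that are too far left to beat the best length.
        if best is not None:
            best_len = best[1] - best[0] + 1
            while candidates and end - candidates[0][0] + 1 >= best_len:
                candidates.popleft()
        # Rule 1: a front candidate that reaches k is recorded and retired.
        while candidates and running - candidates[0][1] >= k:
            start, _ = candidates.popleft()
            if best is None or end - start < best[1] - best[0]:
                best = (start, end)
        # Rule 2: back candidates with a prefix sum >= running are dominated.
        while candidates and candidates[-1][1] >= running:
            candidates.pop()
        candidates.append((end + 1, running))
    return best
```

The pruning does not change the result or the linear running time; it only caps the deque at roughly the current best length once some eligible segment has been found.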

Here is some pseudo-code that delivers the solution you are looking for.

curIndex = 0
while (curIndex <= endIndex)
{
    if(curSum == 0)
    {
        startIndex = curIndex
    }

    curSum = curSum + curVal
    curTot = curTot + 1
    if(curSum >= targetVal AND curTot < minTotSofar)
    { 
        maxSumSofar = curSum
        maxStartIndex = startIndex
        maxEndIndex = curIndex
        minTotSofar = curTot
        if(curTot == 1)
        {
            exit_loop
        }

        curSum = 0
        curTot = 0
        curIndex = startIndex   
    }
    else if(curIndex == endIndex)
    {
        if(maxSumSofar == 0 AND curSum >= targetVal)
        {
            maxSumSofar = curSum
            maxStartIndex = startIndex
            maxEndIndex = curIndex
            minTotSofar = curTot
        }
        else if(curSum < targetVal AND startIndex < endIndex)
        {
            curSum = 0
            curTot = 0
            curIndex = startIndex
        }
    }
    curIndex = curIndex + 1
}

------------ UPDATE AFTER JWPAT7 SUGGESTION ------------

INPUTS: array of integers, indexed from 0 to endIndex. Target value (k) to compare with (targetVal).

OUTPUTS: final sum of the chosen subarray (maxSumSofar), start index of the subarray (maxStartIndex), end index of the subarray (maxEndIndex), total number of elements in the subarray (minTotSofar).

