Algorithm: array with odd and even numbers

Given an array with n elements, I want to calculate the size of the largest contiguous range of the array in which there are just as many odd numbers as even numbers.

Example

Input:

2 4 1 6 3 0 8 10 4 1 1 4 5 3 6

Expected output:

12

What I tried

I tried using the following steps:

  • Change all odd numbers to 1 and all even numbers to -1.
  • Inspect all possible sub-arrays, and for each calculate the sum of the 1 and -1 values.
  • Take the largest of these sub-arrays that has a sum of 0.

But this has a time complexity of O(n^2).
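For reference, the quadratic attempt described above could look roughly like the following JavaScript sketch. The function name `largestBalancedRangeBruteForce` is made up for illustration, and the running sum per start index is an assumption about how the sub-array sums are computed.

```javascript
// Brute-force baseline: try every start index and extend the range to the right,
// keeping a running sum of +1 (odd) / -1 (even) for the current range.
function largestBalancedRangeBruteForce(a) {
    let best = 0;
    for (let start = 0; start < a.length; start++) {
        let sum = 0;
        for (let end = start; end < a.length; end++) {
            sum += a[end] % 2 !== 0 ? 1 : -1;
            // A sum of 0 means the range start..end holds as many odd as even numbers.
            if (sum === 0 && end - start + 1 > best) {
                best = end - start + 1;
            }
        }
    }
    return best;
}

const input = [2, 4, 1, 6, 3, 0, 8, 10, 4, 1, 1, 4, 5, 3, 6];
console.log(largestBalancedRangeBruteForce(input)); // 12
```

The two nested loops are what make this O(n^2); it is mainly useful as a reference to check faster solutions against.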

Question

How can I do this in O(n)?

Given: array a

Task: find the largest sub-array with an equal number of odd and even numbers

Solution: O(n)

  1. Create a hash map m that stores, for each cumulative sum, the highest index at which that sum occurs, counting odd numbers as -1 and even numbers as 1. In Java:

     import java.util.HashMap;
     import java.util.Map;

     // m maps each cumulative sum to the last index i at which it is reached
     Map<Integer, Integer> m = new HashMap<>();
     int sum = 0;
     for (int i = 0; i < a.length; i++) {
         sum += a[i] % 2 == 0 ? 1 : -1;
         m.put(sum, i);
     }

  2. Find the largest sub-array that sums to 0 by looking for the biggest distance using m. In Java:

     int bestStart = -1, bestEnd = -1; // indexes, so end inclusive
     sum = 0;
     for (int i = 0; i < a.length; i++) {
         // last index whose cumulative sum equals the sum just before i
         Integer end = m.get(sum);
         sum += a[i] % 2 == 0 ? 1 : -1;
         if (end != null && end - i > bestEnd - bestStart) {
             bestStart = i;
             bestEnd = end;
         }
     }

It's based on the observation that the sum from x to y (after transforming the elements to 1 and -1) is cumSum[y] - cumSum[x - 1]. So for that sum to be 0, the two cumulative sums must be equal. Take care when x = 0: cumSum[-1] is not defined, and the cumulative sum before the first element is 0.
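The same observation also allows a single pass, if the map stores the first position of each cumulative sum instead of the last. The following JavaScript sketch is a variation on the idea, not the code from the answer above; the name `largestBalancedRange` and the returned `{ start, length }` shape are illustrative assumptions.

```javascript
function largestBalancedRange(a) {
    // firstSeen.get(s) = number of elements consumed when the running sum first equalled s.
    // The entry (0 -> 0) covers ranges that begin at index 0.
    const firstSeen = new Map([[0, 0]]);
    let sum = 0, bestStart = 0, bestLength = 0;
    for (let i = 0; i < a.length; i++) {
        sum += a[i] % 2 === 0 ? 1 : -1;
        if (firstSeen.has(sum)) {
            // Equal sums before `start` and after `i`: the range start..i sums to 0.
            const start = firstSeen.get(sum);
            if (i + 1 - start > bestLength) {
                bestStart = start;
                bestLength = i + 1 - start;
            }
        } else {
            firstSeen.set(sum, i + 1);
        }
    }
    return { start: bestStart, length: bestLength };
}

const result = largestBalancedRange([2, 4, 1, 6, 3, 0, 8, 10, 4, 1, 1, 4, 5, 3, 6]);
console.log(result.start, result.length); // 2 12
```

Seeding the map with sum 0 at position 0 sidesteps the undefined cumSum[-1] case.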

This is a common dynamic programming problem. We can maintain suboptimal solutions while iterating through the list and update the optimal solution as we go. It is similar to finding the maximum element of an array: set the first element as the maximum and update it in each iteration if needed. This problem just needs a bit more.

We need five pointers (plain integer indices): a start pointer, an end pointer, a current pointer, maxend and maxstart. Set the start and current pointers to the start of the array. While the next elements obey the rule (alternating odd and even), increase the current pointer. Once they no longer obey the rule, set the end pointer to the current pointer. Compare the difference end - start with maxend - maxstart; if it is bigger, update maxend and maxstart. Then continue the scan from the current position. At the end you can print the part of the array between maxstart and maxend.
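Read literally, the scan described above finds the longest run in which odd and even elements alternate. A JavaScript sketch of that reading follows; the function name `longestAlternatingRun` and the exclusive end index are assumptions made for illustration. A run found this way can have odd length, so it is not necessarily a range with equal counts.

```javascript
function longestAlternatingRun(a) {
    if (a.length === 0) return { start: 0, end: 0 };
    let start = 0, maxStart = 0, maxEnd = 1; // end indices are exclusive
    for (let current = 1; current <= a.length; current++) {
        // The run breaks at the end of the array or when two neighbours share parity.
        const runBroken = current === a.length ||
            Math.abs(a[current] % 2) === Math.abs(a[current - 1] % 2);
        if (runBroken) {
            const end = current;
            if (end - start > maxEnd - maxStart) {
                maxStart = start;
                maxEnd = end;
            }
            start = current; // the next run begins at the current element
        }
    }
    return { start: maxStart, end: maxEnd };
}

const a = [2, 4, 1, 6, 3, 0, 8, 10, 4, 1, 1, 4, 5, 3, 6];
const run = longestAlternatingRun(a);
console.log(a.slice(run.start, run.end).join(" ")); // 4 1 6 3 0
```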

The approach of using 1 for odd and -1 for even numbers is indeed the right one. Let's call these values the increments (so including the decrements).

Taking an example input, you could visualise these increments as follows:

  2   5   6   0   8   3   4   5   2   7   4   8   2   6   6   5   7 
 -1   1  -1  -1  -1   1  -1   1  -1   1  -1  -1  -1  -1  -1   1   1   

You could then visualise the cumulative sum of those increments below it:

  2   5   6   0   8   3   4   5   2   7   4   8   2   6   6   5   7 
 -1   1  -1  -1  -1   1  -1   1  -1   1  -1  -1  -1  -1  -1   1   1   
0  -1   0  -1  -2  -3  -2  -3  -2  -3  -2  -3  -4  -5  -6  -7  -6  -5

 \     / \
   \ /     \
             \
               \ 
                 \     / \     / \     / \   
                   \ /     \ /     \ /     \ 
                                             \
                                               \
                                                 \
                                                   \
                                                     \             /
                                                       \         /
                                                         \     /
                                                           \ /

Note that the ranges that have an equal number of even and odd numbers correspond to ranges for which the increments sum up to 0. You can identify such ranges by choosing begin/end points such that the cumulative sum before the start is equal to the cumulative sum after the end of that range. This is like drawing a horizontal line in the above "graph" and taking the farthest crossing points.

So for example, the two occurrences of -1 among the sums indicate that the sub-array [5, 6] is a valid range (there is an equal number of odd and even numbers in it). Looking for other such ranges, we find that taking the leftmost and rightmost -3 yields a larger result: [3 4 5 2 7 4]. We could also have taken -2 as boundary value: [8 3 4 5 2 7].
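The "horizontal line" reading can be checked with a few lines of JavaScript. This is only a sketch for the example array; `widestRangeAtLevel` is a hypothetical helper name.

```javascript
const values = [2, 5, 6, 0, 8, 3, 4, 5, 2, 7, 4, 8, 2, 6, 6, 5, 7];

// cumulative[k] is the sum of the increments of the first k elements.
const cumulative = [0];
for (const x of values) {
    cumulative.push(cumulative[cumulative.length - 1] + (x % 2 !== 0 ? 1 : -1));
}

// Widest range whose boundary sums both equal `level`: from the first to the
// last position where the graph touches that level.
function widestRangeAtLevel(level) {
    const first = cumulative.indexOf(level);
    const last = cumulative.lastIndexOf(level);
    return first === -1 ? [] : values.slice(first, last);
}

console.log(cumulative.join(" ")); // 0 -1 0 -1 -2 -3 -2 -3 -2 -3 -2 -3 -4 -5 -6 -7 -6 -5
console.log(widestRangeAtLevel(-1).join(" ")); // 5 6
console.log(widestRangeAtLevel(-3).join(" ")); // 3 4 5 2 7 4
console.log(widestRangeAtLevel(-2).join(" ")); // 8 3 4 5 2 7
```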

We can also see that the longest range must correspond to a sum that lies between 0 and the end sum (-5 in the example). Take for example one that is not in this range: -6 in the example. As both 0 and -5 are on the same side of -6, we are sure we can get a better result with -5 (moving the horizontal line up in the graph). This is true for all intermediate sum values that are outside the range between 0 and the final sum.

Another conclusion that can be drawn is that it is always possible to find an optimal solution where the left end-point corresponds with a change in direction. In the example, this is the case for the first point with sum -3.

The algorithm

You could make a recursive algorithm that recurses whenever a point is found that complies with the above rules:

  • The sum before adding the value of the start point is within the range between 0 and the final sum.
  • The increment at the starting position is different from the one before it (or there is none before it), and goes against the overall direction from start to end. This corresponds to a valley in the above graph, but would be a hilltop when the final sum is positive.

The recursion stops when a sum is reached that is equal to the final sum. This immediately yields a size. This size is returned to the caller (up one level in the recursion tree), and there the end of the array is shortened until the final sum is equal to the sum that is being looked at at that level of the recursion. This again yields a size. The best of these two sizes is returned to the caller, and so on.

No arrays are created in the algorithm; the only extra space is the call stack generated by the recursion. But if the array is completely random, the number of recursive calls should be small on average, since the total sum has an expected value of 0 in statistical terms.

The time complexity is O(n) because the calculation of the overall sum is clearly O(n), and the other two loops either move the start or the end index of the array in one direction, never visiting the same element again.

JavaScript implementation

This code uses the simplest JavaScript syntax so the algorithm is clear:

    function value(x) {
        // Return 1 when the given value is odd (including negative odd values), else -1
        return x % 2 !== 0 ? 1 : -1;
    }

    function largestRange(a) {
        var sign, sumEnd, end;
        // Calculate final sum
        sumEnd = 0;
        for (end = 0; end < a.length; end++) {
            sumEnd = sumEnd + value(a[end]);
        }
        // ... and its sign (1 or -1 or 0)
        sign = Math.sign(sumEnd);

        function recurse(start, sumStart) {
            var sum, i, size, val;
            // End of recursion:
            if (sumStart === sumEnd) return end - start;
            sum = sumStart;
            for (i = start; sum !== sumEnd; i++) {
                val = value(a[i]);
                // Got closer to sumEnd, and now moving away from it
                if (val !== sign && Math.sign(sum - sumStart) == sign) break;
                sum = sum + val;
            }
            // Get longest range size for this particular sum
            size = recurse(i, sum);
            // Get range size for sumStart
            while (sumEnd !== sumStart) {
                end--;
                sumEnd = sumEnd - value(a[end]);
            }
            // Retain the best of both:
            if (end - start > size) size = end - start;
            return size;
        }

        // Initiate the recursion and return result
        return recurse(0, 0);
    }

    // Sample input
    var a = [2, 5, 6, 0, 8, 3, 4, 5, 2, 7, 4, 8, 6, 6, 5, 7];
    // Calculate
    var size = largestRange(a);
    // Output size of longest range
    console.log(size);

Performance

As the overall sum is expected to be close to 0 for random arrays, most intermediate sum values are likely to be outside the range between 0 and that final sum, meaning they cost little time and no space.

I did some performance tests comparing it with the solution that creates a hash with a key for each encountered sum. The hash-based approach is faster for shorter input arrays, but for larger arrays (like 1000 entries), the above performed better. Of course, one could adapt the hash-based solution to take into account the rules I identified above, and then it would also perform better for larger arrays. Recursion brings a little overhead that hash maps don't have.
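One possible way to adapt the hash-based solution is sketched below in JavaScript. It applies only the first rule (sums outside the range between 0 and the final sum are never stored); the name `largestRangeBoundedMap` is hypothetical, and no performance claim is made for it.

```javascript
function largestRangeBoundedMap(a) {
    const inc = (x) => (x % 2 !== 0 ? 1 : -1);

    // First pass: the final sum bounds the sums worth remembering.
    let finalSum = 0;
    for (const x of a) finalSum += inc(x);
    const low = Math.min(0, finalSum), high = Math.max(0, finalSum);

    // Second pass: remember the first position of each sum inside [low, high] only.
    const firstSeen = new Map([[0, 0]]);
    let sum = 0, best = 0;
    for (let i = 0; i < a.length; i++) {
        sum += inc(a[i]);
        if (sum < low || sum > high) continue; // such a sum cannot bound the longest range
        if (firstSeen.has(sum)) {
            best = Math.max(best, i + 1 - firstSeen.get(sum));
        } else {
            firstSeen.set(sum, i + 1);
        }
    }
    return best;
}

console.log(largestRangeBoundedMap([2, 5, 6, 0, 8, 3, 4, 5, 2, 7, 4, 8, 2, 6, 6, 5, 7])); // 6
console.log(largestRangeBoundedMap([2, 4, 1, 6, 3, 0, 8, 10, 4, 1, 1, 4, 5, 3, 6])); // 12
```

With this restriction the map never holds more than |final sum| + 1 entries, which stays small when odd and even numbers are roughly balanced.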

But as you showed interest in the comments in seeing a solution without a hash map, I went for this solution.
