
Algorithm: array with odd and even numbers

Given an array with n elements, I want to calculate the largest range of the array in which there are just as many odd as even numbers.

Example

Input:

2 4 1 6 3 0 8 10 4 1 1 4 5 3 6

Expected output:

12

What I tried

I tried using the following steps:

  • change all odd numbers to 1 and all even numbers to -1
  • Inspect all possible sub-arrays, and for each calculate the sum of the 1 and -1 values.
  • Take the largest of these sub-arrays that has a sum of 0

But this has a time complexity of O(n^2) .
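
For reference, the brute-force approach described in the steps above could be sketched as follows in JavaScript. The function name longestBalancedBruteForce is made up for illustration, and the sketch assumes integer input.

```javascript
// Hypothetical helper: inspects every sub-array, so it runs in O(n^2).
function longestBalancedBruteForce(a) {
    var best = 0;
    for (var start = 0; start < a.length; start++) {
        var sum = 0; // running sum of +1 (odd) and -1 (even), beginning at start
        for (var end = start; end < a.length; end++) {
            sum += (a[end] % 2 !== 0) ? 1 : -1;
            // A sum of 0 means as many odd as even numbers in a[start..end]
            if (sum === 0 && end - start + 1 > best) {
                best = end - start + 1;
            }
        }
    }
    return best;
}

// Input from the example above
console.log(longestBalancedBruteForce([2, 4, 1, 6, 3, 0, 8, 10, 4, 1, 1, 4, 5, 3, 6])); // 12
```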

Question

How can I do this in O(n) ?

Given : array a

Task : find the largest sub-array with an equal number of odd and even numbers

Solution O(n)

  1. create a hash map m that stores, for each cumulative sum, the highest index at which it occurs when replacing odd numbers with -1 and even numbers with 1, in Java:

     Map<Integer, Integer> m = new HashMap<>();
     int sum = 0;
     for (int i = 0; i < a.length; i++) {
         sum += a[i] % 2 == 0 ? 1 : -1;
         m.put(sum, i);
     }
  2. find the largest sub-array which sums up to 0 by looking for the biggest distance using m, in Java:

     int bestStart = -1, bestEnd = -1; // indexes, so end inclusive
     sum = 0;
     for (int i = 0; i < a.length; i++) {
         Integer end = m.get(sum); // last index whose cumulative sum equals the sum before i
         sum += a[i] % 2 == 0 ? 1 : -1;
         if (end != null && end - i > bestEnd - bestStart) {
             bestStart = i;
             bestEnd = end;
         }
     }

It's based on the observation that you can get the sum (after transforming the elements to 1 and -1) from x to y with cumSum[y] - cumSum[x - 1]. So if we want this to be 0, the two cumulative sums need to be the same (careful: if x = 0, cumSum[-1] is not defined, so use 0 as the sum before the first element).
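
To make the cumSum observation concrete, here is a minimal sketch in JavaScript. It is a single-pass variant rather than a translation of the two Java loops: it stores the first position at which each cumulative sum occurs, and seeds the map with sum 0 at index -1 so that the x = 0 case needs no special handling. The name longestBalancedRange and the returned { start, length } shape are assumptions made for illustration.

```javascript
// Hypothetical helper: longest range with cumSum[y] === cumSum[x - 1], in O(n).
function longestBalancedRange(a) {
    var firstSeen = new Map(); // cumulative sum -> earliest index at which it occurs
    firstSeen.set(0, -1);      // sum before the first element is 0
    var sum = 0;
    var best = { start: -1, length: 0 };
    for (var i = 0; i < a.length; i++) {
        sum += (a[i] % 2 === 0) ? 1 : -1; // even -> 1, odd -> -1, as in the Java code
        if (!firstSeen.has(sum)) {
            firstSeen.set(sum, i);
        } else {
            var before = firstSeen.get(sum); // range is a[before + 1 .. i]
            if (i - before > best.length) {
                best = { start: before + 1, length: i - before };
            }
        }
    }
    return best;
}

var result = longestBalancedRange([2, 4, 1, 6, 3, 0, 8, 10, 4, 1, 1, 4, 5, 3, 6]);
console.log(result.start, result.length); // 2 12
```

Keeping only the first occurrence is enough here because every later occurrence of the same sum is compared against it as soon as it is reached.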

This is a common dynamic programming problem. We can iterate through the list, maintaining a candidate solution and updating the best solution found so far as we go. It is similar to finding the maximum element of an array: take the first element as the maximum and update it in each iteration when needed. This problem just needs a bit more bookkeeping.

We need 5 pointers (integer indexes): start, end, current, maxstart and maxend. Set the start and current pointers to the start of the array. While the next elements obey the rule (alternating odd and even), advance the current pointer. Once they don't, set the end pointer to the current pointer and compare end - start with maxend - maxstart; if it is bigger, update maxstart and maxend. Then continue this operation. At the end you can print the part of the array between maxstart and maxend.
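
A rough sketch of that pointer scan in JavaScript, under the assumption that "obeying the rule" means that neighbouring elements differ in parity. The function name and the returned { start, end } shape (end inclusive) are made up for illustration. Note what it computes: the longest run of alternating parity. Such a run is balanced only when its length is even, and a balanced range need not alternate (for example [1, 1, 2, 2]).

```javascript
// Hypothetical helper: longest run in which neighbouring elements alternate in parity.
function longestAlternatingRun(a) {
    if (a.length === 0) return { start: 0, end: -1 };
    var start = 0;                // start of the run currently being scanned
    var maxStart = 0, maxEnd = 0; // best run so far, end inclusive
    for (var current = 1; current < a.length; current++) {
        var sameParity = Math.abs(a[current] % 2) === Math.abs(a[current - 1] % 2);
        if (sameParity) start = current; // rule broken: a new run starts here
        if (current - start > maxEnd - maxStart) {
            maxStart = start;
            maxEnd = current;
        }
    }
    return { start: maxStart, end: maxEnd };
}

var run = longestAlternatingRun([1, 1, 2, 2]);
console.log(run.start, run.end); // 1 2
```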

The approach of using 1 for odd and -1 for even is indeed the right one. Let's call these values the increments (a term that here also covers the decrements).

Taking an example input, you could visualise these increments as follows:

  2   5   6   0   8   3   4   5   2   7   4   8   2   6   6   5   7 
 -1   1  -1  -1  -1   1  -1   1  -1   1  -1  -1  -1  -1  -1   1   1   

You could then visualise the cumulative sum of those increments below it:

  2   5   6   0   8   3   4   5   2   7   4   8   2   6   6   5   7 
 -1   1  -1  -1  -1   1  -1   1  -1   1  -1  -1  -1  -1  -1   1   1   
0  -1   0  -1  -2  -3  -2  -3  -2  -3  -2  -3  -4  -5  -6  -7  -6  -5

 \     / \
   \ /     \
             \
               \ 
                 \     / \     / \     / \   
                   \ /     \ /     \ /     \ 
                                             \
                                               \
                                                 \
                                                   \
                                                     \             /
                                                       \         /
                                                         \     /
                                                           \ /
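
The two rows of numbers above can be reproduced with a few lines of JavaScript. This is only a sketch; the helper names increments and cumulativeSums are assumptions chosen for illustration.

```javascript
// Hypothetical helpers for the visualisation above.
function increments(a) {
    // 1 for an odd value, -1 for an even value
    return a.map(function (x) { return (x % 2 !== 0) ? 1 : -1; });
}

function cumulativeSums(a) {
    var sums = [0]; // the sum before the first element
    increments(a).forEach(function (inc) {
        sums.push(sums[sums.length - 1] + inc);
    });
    return sums;
}

var example = [2, 5, 6, 0, 8, 3, 4, 5, 2, 7, 4, 8, 2, 6, 6, 5, 7];
console.log(increments(example).join(" "));
// -1 1 -1 -1 -1 1 -1 1 -1 1 -1 -1 -1 -1 -1 1 1
console.log(cumulativeSums(example).join(" "));
// 0 -1 0 -1 -2 -3 -2 -3 -2 -3 -2 -3 -4 -5 -6 -7 -6 -5
```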

Note that the ranges that have an equal number of even and odd numbers correspond to ranges for which the increments sum up to 0. You can identify such ranges by choosing begin/end points such that the cumulative sum before the start is equal to the cumulative sum after the end of that range. This is like drawing a horizontal line in the above "graph" and taking the two crossing points that are farthest apart.

So for example the first two occurrences of -1 among the sums indicate that the sub-array [5, 6] is a valid range (there is an equal number of odd and even numbers in it). Looking for other such ranges, we can find that taking the leftmost and rightmost -3 yields a larger result: [3 4 5 2 7 4] . We could also have taken -2 as boundary value: [8 3 4 5 2 7] .
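
As an illustration of "taking the leftmost and rightmost occurrence of a sum", the sketch below lists the widest range for every cumulative sum value. The name widestRangePerSum is hypothetical, and positions are counted as "number of elements consumed so far", so the range between positions p and q is a.slice(p, q).

```javascript
// Hypothetical helper: for each cumulative sum, the elements between its
// leftmost and rightmost occurrence.
function widestRangePerSum(a) {
    var first = new Map(), last = new Map();
    var sum = 0;
    first.set(0, 0); // sum 0 occurs before any element is consumed
    last.set(0, 0);
    for (var i = 0; i < a.length; i++) {
        sum += (a[i] % 2 !== 0) ? 1 : -1;
        if (!first.has(sum)) first.set(sum, i + 1);
        last.set(sum, i + 1);
    }
    var ranges = new Map();
    first.forEach(function (pos, s) {
        ranges.set(s, a.slice(pos, last.get(s)));
    });
    return ranges;
}

var ranges = widestRangePerSum([2, 5, 6, 0, 8, 3, 4, 5, 2, 7, 4, 8, 2, 6, 6, 5, 7]);
console.log(ranges.get(-3).join(" ")); // 3 4 5 2 7 4
console.log(ranges.get(-2).join(" ")); // 8 3 4 5 2 7
```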

We can also see that the longest range must correspond with a sum that lies between 0 and the end-sum (-5 in the example). Take for example one that is not in this range: -6 in the example. As both 0 and -5 are on the same side of -6, we are sure we can get a better result with -5 (moving the horizontal line up in the graph). This is true for all intermediate sum values that are outside the range of 0 and the final sum.

Another conclusion that can be drawn is that it is always possible to find an optimal solution where the left end-point corresponds with a change in direction. In the example, this is the case with the first point with sum -3.

The algorithm

You could make a recursive algorithm that recurses whenever a point is found that complies with the above rules:

  • The sum before adding the value of the start point is within the range 0 and final sum
  • The increment at the starting position is different from the one before it (or there is none before it), and goes against the overall direction from start to end. This corresponds to a valley in the above graph, but would be a hilltop when the final sum is positive.

The recursion stops when a sum is reached that is equal to the final sum. This results immediately in a size. This size is returned to the caller (up one level in the recursion tree), and there the end of the array is shortened until the final sum is equal to the sum that is being looked at on that level of the recursion. This again results in a size. The best of these two sizes is returned to the caller, and so on.

No arrays are created in the algorithm, with the exception of the callstack generated by the recursion. But if the array is completely random, the number of recursive calls should be small on average, since the total sum has an expected value of 0 in statistical terms.

The time complexity is O(n) because the calculation of the overall sum is clearly O(n) , and the other two loops either move the start or the end index of the array in one direction, never visiting the same element again.

JavaScript implementation

This code uses the simplest JavaScript syntax so the algorithm is clear:

    function value(x) {
        // Return 1 when the given value is odd, else -1
        // (compare with 0 so that negative odd numbers also yield 1)
        return (x % 2 !== 0) ? 1 : -1;
    }

    function largestRange(a) {
        var sign, sumEnd, end;
        // Calculate final sum
        sumEnd = 0;
        for (end = 0; end < a.length; end++) {
            sumEnd = sumEnd + value(a[end]);
        }
        // ... and its sign (1 or -1 or 0)
        sign = Math.sign(sumEnd);

        function recurse(start, sumStart) {
            var sum, i, size, val;
            // End of recursion:
            if (sumStart === sumEnd) return end - start;
            sum = sumStart;
            for (i = start; sum !== sumEnd; i++) {
                val = value(a[i]);
                // Got closer to sumEnd, and now moving away from it
                if (val !== sign && Math.sign(sum - sumStart) == sign) break;
                sum = sum + val;
            }
            // Get longest range size for this particular sum
            size = recurse(i, sum);
            // Get range size for sumStart
            while (sumEnd !== sumStart) {
                end--;
                sumEnd = sumEnd - value(a[end]);
            }
            // Retain the best of both:
            if (end - start > size) size = end - start;
            return size;
        }

        // Initiate the recursion and return result
        return recurse(0, 0);
    }

    // Sample input
    var a = [2, 5, 6, 0, 8, 3, 4, 5, 2, 7, 4, 8, 6, 6, 5, 7];
    // Calculate
    var size = largestRange(a);
    // Output size of longest range
    console.log(size);

Performance

As the overall sum is expected to be close to 0 for random arrays, most intermediate sum values are likely to fall outside the range between 0 and that final sum, meaning they cost little time and no space.

I did some performance tests comparing it with the solution that creates a hash with a key for each encountered sum. That solution is faster for shorter input arrays, but for larger arrays (like 1000 entries), the above performed better. Of course, one could adapt the hash-based solution to take into account the rules I identified above, and then it would also perform better for larger arrays. Recursion brings a little overhead that hash maps don't have.
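
One possible shape of such an adaptation is sketched below. It is not the code that was benchmarked, and no timing claim is attached to it. It applies only the first rule (sums outside the range between 0 and the final sum are ignored), which also means a plain array indexed by sum can stand in for the hash. The function name longestBalancedBounded is hypothetical.

```javascript
// Hypothetical adaptation: track first occurrences only for sums that lie
// between 0 and the final sum (inclusive).
function longestBalancedBounded(a) {
    var i, finalSum = 0;
    for (i = 0; i < a.length; i++) {
        finalSum += (a[i] % 2 !== 0) ? 1 : -1;
    }
    var lo = Math.min(0, finalSum), hi = Math.max(0, finalSum);
    // first[s - lo] = number of elements consumed when sum s first occurs (-1 = not yet)
    var first = new Array(hi - lo + 1).fill(-1);
    first[0 - lo] = 0; // sum 0 occurs before the first element
    var sum = 0, best = 0;
    for (i = 0; i < a.length; i++) {
        sum += (a[i] % 2 !== 0) ? 1 : -1;
        if (sum < lo || sum > hi) continue; // cannot bound a longest range
        if (first[sum - lo] === -1) {
            first[sum - lo] = i + 1;
        } else if (i + 1 - first[sum - lo] > best) {
            best = i + 1 - first[sum - lo];
        }
    }
    return best;
}

console.log(longestBalancedBounded([2, 5, 6, 0, 8, 3, 4, 5, 2, 7, 4, 8, 2, 6, 6, 5, 7])); // 6
```

The extra memory is proportional to the absolute value of the final sum rather than to the number of distinct sums encountered.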

But as you showed interest in the comments in seeing a solution without a hash map, I went for this solution.


 