简体   繁体   中英

Finding a Particular Missing Array Element: O(n+k) Complexity

I have to create an algorithm which returns the largest missing element in an array of size n , which ranges from values 1- k .

Example: If the array contained 1,1,3,1,3 then k=3, n=5, and the return should be 2.

If time complexity isn't taken into account, this is relatively straightforward; simply iterate through the array searching for k-1, k-2,...2, and the first time a value isn't found, return that value (or -1 if no such value exists). However, this will be a loop inside a loop, resulting in O(n*k) complexity.

I'm specifically asked to achieve O(n+k) in my result, and while I'm new to algorithm design, I believe this means my algorithm is not allowed to contain any nested loops whatsoever. None of the array sorting methods I've learned have linear worst-case times, so I can't use those, and I can't see any other way to keep n and k separate in all segments.

I've been stuck on this concept for several hours now. I'm not necessarily looking for a full answer, but if anyone could point me in the right direction, or even explain how I should be approaching O(n+k) complexity in a problem like this, I'd be very grateful.

C# sample code with comments:

int k = 3;

int[] data = new int[] {1, 1, 3, 1, 3};

// Step 1: collecting items presented
// O(n): we have to scan "data" array == O(n), adding item - n * O(1) == O(n)
// O(n) + n * O(1) == O(n) + O(n) == O(n) for the step 1

HashSet<int> present = new HashSet<int>(data);

// Step 2: scan 1..k items and find items missing:
// O(k) - 1..k scan + k * O(1) - checking each item
// O(k) + k * O(1) == O(k) + O(k) = O(k) for the step 2  

var missing = Enumerable
  .Range(1, k)                             // O(k) - scan     
  .Where(item => !present.Contains(item))  // k * O(1) == O(k) 
  .ToList(); // materialize 

// Test
Console.Write(string.Join(" ", missing));

Finally, we have step 1 followed by step 2:

O(n) + O(k) = O(n + k)  

Edit: Step 2 neither Linq nor lambda , but good old for loop :

for (int i = 1; i <= k; ++i) // O(k) - scan
  if (!present.Contains(i))  // O(1) - check
    Console.WriteLine(i);

You could use another array to store the occurrences without using hashset, though it would waste some memory,

public int maxMissingElement(final int arr[], final int max) {
boolean[] holder = new boolean[max + 1];

for (int i = 0; i < arr.length; i++) {
  holder[arr[i]]=true;
}

for (int i = max - 1; i >= 0; i--) {
  if (!holder[i]) {
    return i;
  }
}
return -1;
}

Time complexity: O(n+k)

Space complexity: O(k)

First thing that comes to my mind is keeping a hash set of size k . A hash set has the property that it is created in linear-k time, and looking up data is in O(1) time.

This means that you could create a hash set (or dictionary) with a flag for each value of k. So for k = 3 you'd get a set (0 -> false, 1 -> false, 2 -> false, 3 -> false) .

Then you could simply walk through the array of data (of size n) in linear-n time, and for each value simply set the corresponding record in your hash set to true.

After this process, you've ended up with a hash set that has records set to true only for the values that exist in the data set.

At the end, simply walk through the hash set once and keep an object called result for instance. Each time you find a value that is false (ie did not exist in the data array) you compare it to the result . If the value is bigger than result , you set result to that value.

At the end of the cycle over the hash set, the value in result will be the biggest missing value in the data set.

So is it O(k + n) ?

  • We created the hash set = O(k)
  • We walked over the data set, each time checking the hash set = O(n) * O(1)
  • We walked over the hash set, comparing and updating a variable = O(k) * O(1)

We add up these time complexities as follows:

 O(k) + O(n)O(1) + O(k)*O(1)
 = 2 * O(k) + O(n) = O(k) + O(n)
 = O(k + n).

Note

Keeping a hash set to me sounds practical but not really elegant. I'm sure that there exists a way to do this without the hash set, but it meets the constraints of the problem.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM