
Finding unique numbers from sorted array in less than O(n)

I had an interview and there was the following question:

Find unique numbers from sorted array in less than O(n) time.

 Ex: 1 1 1 5 5 5 9 10 10 Output: 1 5 9 10 

I gave the solution but that was of O(n).

Edit: Sorted array size is approx 20 billion and unique numbers are approx 1000.

I don't think it can be done in less than O(n) in general. Take the case where the array contains 1 2 3 4 5: to produce the correct output, every element of the array has to be looked at, hence O(n).

Divide and conquer :

  • Look at the first and last element of a sorted sequence (the initial sequence is data[0]..data[data.length-1]).
  • If both are equal, the only unique element in the sequence is the first (no matter how long the sequence is).
  • If they are different, divide the sequence in two and repeat for each subsequence.

With m distinct values this runs in about O(m log(n/m)) time, which is close to O(log n) when m is small; it degrades to O(n) only in the worst case, when every element is different.

Java code:

public static List<Integer> findUniqueNumbers(int[] data) {
    List<Integer> result = new LinkedList<Integer>();
    if (data != null && data.length > 0) { // guard against null/empty input
        findUniqueNumbers(data, 0, data.length - 1, result, false);
    }
    return result;
}

private static void findUniqueNumbers(int[] data, int i1, int i2, List<Integer> result, boolean skipFirst) {

    int a = data[i1];
    int b = data[i2];

    // homogeneous sequence a...a
    if (a == b) {
        if (!skipFirst) {
            result.add(a);
        }
    }
    else {
        //divide & conquer
        int i3 = i1 + (i2 - i1) / 2; // overflow-safe midpoint
        findUniqueNumbers(data, i1, i3, result, skipFirst);
        findUniqueNumbers(data, i3 + 1, i2, result, data[i3] == data[i3 + 1]);
    }
}
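For a quick check, the method above can be exercised on the example from the question. This is a self-contained sketch: the wrapper class name is mine, the two methods reproduce the answer's code (with an overflow-safe midpoint):

```java
import java.util.LinkedList;
import java.util.List;

// Demo of the divide-and-conquer approach; class name is mine,
// the methods follow the answer above.
public class FindUniqueDemo {

    public static List<Integer> findUniqueNumbers(int[] data) {
        List<Integer> result = new LinkedList<Integer>();
        if (data != null && data.length > 0) {
            findUniqueNumbers(data, 0, data.length - 1, result, false);
        }
        return result;
    }

    private static void findUniqueNumbers(int[] data, int i1, int i2,
                                          List<Integer> result, boolean skipFirst) {
        int a = data[i1];
        int b = data[i2];
        if (a == b) {
            // homogeneous sequence a...a: report a once, unless the
            // previous subsequence already ended with the same value
            if (!skipFirst) {
                result.add(a);
            }
        } else {
            // divide & conquer
            int i3 = i1 + (i2 - i1) / 2; // overflow-safe midpoint
            findUniqueNumbers(data, i1, i3, result, skipFirst);
            findUniqueNumbers(data, i3 + 1, i2, result, data[i3] == data[i3 + 1]);
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 1, 1, 5, 5, 5, 9, 10, 10};
        System.out.println(findUniqueNumbers(data)); // prints [1, 5, 9, 10]
    }
}
```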

If your sorted array of size n has m distinct elements, you can do it in O(m log n).

Note that this is going to be efficient when m << n (e.g. m = 2 and n = 100).

Algorithm:

Initialization: current element y = first element x[0].

Step 1: Do a binary search for the last occurrence of y in x (this takes O(log n) time). Let its index be i.

Step 2: Set y = x[i+1] and go to step 1; stop when i + 1 runs past the end of the array.

Edit: In cases where m = O(n) this algorithm performs badly. To alleviate that, you can run it in parallel with the regular O(n) scan: the meta-algorithm runs both and stops as soon as either finishes, so it is never asymptotically worse than O(n).
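The two steps above can be sketched in Java (class and method names are mine). The binary search uses an upper midpoint so the loop converges on the last occurrence of the current value:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the O(m log n) algorithm: repeatedly binary-search for the
// last occurrence of the current value, then jump past it.
public class UniqueByBinarySearch {

    // Index of the last occurrence of x[lo] within x[lo..hi] (x sorted).
    private static int lastOccurrence(int[] x, int lo, int hi) {
        int y = x[lo];
        while (lo < hi) {
            int mid = lo + (hi - lo + 1) / 2; // upper midpoint so the loop terminates
            if (x[mid] == y) {
                lo = mid;      // y extends at least to mid
            } else {
                hi = mid - 1;  // y ends before mid
            }
        }
        return lo;
    }

    public static List<Integer> uniques(int[] x) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < x.length) {
            result.add(x[i]);                           // record current value y = x[i]
            i = lastOccurrence(x, i, x.length - 1) + 1; // jump past its last occurrence
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {1, 1, 1, 5, 5, 5, 9, 10, 10};
        System.out.println(uniques(data)); // prints [1, 5, 9, 10]
    }
}
```

Each distinct value costs one O(log n) search, giving the O(m log n) total; the parallel O(n) fallback from the edit is omitted here.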

Since the data consists of integers, there is a finite number of unique values that can occur between any two values. So start by looking at the first and last value in the array: if a[length-1] - a[0] < length - 1, there must be some repeated values. Put a[0] and a[length-1] into a constant-access-time container such as a hash set. If the two values are equal, you know there is only one unique value in the array and you are done. Otherwise, since the array is sorted, look at the middle element: if it equals a value already in the set (in a sorted array that means it equals one of the endpoints of the current range), the half between it and that endpoint contains no new values and can be skipped, so you only analyze the other half recursively. Otherwise, add it to the set and analyze both the left and the right part recursively.

Depending on the data in the array you will be able to get the set of all unique values in a different number of operations. You get them in constant time O(1) if all the values are the same since you will know it after only checking the first and last element. If there are "relatively few" unique values, your complexity will be close to O(log N) because after each partition you will "quite often" be able to throw away at least one half of the analyzed sub-array. If the values are all unique and a[length-1] - a[0] = length - 1 , you can also "define" the set in constant time because they have to be consecutive numbers from a[0] to a[length-1] . However, in order to actually list them, you will have to output each number, and there are N of them.

Perhaps someone can provide a more formal analysis, but my estimate is that this algorithm is roughly linear in the number of unique values rather than the size of the array. This means that if there are few unique values, you can get them in few operations even for a huge array (e.g. in constant time regardless of array size if there is only one unique value). Since the number of unique values is no greater than the size of the array, I claim that this makes the algorithm "better than O(N)" (or, strictly: "not worse than O(N) and better in many cases").
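The approach above can be sketched as follows. The endpoint-comparison pruning is my reading of the "already in the set" test for a sorted array, and all class and method names are mine:

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch: recursively bisect the array, pruning any half whose values
// are pinned down by equal endpoints.
public class UniqueWithSet {

    public static SortedSet<Integer> uniques(int[] a) {
        SortedSet<Integer> seen = new TreeSet<>();
        if (a != null && a.length > 0) {
            collect(a, 0, a.length - 1, seen);
        }
        return seen;
    }

    private static void collect(int[] a, int lo, int hi, Set<Integer> seen) {
        seen.add(a[lo]); // the set ignores duplicates, so adding is always safe
        seen.add(a[hi]);
        if (a[lo] == a[hi] || hi - lo < 2) {
            return; // a single value, or no interior elements left to inspect
        }
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == a[lo]) {
            collect(a, mid, hi, seen);  // left half holds only a[lo]: skip it
        } else if (a[mid] == a[hi]) {
            collect(a, lo, mid, seen);  // right half holds only a[hi]: skip it
        } else {
            collect(a, lo, mid, seen);  // a new value appeared: search both halves
            collect(a, mid, hi, seen);
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 1, 1, 5, 5, 5, 9, 10, 10};
        System.out.println(uniques(data)); // prints [1, 5, 9, 10]
    }
}
```

A TreeSet stands in for the hash set so the output also comes out sorted; the pruning is what keeps the work proportional to the number of distinct values rather than the array length.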

import java.util.*;

/**
 * remove duplicates from a sorted array: close to O(log(n)) when there are few distinct values, O(n) worst case
 * @author XXX
 */
public class UniqueValue {
    public static void main(String[] args) {
        int[] test = {-1, -1, -1, -1, 0, 0, 0, 0,2,3,4,5,5,6,7,8};
        UniqueValue u = new UniqueValue();
        System.out.println(u.getUniqueValues(test, 0, test.length - 1));
    }

    // i must be start index, j must be end index
    public List<Integer> getUniqueValues(int[] array, int i, int j) {
        if (array == null || array.length == 0) {
            return new ArrayList<Integer>();
        }
        List<Integer> result = new ArrayList<>();
        if (array[i] == array[j]) {
            result.add(array[i]);
        } else {
            int mid = i + (j - i) / 2; // overflow-safe midpoint
            result.addAll(getUniqueValues(array, i, mid));

            // skip past the run of duplicates of array[mid] so the right
            // part starts at the next distinct value (avoids duplicate output)
            int next = mid;
            while (next < j) {
                next++;
                if (array[next] != array[next - 1]) {
                    break;
                }
            }
            if (array[mid] != array[next]) {
                result.addAll(getUniqueValues(array, next, j));
            }
        }
        return result;
    }
}
