Find unique numbers in a sorted array in less than O(n)

Question

I had an interview, and it included the following question:

Find the unique numbers in a sorted array in less than O(n) time.
Example: input 1 1 1 5 5 5 9 10 10, output 1 5 9 10

I gave a solution, but it was O(n).

Edit: The sorted array has approximately 20 billion elements, and there are approximately 1,000 unique numbers.

Answer 1

I don't think it can be done in less than O(n). Take the case where the array contains 1 2 3 4 5: to produce the correct output, every element of the array has to be examined, hence O(n).
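For comparison, here is a sketch of the straightforward O(n) baseline (presumably similar to what the asker gave in the interview; the original code is not shown). It walks the array once and reports a value wherever it differs from its predecessor. In the all-distinct case above every element lands in the output, so no algorithm can beat this asymptotically there. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LinearUnique {

    // Hypothetical baseline: one pass, O(n) reads regardless of how many distinct values exist.
    static List<Integer> uniqueByLinearScan(int[] x) {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < x.length; k++) {
            // In a sorted array, a new distinct value starts wherever x[k] differs from x[k - 1].
            if (k == 0 || x[k] != x[k - 1]) {
                result.add(x[k]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] example = {1, 1, 1, 5, 5, 5, 9, 10, 10};
        System.out.println(uniqueByLinearScan(example)); // prints [1, 5, 9, 10]
    }
}
```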

Answer 2

Divide and conquer:

Look at the first and last elements of a sorted sequence (the initial sequence is data[0]..data[data.length-1]).
If both are equal, the sequence contains only one value, the first element (no matter how long the sequence is).
If they are different, split the sequence in two and repeat for each subsequence.

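This solves the problem in O(log(n)) in the average case, and in O(n) only in the worst case (when every element is different). More precisely, with m distinct values the recursion only splits subsequences that contain a boundary between two runs, so it does O(m·log(n)) work, and never more than O(n).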

Java code:

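import java.util.LinkedList;
import java.util.List;

public class UniqueNumbers {

    public static List<Integer> findUniqueNumbers(int[] data) {
        List<Integer> result = new LinkedList<Integer>();
        if (data == null || data.length == 0) {
            return result; // nothing to scan; also avoids reading data[-1]
        }
        findUniqueNumbers(data, 0, data.length - 1, result, false);
        return result;
    }

    // Scans data[i1..i2]. skipFirst is true when the value data[i1]
    // has already been reported by the range to its left.
    private static void findUniqueNumbers(int[] data, int i1, int i2, List<Integer> result, boolean skipFirst) {

        int a = data[i1];
        int b = data[i2];

        // homogeneous sequence a...a: a single value, however long the range is
        if (a == b) {
            if (!skipFirst) {
                result.add(a);
            }
        }
        else {
            // divide & conquer
            int i3 = i1 + (i2 - i1) / 2; // overflow-safe midpoint
            findUniqueNumbers(data, i1, i3, result, skipFirst);
            findUniqueNumbers(data, i3 + 1, i2, result, data[i3] == data[i3 + 1]);
        }
    }
}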

Answer 3

If your sorted array of size n has m distinct elements, you can solve it in O(m log n).

Note that this is efficient when m << n (e.g. m = 2 and n = 100).

Algorithm:

Initialization: current element y = the first element, x[0].

Step 1: Output y, then binary-search x for the last occurrence of y (this takes O(log(n)) time). Let its index be i.

Step 2: If i is the last index of x, stop. Otherwise set y = x[i+1] and go to Step 1.
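The steps above can be sketched in Java as follows, assuming the data fits in a single int[] (a Java array cannot actually hold 20 billion elements, so a real solution would need a different backing store). The binary search is hand-written because java.util.Arrays.binarySearch does not guarantee which of several equal elements it finds. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RunSkippingUnique {

    // Index of the last occurrence of x[from] in the sorted array x.
    // Binary search over [from, x.length - 1]: O(log n) comparisons.
    static int lastOccurrence(int[] x, int from) {
        int y = x[from];
        int lo = from, hi = x.length - 1;
        while (lo < hi) {
            // Round the midpoint up so that "lo = mid" always makes progress.
            int mid = lo + (hi - lo + 1) / 2;
            if (x[mid] == y) {
                lo = mid;      // last occurrence is at mid or to its right
            } else {
                hi = mid - 1;  // x[mid] > y, so the run of y ends before mid
            }
        }
        return lo;
    }

    // Output y, jump past its run, repeat: O(m log n) for m distinct values.
    static List<Integer> uniqueByRunSkipping(int[] x) {
        List<Integer> result = new ArrayList<>();
        int start = 0;
        while (start < x.length) {
            result.add(x[start]);                 // current y = x[start]
            int i = lastOccurrence(x, start);     // Step 1
            start = i + 1;                        // Step 2: y = x[i + 1], stop past the end
        }
        return result;
    }

    public static void main(String[] args) {
        int[] example = {1, 1, 1, 5, 5, 5, 9, 10, 10};
        System.out.println(uniqueByRunSkipping(example)); // prints [1, 5, 9, 10]
    }
}
```

With the numbers from the question (about 1,000 distinct values and log2 of 20 billion being about 35), this is on the order of 35,000 comparisons instead of 20 billion reads.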

Edit: When m = O(n), this algorithm performs badly. To mitigate this, you can run it in parallel with the regular O(n) algorithm. The meta-algorithm consists of my algorithm and the O(n) algorithm running in parallel, and it stops as soon as either of the two completes.
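One possible way to sketch this meta-algorithm in Java, assuming a two-thread pool is acceptable, uses ExecutorService.invokeAny, which returns the result of whichever task completes successfully first and cancels the others. Cancellation in Java is cooperative (it interrupts the worker threads), so both task methods check the interrupt flag periodically. The class name, method names and the 2^16 polling interval are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RaceUnique {

    // Cooperative cancellation: give up once invokeAny has interrupted this thread.
    static void checkInterrupted() throws InterruptedException {
        if (Thread.currentThread().isInterrupted()) {
            throw new InterruptedException();
        }
    }

    // Regular O(n) scan; polls the interrupt flag every 2^16 elements.
    static List<Integer> linearScan(int[] x) throws InterruptedException {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < x.length; k++) {
            if ((k & 0xFFFF) == 0) {
                checkInterrupted();
            }
            if (k == 0 || x[k] != x[k - 1]) {
                result.add(x[k]);
            }
        }
        return result;
    }

    // O(m log n) run skipping: binary-search the last occurrence of each value.
    static List<Integer> runSkipping(int[] x) throws InterruptedException {
        List<Integer> result = new ArrayList<>();
        int start = 0;
        while (start < x.length) {
            checkInterrupted();
            int y = x[start];
            result.add(y);
            int lo = start, hi = x.length - 1;
            while (lo < hi) {
                int mid = lo + (hi - lo + 1) / 2;
                if (x[mid] == y) lo = mid; else hi = mid - 1;
            }
            start = lo + 1;
        }
        return result;
    }

    // Meta-algorithm: run both, return whichever finishes first.
    static List<Integer> uniqueByRace(int[] x) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<List<Integer>>> tasks = new ArrayList<>();
            tasks.add(() -> linearScan(x));
            tasks.add(() -> runSkipping(x));
            return pool.invokeAny(tasks);
        } finally {
            pool.shutdownNow(); // interrupt the losing task if it is still running
        }
    }

    public static void main(String[] args) throws Exception {
        int[] example = {1, 1, 1, 5, 5, 5, 9, 10, 10};
        System.out.println(uniqueByRace(example)); // prints [1, 5, 9, 10]
    }
}
```

Both tasks compute the same list, so it does not matter which one wins; the total cost is roughly twice the cheaper of the two.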

Answer 4

Since the data consists of integers, only a finite number of distinct values can occur between any two values. So, start by looking at the first and last values in the array. If a[length-1] - a[0] < length - 1, there must be some repeated values. Put a[0] and a[length-1] into a container with constant-time access, such as a hash set. If the two values are equal, you know that there is only one unique value in the array, and you are done. You know that the array is sorted, so if the two values are different, you can now look at the middle element. If the middle element is already in the set of values, it equals one of the two endpoints, and everything between it and that endpoint has the same value: if it equals the first value, you can skip the whole left part of the array and only analyze the right part recursively (and symmetrically, skip the right part if it equals the last value). Otherwise, analyze both the left and right parts recursively.
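A minimal Java sketch of this approach, assuming the data fits in one int[]. In a sorted array the middle element can only already be in the set if it equals one of the current endpoints, so the sketch compares against the endpoints directly, which also tells it which part to skip. The class and method names are hypothetical, and the result is sorted at the end because a hash set does not keep order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EndpointSetUnique {

    // Records the distinct values of the sorted range a[lo..hi] in 'seen'.
    static void scan(int[] a, int lo, int hi, Set<Integer> seen) {
        seen.add(a[lo]);
        seen.add(a[hi]);
        // Equal endpoints: the whole range is one value. Adjacent endpoints: nothing in between.
        if (a[lo] == a[hi] || hi - lo <= 1) {
            return;
        }
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == a[lo]) {
            scan(a, mid, hi, seen);      // a[lo..mid] is a single run: skip the left part
        } else if (a[mid] == a[hi]) {
            scan(a, lo, mid, seen);      // a[mid..hi] is a single run: skip the right part
        } else {
            scan(a, lo, mid, seen);      // middle value is new: analyze both parts
            scan(a, mid, hi, seen);
        }
    }

    static List<Integer> uniqueByEndpointSet(int[] a) {
        Set<Integer> seen = new HashSet<>();
        if (a.length > 0) {
            scan(a, 0, a.length - 1, seen);
        }
        List<Integer> result = new ArrayList<>(seen);
        Collections.sort(result);        // HashSet iteration order is unspecified
        return result;
    }

    public static void main(String[] args) {
        int[] example = {1, 1, 1, 5, 5, 5, 9, 10, 10};
        System.out.println(uniqueByEndpointSet(example)); // prints [1, 5, 9, 10]
    }
}
```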

Depending on the data in the array, you will get the set of all unique values in a different number of operations. If all the values are the same, you get them in constant time, O(1), because checking the first and last elements is enough. If there are "relatively few" unique values, your complexity will be close to O(log N), because after each partition you will "quite often" be able to throw away at least half of the analyzed sub-array. If the values are all unique and a[length-1] - a[0] = length - 1, you can also "define" the set in constant time, because they have to be the consecutive numbers from a[0] to a[length-1]. However, to actually list them, you have to output each number, and there are N of them.

Perhaps someone can provide a more formal analysis, but my estimate is that this algorithm is roughly linear in the number of unique values rather than in the size of the array. This means that if there are few unique values, you can get them in few operations even for a huge array (e.g. in constant time regardless of array size if there is only one unique value). Since the number of unique values is no greater than the size of the array, I claim that this makes the algorithm "better than O(N)" (or, strictly: "not worse than O(N), and better in many cases").

Answer 5

import java.util.*;

/**
 * Removes duplicates from a sorted array: average O(log(n)), worst case O(n).
 * @author XXX
 */
public class UniqueValue {
    public static void main(String[] args) {
        int[] test = {-1, -1, -1, -1, 0, 0, 0, 0,2,3,4,5,5,6,7,8};
        UniqueValue u = new UniqueValue();
        System.out.println(u.getUniqueValues(test, 0, test.length - 1));
    }

    // i must be start index, j must be end index
    public List<Integer> getUniqueValues(int[] array, int i, int j) {
        if (array == null || array.length == 0) {
            return new ArrayList<Integer>();
        }
        List<Integer> result = new ArrayList<>();
        if (array[i] == array[j]) {
            result.add(array[i]);
        } else {
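            int mid = i + (j - i) / 2;
            result.addAll(getUniqueValues(array, i, mid));

            // avoid duplicate divide: binary-search for the first index after mid whose
            // value differs from array[mid] (lo ends at j + 1 if the rest is one run),
            // so a long run of equal values does not cost linear time
            int lo = mid + 1, hi = j;
            while (lo <= hi) {
                int m = lo + (hi - lo) / 2;
                if (array[m] == array[mid]) lo = m + 1; else hi = m - 1;
            }
            if (lo <= j) {
                result.addAll(getUniqueValues(array, lo, j));
            }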
        }
        return result;
    }
}


Question

5 answers

Answer 1: score 14, 2014-11-16 14:50:53

Answer 2: score 14, accepted, 2014-11-16 16:39:08

Answer 3: score 4, 2014-11-16 19:55:25

Answer 4: score 0, 2014-11-16 16:24:09

Answer 5: score 0, 2016-08-24 05:23:31
