Remove duplicates from Array without using Hash Table

I have an array which might contain duplicate elements (possibly more than two copies of the same element). I wonder if it's possible to find and remove the duplicates in the array:

  • without using a hash table (strict requirement)
  • without using a temporary secondary array.

There are no restrictions on complexity.

PS: This is not a homework question.

It was asked of a friend of mine in a Yahoo technical interview.

Sort the source array, then find consecutive elements that are equal (i.e. what std::unique does in C++ land). Total complexity is N lg N, or merely N if the input is already sorted.

To remove the duplicates, you can copy elements from later in the array over elements earlier in the array, also in linear time. Simply keep a pointer to the new logical end of the container, and at each step copy the next distinct element to that new logical end. (Again, this is exactly what std::unique does. In fact, why not just download an implementation of std::unique and do exactly what it does? :P)
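A minimal sketch of the sort-then-compact idea in JavaScript, assuming the built-in sort is acceptable (strictly speaking, Array.prototype.sort may allocate auxiliary memory internally, so this does not guarantee O(1) extra storage). The names sortAndUnique and compare are hypothetical. Without a comparator, sort() compares elements as strings, so pass one such as (a, b) => a - b for numeric order.

```javascript
// Sorts `arr` in place, then compacts it so each distinct value appears once,
// following the "keep a pointer to the new logical end" idea behind std::unique.
// Returns the same array object, truncated to its distinct prefix.
function sortAndUnique(arr, compare) {
  arr.sort(compare); // compare may be undefined: default string ordering

  if (arr.length === 0) return arr;

  let end = 1; // arr[0 .. end-1] holds the distinct values kept so far
  for (let i = 1; i < arr.length; i++) {
    // After sorting, a value is new exactly when it differs from the last kept one.
    if (arr[i] !== arr[end - 1]) {
      arr[end] = arr[i];
      end++;
    }
  }

  arr.length = end; // drop the leftover tail past the new logical end
  return arr;
}
```

For example, sortAndUnique([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5], (a, b) => a - b) leaves the array as [1, 2, 3, 4, 5, 6, 9]. Copying forward instead of deleting keeps the scan linear, because no element is ever shifted more than once.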

O(N log N): Sort, then replace each run of consecutive equal elements with a single copy.

O(N^2): Run a nested loop that compares each element with the remaining elements in the array. If a duplicate is found, swap it with the element at the end of the array and decrease the array size by 1 (then compare the same position again, since the element swapped in has not been checked yet).
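A minimal JavaScript sketch of that swap-with-the-end approach; the function name is hypothetical. Since the tail past the logical size is discarded anyway, it simply overwrites the duplicate with the last active element instead of doing a full swap. The original order is not preserved.

```javascript
// O(n^2) time, O(1) extra space; does NOT preserve the original order.
// For each kept element, scan the rest of the active region; when a duplicate
// turns up, overwrite it with the last active element and shrink the region.
function removeDuplicatesBySwap(arr) {
  let size = arr.length; // arr[0 .. size-1] is the active region
  for (let i = 0; i < size; i++) {
    let j = i + 1;
    while (j < size) {
      if (arr[j] === arr[i]) {
        // Move the last active element into slot j and shrink by one.
        // Do not advance j: the element just moved in has not been checked yet.
        arr[j] = arr[size - 1];
        size--;
      } else {
        j++;
      }
    }
  }
  arr.length = size; // discard the removed tail
  return arr;
}
```

For example, removeDuplicatesBySwap([1, 2, 1, 3, 1]) leaves the array as [1, 2, 3].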

No restrictions on complexity.

So this is a piece of cake.

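// 0-indexed: A[0], A[1], ..., A[n-1]
// Removes duplicates in place, keeping the first occurrence of each value in
// its original order. Returns the new logical length.

// O(n^2)
int remove_duplicates(int A[], int n)
{
    int i = 1;
    while (i < n)
    {
        int duplicate = 0;
        for (int j = 0; j < i; j++)
            if (A[i] == A[j])
                {duplicate = 1; break;}
        if (duplicate)
        {
            // "remove" A[i] by moving all elements to its right one slot left
            for (int j = i; j < n - 1; j++)
                A[j] = A[j + 1];
            n--;
            // don't advance i: A[i] now holds the next, still unchecked, element
        }
        else
        {
            i++;
        }
    }
    return n;
}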

In-place duplicate removal that preserves the existing order of the list, in quadratic time:

for (var i = 0; i < list.length; i++) {
  for (var j = i + 1; j < list.length;) {
    if (list[i] == list[j]) {
      list.splice(j, 1);
    } else {
      j++;
    }
  }
}

The trick is to start the inner loop at i + 1 and not to increment the inner counter when you remove an element.

The code is JavaScript; splice(x, 1) removes the element at index x.

If order preservation isn't an issue, then you can do it quicker:

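// Without a comparator, sort() compares elements as strings; equal numbers or
// strings still end up adjacent, which is all this loop needs. Use
// list.sort((a, b) => a - b) if you also want numeric order.
list.sort();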

for (var i = 1; i < list.length;) {
  if (list[i] == list[i - 1]) {
    list.splice(i, 1);
  } else {
    i++;
  }
}

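The scan after the sort would be linear if removals were constant time, but each splice(i, 1) shifts all later elements, so with many duplicates the loop can degrade to O(n^2); copying each distinct element down to a write index instead (as std::unique does) keeps the scan linear. You also have to count the sort, which you should, so the total is of the order of the sort: in most cases n × log(n).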

In functional languages you can combine sorting and unicification (is that a real word?) in one pass. Let's take the standard quick sort algorithm:

- Take the first element of the input (x) and the remaining elements (xs)
- Make two new lists
- left: all elements in xs smaller than or equal to x
- right: all elements in xs larger than x
- apply quick sort on the left and right lists
- return the concatenation of the left list, x, and the right list
- P.S. quick sort on an empty list is an empty list (don't forget the base case!)

If you want only unique entries, replace

left: all elements in xs smaller than or equal to x

with

left: all elements in xs smaller than x

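This is a one-pass algorithm with the same complexity as quick sort itself: O(n log n) on average, degrading to O(n^2) when the first element is a poor pivot (for example, on already-sorted input).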

Example implementation in F#:

let rec qsort = function
    | [] -> []
    | x::xs -> let left,right = List.partition (fun el -> el <= x) xs
               qsort left @ [x] @ qsort right

let rec qsortu = function
    | [] -> []
    | x::xs -> let left = List.filter (fun el -> el < x) xs
               let right = List.filter (fun el -> el > x) xs
               qsortu left @ [x] @ qsortu right

And a test in interactive mode:

> qsortu [42;42;42;42;42];;
val it : int list = [42]
> qsortu [5;4;4;3;3;3;2;2;2;2;1];;
val it : int list = [1; 2; 3; 4; 5]
> qsortu [3;1;4;1;5;9;2;6;5;3;5;8;9];;
val it : int list = [1; 2; 3; 4; 5; 6; 8; 9]

This doesn't use a hash table per se, but I know that behind the scenes it is implemented as one. Nevertheless, I thought I'd post it in case it helps. It's JavaScript and uses an associative array to record the values already seen, so that later duplicates are skipped.

function removeDuplicates(arr) {
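    // Object.create(null) has no inherited keys, so values like "length" or
    // "toString" are not mistaken for duplicates (as they would be with [] or {}).
    // Keys are coerced to strings, so 1 and "1" count as the same value.
    var results = [], dups = Object.create(null);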

    for (var i = 0; i < arr.length; i++) {

        // check if not a duplicate
        if (dups[arr[i]] === undefined) {

            // save for next check to indicate duplicate
            dups[arr[i]] = 1; 

            // is unique. append to output array
            results.push(arr[i]);
        }
    }

    return results;
}

Since it's an interview question, the interviewer usually expects to be asked for clarifications about the problem.

With no alternative storage allowed (that is, only O(1) storage, since you'll probably use some counters/pointers), it seems obvious that a destructive operation is expected; it might be worth pointing that out to the interviewer.

Now the real question is: do you want to preserve the relative order of the elements? I.e., is this operation supposed to be stable?

Stability hugely impacts the available algorithms (and thus the complexity).

The most obvious choice is to pick one of the many sorting algorithms; after all, once the data is sorted, it's pretty easy to extract the unique elements.
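If the O(1) storage rule is taken literally, the sort itself has to be in place, which Array.prototype.sort does not promise. A minimal sketch under that reading, assuming numeric input compared with <: heapsort (an in-place sort with O(1) extra space) followed by the same write-pointer compaction std::unique uses. The function names are hypothetical, and like any plain sort this does not preserve the original order.

```javascript
// Swap two slots using a scalar temporary (no extra arrays).
function swap(arr, i, j) {
  const tmp = arr[i];
  arr[i] = arr[j];
  arr[j] = tmp;
}

// Restore the max-heap property for the subtree rooted at `start`,
// looking only at arr[0 .. end-1].
function siftDown(arr, start, end) {
  let root = start;
  while (2 * root + 1 < end) {
    let child = 2 * root + 1;
    if (child + 1 < end && arr[child] < arr[child + 1]) child++; // larger child
    if (arr[root] >= arr[child]) return;
    swap(arr, root, child);
    root = child;
  }
}

// In-place heapsort: O(n log n) time, O(1) extra space, ascending order.
function heapSort(arr) {
  const n = arr.length;
  // Build a max-heap bottom-up.
  for (let i = Math.floor(n / 2) - 1; i >= 0; i--) siftDown(arr, i, n);
  // Repeatedly move the current maximum to the end of the unsorted region.
  for (let end = n - 1; end > 0; end--) {
    swap(arr, 0, end);
    siftDown(arr, 0, end);
  }
}

// Sort in place, then keep one copy of each run of equal values.
function uniqueInPlace(arr) {
  heapSort(arr);
  let end = 0; // arr[0 .. end-1] holds the distinct values kept so far
  for (let i = 0; i < arr.length; i++) {
    if (end === 0 || arr[i] !== arr[end - 1]) arr[end++] = arr[i];
  }
  arr.length = end;
  return arr;
}
```

For example, uniqueInPlace([3, 1, 4, 1, 5]) leaves the array as [1, 3, 4, 5].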

But if you want stability, you cannot actually sort the data (since you could not get the "right" order back), and thus I wonder whether it is solvable in less than O(N^2) if stability is involved.
