
Remove duplicates from array in linear time and without extra arrays

We have an unsorted array. We know the values are in the range [0,n].

We want to remove duplicates, but we cannot use extra arrays and it must run in linear time.

Any ideas? Just to clarify, this is not for homework!

If the integers are limited to the range 0 to n, you can move through the array, placing numbers by their indices. Every time you replace a number, take the value that used to be there and move it to where it should be. For instance, let's say we have an array of size 8:

-----------------
|3|6|3|4|5|1|7|7|
-----------------
 S

Where S is our starting point, and we'll use C to keep track of our "current" index below. We start at index 0 and move the 3 to index 3, where 4 currently sits, saving the 4 in a temp variable.

-----------------
|X|6|3|3|5|1|7|7|   Saved 4 
-----------------  
 S     C

We then put 4 at index 4, saving what used to be there, 5.

-----------------
|X|6|3|3|4|1|7|7|   Saved 5
-----------------
 S       C

Keep going:

-----------------
|X|6|3|3|4|5|7|7|   Saved 1
-----------------
 S         C

-----------------
|X|1|3|3|4|5|7|7|   Saved 6
-----------------
 S C

-----------------
|X|1|3|3|4|5|6|7|   Saved 7    
-----------------
 S           C 

When we try to place 7, we see a conflict (index 7 already holds a 7), so we simply don't place it. We then continue from the starting index S, incrementing it by 1:

-----------------
|X|1|3|3|4|5|6|7| 
-----------------  
   S           

1 is fine where it is, but 3 needs to move:

-----------------
|X|1|X|3|4|5|6|7|
-----------------
     S

But 3 is a duplicate, so we throw it away and keep iterating through the rest of the array.

So basically, we move each entry at most once and iterate through the entire array. That's O(2n) = O(n).
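The walk-through above can be sketched in C roughly as follows. This is a minimal sketch, not the answerer's own code: it assumes every value lies in [0, n-1] (as in the size-8 example), and the names dedupe_by_index and EMPTY (standing in for the X in the diagrams) are made up for illustration.

```c
#define EMPTY (-1) /* hypothetical marker for a vacated slot, the "X" in the diagrams */

/* Deduplicates a[0..n-1] in place, assuming every value is in [0, n-1].
   Returns the number of distinct values, which end up in a[0..count-1]. */
int dedupe_by_index(int a[], int n)
{
    for (int s = 0; s < n; s++) {
        if (a[s] == EMPTY || a[s] == s)
            continue;               /* nothing to move from this slot */
        int v = a[s];               /* the value we are carrying */
        a[s] = EMPTY;
        /* Drop v into slot v, pick up whatever was there, and repeat. */
        while (v != EMPTY && a[v] != v) {
            int displaced = a[v];
            a[v] = v;
            v = displaced;
        }
        /* Here v is EMPTY (the chain ended in a vacated slot)
           or a duplicate (slot v already holds v), so it is dropped. */
    }

    /* Compact the surviving entries to the front of the array. */
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] != EMPTY)
            a[count++] = a[i];
    }
    return count;
}
```

Called on the example array {3,6,3,4,5,1,7,7} with n = 8, this returns 6 and leaves 1 3 4 5 6 7 in the first six slots. Each value is written into its home slot at most once, which is where the linear bound comes from.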

Assume int a[n] is an array of integers in the range [0,n-1]. Note that this differs slightly from the stated problem, but I make this assumption to make clear how the algorithm works. The algorithm can be patched up to work for integers in the range [0,n].

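for (int i = 0; i < n; i++)
{
    // Keep swapping until slot i holds i, or a[i] duplicates a value
    // that already sits in its own slot.
    while (a[i] != i && a[a[i]] != a[i])
    {
        int j = a[i];
        int k = a[j];
        a[j] = j;  // Swap a[j] and a[i]
        a[i] = k;
    }
}

// Every value that is present now sits at its own index.
for (int i = 0; i < n; i++)
{
    if (a[i] == i)
    {
        printf("%d\n", i);
    }
}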
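
One possible way to patch the loop for values in [0, n], and to compact the array rather than print it, is sketched below. It is an assumption about what such a patch could look like, not the answerer's version: the function name dedupe_zero_to_n and the filler trick (overwriting every n with some other value already in the array, then appending a single n at the end) are introduced here for illustration.

```c
#include <stdbool.h>

/* Hypothetical helper: deduplicates a[0..n-1] in place when values lie in [0, n].
   Returns the number of distinct values, left in a[0..count-1]. */
int dedupe_zero_to_n(int a[], int n)
{
    /* Find any value other than n to use as a filler. */
    int filler = n;
    for (int i = 0; i < n; i++) {
        if (a[i] != n) { filler = a[i]; break; }
    }
    if (filler == n)                /* empty array, or every entry is n */
        return n > 0 ? 1 : 0;       /* a[0] already holds n when n > 0 */

    /* Overwrite each n with the filler. This only adds duplicates of a value
       that is already present, so the distinct values below n are unchanged. */
    bool saw_n = false;
    for (int i = 0; i < n; i++) {
        if (a[i] == n) { a[i] = filler; saw_n = true; }
    }

    /* Swap loop: move each value to its home slot. */
    for (int i = 0; i < n; i++) {
        while (a[i] != i && a[a[i]] != a[i]) {
            int j = a[i];
            a[i] = a[j];
            a[j] = j;
        }
    }

    /* Compact the values that sit in their home slots. */
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] == i)
            a[count++] = i;
    }
    if (saw_n)
        a[count++] = n;             /* there is room: at least one slot held n */
    return count;
}
```

For the input {4,1,4,0} with n = 4, this returns 3 and leaves 0 1 4 at the front of the array.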
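
#include <stdio.h>

/* Assumes every value is in [0, size-1]. Corrupts the input array. */
void printRepeating(int arr[], int size)
{
  int i;
  printf("The repeating elements are: \n");
  for(i = 0; i < size; i++)
  {
    /* Decode the original value if this slot was already marked. */
    int v = arr[i] < 0 ? -arr[i] - 1 : arr[i];
    if(arr[v] >= 0)
      arr[v] = -arr[v] - 1;  /* mark v as seen; negative even when arr[v] is 0 */
    else
      printf(" %d ", v);     /* slot v already marked, so v is a repeat */
  }
}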

Walk through the array and, if array[array[i]] is not negative, flip its sign: array[array[i]] = -array[array[i]]. If it is already negative, then array[i] is a duplicate. This works because all values are between 0 and n, so each value can serve as an index. A slot holding 0 needs special handling, since -0 == 0.

Extending @Joel Lee's code for completion.

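#include <iostream>

// Repeats swap passes until every value that is present sits at its own index.
// Assumes all values are in [0, size-1].
void remove_duplicates(int *a, int size)
{
    bool swap = true;

    while (swap) {
        swap = false;
        for (int i = 0; i < size; i++) {
            // Skip values already in place and duplicates of a placed value.
            if (a[i] != i && a[i] != a[a[i]]) {
                int j = a[i];
                int k = a[j];
                a[i] = k;
                a[j] = j;
                swap = true;
            }
        }
    }
}

int main()
{
    //int array[8] = {3,6,3,4,5,1,7,7};
    int array[8] = {7,4,6,3,5,4,6,2};

    remove_duplicates(array, sizeof(array)/sizeof(int));

    // Slots with array[i] == i hold the distinct values.
    for (int i = 0; i < 8; i++)
        if (array[i] == i)
            std::cout << array[i] << " ";

    return 0;
}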

Can you sort? Sort with Radix Sort (http://en.wikipedia.org/wiki/Radix_sort), which has complexity O(arraySize) for the given case, and then remove duplicates from the sorted array in O(arraySize).
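
Once the array is sorted by whatever means, the second step is a single in-place pass. The sketch below shows only that step, assuming the input is already in ascending order; unique_sorted is a hypothetical name, and the sort itself is left out.

```c
/* Removes adjacent duplicates from a sorted array in place.
   Returns the new length; the distinct values end up in a[0..len-1]. */
int unique_sorted(int a[], int n)
{
    if (n == 0)
        return 0;
    int len = 1;                    /* a[0] is always kept */
    for (int i = 1; i < n; i++) {
        if (a[i] != a[len - 1])     /* new value: append it to the kept prefix */
            a[len++] = a[i];
    }
    return len;
}
```

For the sorted example 1 3 3 4 5 6 7 7 it returns 6 and leaves 1 3 4 5 6 7 at the front.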

With ES6 I think this can be solved in only a few lines by reducing the array into an object and then using Object.keys to get an array without duplicates. This probably takes more memory; I'm not sure.

I did it like this:

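var obj = array.reduce(function (acc, elem) {
  acc[elem] = true;  // each distinct element becomes one key
  return acc;
}, {});
// Note: Object.keys returns the keys as strings, so numeric input
// comes back as "1", "3", ... unless mapped with .map(Number).
var uniqueArray = Object.keys(obj);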

This has the added bonus (or disadvantage) of sorting the array. It works with strings too.

Use the array as a container, with the negative sign as an indicator; this will corrupt the input, though.
