How does this code remove duplicates from a sorted array?

If I pass in an array such as 2, 3, 3, 3, 1, the code removes the duplicate 3s, but why? The result is 1, 2, 3 (as it should be, because of the sorting method).

Sortering.sorterHeltallstabell(tab) only sorts the array; the rest of the code removes the duplicates. The array is sorted before the duplicates are removed.

Why does this code remove the duplicates from an array that is passed into the method?

public static int[] utenDubletter(int[] tab){
    Sortering.sorterHeltallstabell(tab);

    if (tab.length < 2)
        return tab;

    //why does this code remove duplicates?
    int j = 0;
    int i = 1;

    while (i < tab.length) {
        if (tab[i] == tab[j]) {
            i++;
        } else {
            j++;
            tab[j] = tab[i];
            i++;
        }           
    }              
    int[] B = Arrays.copyOf(tab, j + 1);
    return B;
}

Consider this simulation:

Start
[1, 1, 1, 2, 2, 2, 3]
 j  i
Duplicate found (tab[i] == tab[j]), move i over 1 (i++)
[1, 1, 1, 2, 2, 2, 3]
 j     i
Duplicate found, move i over 1
[1, 1, 1, 2, 2, 2, 3]
 j        i
Non-duplicate (else); adding 1 to j (j++),
    copying element i to element j (tab[j] = tab[i]),
    adding 1 to i (i++)
[1, 2, 1, 2, 2, 2, 3]
    j        i
Duplicate found, move i over 1
[1, 2, 1, 2, 2, 2, 3]
    j           i
Duplicate found, move i over 1
[1, 2, 1, 2, 2, 2, 3]
    j              i
Non-duplicate; adding 1 to j, copying i to j, adding 1 to i
[1, 2, 3, 2, 2, 2, 3]
       j              i
i == tab.length, so stop
Copy first 3 elements of array (up to j) to result 
    (int[] B = Arrays.copyOf(tab, j + 1)) and return it
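
The simulation above can be reproduced by adding a print statement to the loop. The following is only a sketch: the class name DedupTrace and the method name dedupWithTrace are made up for illustration, and the input is assumed to be sorted already, so the call to Sortering.sorterHeltallstabell is left out.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration; not part of the original code.
class DedupTrace {
    // Same loop as utenDubletter, printing the array and both indexes
    // before every iteration. Assumes tab is already sorted.
    static int[] dedupWithTrace(int[] tab) {
        if (tab.length < 2)
            return tab;

        int j = 0;
        int i = 1;
        while (i < tab.length) {
            System.out.println(Arrays.toString(tab) + "  j=" + j + " i=" + i);
            if (tab[i] == tab[j]) {
                i++;              // duplicate: skip it
            } else {
                j++;
                tab[j] = tab[i];  // new value: store it right after the last unique one
                i++;
            }
        }
        System.out.println(Arrays.toString(tab) + "  j=" + j + " i=" + i + " (done)");
        return Arrays.copyOf(tab, j + 1);
    }

    public static void main(String[] args) {
        int[] result = dedupWithTrace(new int[] {1, 1, 1, 2, 2, 2, 3});
        System.out.println(Arrays.toString(result)); // prints [1, 2, 3]
    }
}
```

Each printed line corresponds to one frame of the simulation, with the positions of j and i shown as numbers instead of markers under the array.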

This works because the loop built around the comparison

if (tab[i] == tab[j])

iterates through the sorted array, skipping duplicated elements and copying each unique element forward to the front part of the array, just after the elements that have already been scanned and are known to be unique. It then keeps only that front part of the array.
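
Since the Sortering class is not shown in the question, here is a minimal self-contained sketch of the same idea that can be compiled and run on its own. It assumes that Arrays.sort behaves like Sortering.sorterHeltallstabell (an ascending in-place sort); the class name DedupSketch and the names withoutDuplicates and last are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical class for illustration; not part of the original code.
class DedupSketch {
    static int[] withoutDuplicates(int[] tab) {
        // Assumption: stands in for Sortering.sorterHeltallstabell(tab).
        Arrays.sort(tab);

        if (tab.length < 2)
            return tab;

        int last = 0; // index of the last unique value kept so far (j in the original)
        for (int i = 1; i < tab.length; i++) {
            if (tab[i] != tab[last]) {
                // New value: store it directly after the last unique one.
                tab[++last] = tab[i];
            }
            // Equal values are skipped simply by moving on to the next i.
        }
        return Arrays.copyOf(tab, last + 1);
    }

    public static void main(String[] args) {
        int[] result = withoutDuplicates(new int[] {2, 3, 3, 3, 1});
        System.out.println(Arrays.toString(result)); // prints [1, 2, 3]
    }
}
```

The for loop does the same work as the while loop in the question: i always advances by one, and the else branch becomes the body of the if.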

Stepping through the code:

if (tab.length < 2)
    return tab;

int j = 0;
int i = 1;

The method receives the input [1,1,1,2,2,2,3] and sorts it (the input is already sorted in this example, so nothing changes). tab.length is 7, which is not less than 2, so the method does not return early. j is assigned the value 0 and i is assigned the value 1.

while (i < tab.length) { ... }

i (which is 1) is less than the length of tab (which is 7), so the while loop is entered:

    if (tab[i] == tab[j]) {
        i++;
    } else {
        j++;
        tab[j] = tab[i];
        i++;
    }       

Iteration 1: tab[i] (tab[1], or 1) is compared to tab[j] (tab[0], or 1). They are equal, so i is incremented. i is now 2.

Iteration 2: tab[i] (tab[2], or 1) is compared to tab[j] (tab[0], or 1). They are equal, so i is incremented. i is now 3.

Iteration 3: tab[i] (tab[3], or 2) is compared to tab[j] (tab[0], or 1). They are not equal. j is incremented, and is now 1. tab[j] (tab[1]) is assigned the value of tab[i] (tab[3]). tab is now [1,2,1,2,2,2,3]. i is incremented, and is now 4.

Iteration 4: tab[i] (tab[4], or 2) is compared to tab[j] (tab[1], or 2). They are equal, so i is incremented. i is now 5.

Iteration 5: tab[i] (tab[5], or 2) is compared to tab[j] (tab[1], or 2). They are equal, so i is incremented. i is now 6.

Iteration 6: tab[i] (tab[6], or 3) is compared to tab[j] (tab[1], or 2). They are not equal. j is incremented, and is now 2. tab[j] (tab[2]) is assigned the value of tab[i] (tab[6]). tab is now [1,2,3,2,2,2,3]. i is incremented, and is now 7.

i is no longer less than the length of tab, so we exit the while loop.

int[] B = Arrays.copyOf(tab, j + 1);
return B;

B is created by copying the first j + 1 elements of tab (3 elements here), starting from the first element. B is now [1,2,3].

The method returns [1,2,3], as expected.

The key is the line

if (tab[i] == tab[j])

which skips the element at i whenever it equals the last unique element kept at j. Because the array is sorted, that is exactly the case where consecutive elements are the same.

There are a few important considerations for the duplicate removal portion of this code.

  • It overwrites the contents of the array while it is still looking for duplicates, using the front part of the array (which has already been checked) to store the non-duplicates. Java passes the array reference by value, so the method works on the caller's array rather than on a copy: the sort and the overwrites are visible to the caller after the method returns, and only a reassignment of the parameter tab itself would stay local. The returned array B is a fresh copy, but the input array is left sorted and partially overwritten, so pass in a copy if the original order or contents matter. C# passes array references the same way by default, so the algorithm works there too.

  • The algorithm depends on the fact that sorting the array puts all copies of the same value into a single contiguous block of indexes. This guarantees that no later element can match a value that was overwritten many iterations earlier, because every later element is at least as large as the largest value kept so far (otherwise the array was not sorted properly).
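
Both considerations can be checked with a small experiment. The sketch below isolates the compaction loop in a hypothetical method called compact, with the sorting step deliberately left out, so that the effect on the caller's array and the dependence on sorted input can be seen separately. The class and method names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical class for illustration; not part of the original code.
class DedupCaveats {
    // The compaction loop on its own, with no sorting step.
    static int[] compact(int[] tab) {
        if (tab.length < 2)
            return tab;

        int j = 0;
        for (int i = 1; i < tab.length; i++) {
            if (tab[i] != tab[j]) {
                j++;
                tab[j] = tab[i];
            }
        }
        return Arrays.copyOf(tab, j + 1);
    }

    public static void main(String[] args) {
        // 1) The caller's array is overwritten, not just the local view of it.
        int[] sorted = {1, 1, 2, 2, 3};
        int[] unique = compact(sorted);
        System.out.println(Arrays.toString(unique)); // prints [1, 2, 3]
        System.out.println(Arrays.toString(sorted)); // prints [1, 2, 3, 2, 3]

        // 2) On unsorted input, equal values that are not adjacent survive.
        int[] unsorted = {1, 2, 1};
        System.out.println(Arrays.toString(compact(unsorted))); // prints [1, 2, 1]
    }
}
```

The second case shows why the sort has to come first: the loop only ever compares an element with the last value it kept, never with earlier ones.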

Heavily commented explanation

public static int[] utenDubletter(int[] tab){
    //sort the array
    Sortering.sorterHeltallstabell(tab);

    //There must be at least 2 elements for any duplicate to exist
    if (tab.length < 2)
        return tab;

    int j = 0; //index of the last unique element kept so far (the end of the new array)
    int i = 1; //index of the element currently being checked

    while (i < tab.length) {
        //if it's a duplicate 
        if (tab[i] == tab[j]) {
            i++;    //just skip this element and check the next one
        } else {
            j++;    //since this number does not exist in the new array make space for it 
            tab[j] = tab[i];    //record this new element
                                //we have checked this element (with i) before this 
                                //so we don't need to keep it around any longer
            i++;    //move onto the next element 
        }           
    }              
    int[] B = Arrays.copyOf(tab, j + 1);    //copy only the elements that we actually 
                                            //manually overwrote. Since arrays are
                                            //0-indexed, add one to the final index (j)
                                            //for the number of elements in our new array.
    return B;
}
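
If the caller's array must stay untouched, one possible variation is to run the same loop on a copy. This is a sketch under assumptions, not the original author's method: Arrays.sort stands in for Sortering.sorterHeltallstabell, and the class and method names are made up. The second method uses the standard IntStream operations sorted() and distinct() as an alternative technique that gives the same result without the manual loop.

```java
import java.util.Arrays;

// Hypothetical class for illustration; not part of the original code.
class DedupNoSideEffects {
    // Same algorithm as utenDubletter, applied to a private copy of the input.
    static int[] withoutDuplicates(int[] tab) {
        int[] copy = tab.clone();
        Arrays.sort(copy); // assumption: stands in for Sortering.sorterHeltallstabell

        if (copy.length < 2)
            return copy;

        int j = 0;
        for (int i = 1; i < copy.length; i++) {
            if (copy[i] != copy[j]) {
                j++;
                copy[j] = copy[i];
            }
        }
        return Arrays.copyOf(copy, j + 1);
    }

    // Alternative technique: let the stream library sort and drop duplicates.
    static int[] withStreams(int[] tab) {
        return Arrays.stream(tab).sorted().distinct().toArray();
    }

    public static void main(String[] args) {
        int[] input = {2, 3, 3, 3, 1};
        System.out.println(Arrays.toString(withoutDuplicates(input))); // prints [1, 2, 3]
        System.out.println(Arrays.toString(withStreams(input)));       // prints [1, 2, 3]
        System.out.println(Arrays.toString(input));                    // prints [2, 3, 3, 3, 1]
    }
}
```

Both methods cost an extra allocation the size of the input, which is the price of leaving the original array alone.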
