Remove duplicates from a large unsorted array and maintain the order

I have an unsorted array of integers whose values range from Integer.MIN_VALUE to Integer.MAX_VALUE. Any integer may appear multiple times in the array. I need to return an array with all duplicates removed while maintaining the order of the elements.

Example:

int[] input = {7,8,7,1,9,0,9,1,2,8};

The output should be {7,8,1,9,0,2}.

I know this problem can be solved using a LinkedHashSet, but I need a solution that doesn't involve significant buffer space.

You can use the Java 8 stream method distinct() (via Arrays.stream) to get the distinct values from the array; it preserves the input order:

public static void main(String[] args) {
    int[] input = {7,8,7,1,9,0,9,1,2,8};
    int[] output = Arrays.stream(input).distinct().toArray();
    System.out.println(Arrays.toString(output)); //[7, 8, 1, 9, 0, 2]
}

One clever approach is to use a LinkedHashSet to represent the input array. A LinkedHashSet maintains insertion order (linked-list behavior) and also ignores a key that is inserted again (map behavior). This means that, for example, the value 7 is inserted only once, the first time it occurs. This is the behavior we want.

LinkedHashSet<Integer> lhs = new LinkedHashSet<>();
int[] input = new int[] {7, 8, 7, 1, 9, 0, 9, 1, 2, 8};
for (int val : input) lhs.add(val);
int[] output = new int[lhs.size()];
int i = 0;
for (Integer val : lhs) {
    output[i++] = val;
}
System.out.println(Arrays.toString(output));

[7, 8, 1, 9, 0, 2]


What about doing it straightforwardly, like this? Note that list.contains is a linear scan, so this takes O(n*n) time overall.

    int[] input =  {7,8,7,1,9,0,9,1,2,8};
    ArrayList<Integer> list = new ArrayList<>();
    for(int i = 0;i<input.length;i++)
    {
        if(!list.contains(input[i]))
        {
            list.add(input[i]); // add the value itself, not its index
        }
    }
    int[] output = new int[list.size()];
    for(int i = 0;i<list.size();i++)
    {
        output[i] = list.get(i);
    }

You could use a HashSet instead of a LinkedHashSet, but this still uses a buffer:

private static int[] noDups(int[] arr) {
    Set<Integer> set = new HashSet<>();
    int nextIndex = 0;
    for (int i = 0; i < arr.length; ++i) {
        // set.add returns false when the value was already present
        if (set.add(arr[i])) {
            arr[nextIndex++] = arr[i];
        }
    }
    return Arrays.copyOfRange(arr, 0, nextIndex);
}

Note that this alters the original array. If you don't want that, make a copy first: int[] copy = Arrays.copyOfRange(arr, 0, arr.length);

Otherwise, if you want to do it in place without any additional buffer, you would have to iterate over the array with a nested loop, and the complexity would become O(n*n), which is pretty bad.
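A minimal sketch of the nested-loop, in-place variant mentioned above, assuming quadratic time is acceptable. The class and method names (InPlaceDedup, dedupInPlace) are made up for illustration. The method compacts the first occurrences to the front of the array and returns how many there are; only the final Arrays.copyOf in main allocates anything.

```java
import java.util.Arrays;

final class InPlaceDedup {

    // Moves the first occurrence of each value to the front of arr,
    // preserving order, and returns the number of distinct values.
    // O(1) extra space, O(n*n) time in the worst case.
    static int dedupInPlace(int[] arr) {
        int kept = 0; // arr[0..kept) holds the distinct values found so far
        for (int i = 0; i < arr.length; i++) {
            boolean seen = false;
            for (int j = 0; j < kept; j++) {
                if (arr[j] == arr[i]) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                arr[kept++] = arr[i]; // kept <= i, so nothing unread is overwritten
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] input = {7, 8, 7, 1, 9, 0, 9, 1, 2, 8};
        int n = dedupInPlace(input);
        System.out.println(Arrays.toString(Arrays.copyOf(input, n))); // [7, 8, 1, 9, 0, 2]
    }
}
```

For an array of 10,000,000 mostly distinct values the inner loop would run on the order of 10^13 comparisons, so this is probably only practical for small inputs or inputs with few distinct values.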

I tried a couple of approaches, using a Map and a Set, to arrive at the distinct elements. As per the requirement, these do not use LinkedHashSet or LinkedHashMap. Also, the order is preserved.

I used this sample input array in both cases and got the expected output.
INPUT: int [] arr = new int [] {1, 99, 2, 82, 99, -20, 9, 2, 9, 45, -319, 1};
RESULT: [1, 99, 2, 82, -20, 9, 45, -319]

The code samples:

Using a Map:

int [] arrayWithDistictElements = new int [arr.length];
int noOfDistinctElements = 0;
HashMap<Integer, Integer> map = new HashMap<>();

for (int i = 0, j = 0; i < arr.length; i++) {

    if (map.put(arr [i], 1) == null) {

        arrayWithDistictElements [j++] = arr [i];
        ++noOfDistinctElements;
    }
}

int [] result = Arrays.copyOf(arrayWithDistictElements, noOfDistinctElements);


Using a Set:

Set<Integer> set = new HashSet<>();
int [] arrayWithDistictElements = new int [arr.length];
int noOfDistinctElements = 0;

for (int i = 0, j = 0; i < arr.length; i++) {

    if (set.add(arr [i])) {

        arrayWithDistictElements [j++] = arr [i];
        ++noOfDistinctElements;
    }
}

int [] result = Arrays.copyOf(arrayWithDistictElements, noOfDistinctElements);
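If the per-element overhead of HashSet<Integer> is the concern, the same loop could be run against a set that stores primitive int values directly. The sketch below is one possible way to do that, not something the JDK provides: IntHashSet and IntSetDedup are hypothetical names written for this example, using open addressing with linear probing and a capacity fixed up front. It still needs a buffer proportional to the input, just without a boxed Integer per element.

```java
import java.util.Arrays;

final class IntSetDedup {

    // Hypothetical helper for this sketch (not a JDK class):
    // a fixed-capacity open-addressing set of int keys with linear probing.
    static final class IntHashSet {
        private final int[] keys;
        private final boolean[] used; // marks occupied slots, so 0 is a valid key
        private final int mask;

        IntHashSet(int expectedSize) {
            if (expectedSize < 0 || expectedSize > (1 << 29)) {
                throw new IllegalArgumentException("unsupported size: " + expectedSize);
            }
            // Power-of-two capacity of at least 2 * expectedSize keeps the
            // table at most half full, so probing always finds a free slot.
            int cap = 16;
            while (cap < 2L * expectedSize) {
                cap <<= 1;
            }
            keys = new int[cap];
            used = new boolean[cap];
            mask = cap - 1;
        }

        // Returns true if key was newly added, false if it was already present.
        boolean add(int key) {
            int h = key * 0x9E3779B9; // simple multiplicative scrambling
            int i = (h ^ (h >>> 16)) & mask;
            while (used[i]) {
                if (keys[i] == key) {
                    return false;
                }
                i = (i + 1) & mask;
            }
            used[i] = true;
            keys[i] = key;
            return true;
        }
    }

    // Returns the distinct values of arr in first-occurrence order.
    static int[] distinct(int[] arr) {
        IntHashSet seen = new IntHashSet(arr.length);
        int[] out = new int[arr.length];
        int n = 0;
        for (int v : arr) {
            if (seen.add(v)) {
                out[n++] = v;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] arr = {1, 99, 2, 82, 99, -20, 9, 2, 9, 45, -319, 1};
        System.out.println(Arrays.toString(distinct(arr))); // [1, 99, 2, 82, -20, 9, 45, -319]
    }
}
```

The table is sized for the worst case in which every element is distinct, which matches the test data here (random ints with very few duplicates). Whether it is actually faster or smaller than HashSet on a given JVM would need to be measured.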


The Code:

Here is the complete code of the tests I tried, and some results:

import java.util.*;
import java.util.stream.*;
import java.time.*;
import java.text.*;
public class UniqueArrayTester {

    private final static int ARRAY_SIZE = 10_000_000;
    private static Random r = new Random();

    public static void main(String [] args) {

        DecimalFormat formatter = new DecimalFormat("###,###,###,###");
        System.out.println("Input array size: " + formatter.format(ARRAY_SIZE));

        for (int i = 0; i < 5; i++) {

            // For testing with a small input and print the result use this as input:
            //int [] arr = new int [] {1, 99, 2, 82, 99, -20, 9, 2, 9, 45, -319, 1};
            //System.out.println(Arrays.toString(arr));

            System.out.println("[Test " + Integer.toString(i+1) + "]");
            int [] arr = getArray();
            process1(arr);
            process2(arr);
            process3(arr);
        }
    }

    private static int [] getArray() {  
        return IntStream.generate(() -> r.nextInt())
                        .limit(ARRAY_SIZE)
                        .toArray();
    }

    /*
     * Process uses Stream API.
     */
    private static void process1(int [] arr) {
        LocalTime time1 = LocalTime.now();
        int [] result = IntStream.of(arr).distinct().toArray();
        LocalTime time2 = LocalTime.now();
        System.out.println("Process 1 (using streams) out array size: " + result.length);
        System.out.println("    Duration in millis: " + Duration.between(time1, time2).toMillis());
        //System.out.println(Arrays.toString(result));
    }

    /*
     * Process uses a Map to arrive at distinct elements.
     */ 
    private static void process2(int [] arr) {
        LocalTime time1 = LocalTime.now();
        int [] arrayWithDistictElements = new int [arr.length];
        int noOfDistinctElements = 0;
        HashMap<Integer, Integer> map = new HashMap<>();
        for (int i = 0, j = 0; i < arr.length; i++) {
            if (map.put(arr [i], 1) == null) {
                arrayWithDistictElements [j++] = arr [i];
                ++noOfDistinctElements;
            }
        }
        int [] result = Arrays.copyOf(arrayWithDistictElements, noOfDistinctElements);
        LocalTime time2 = LocalTime.now();
        System.out.println("Process 2 (using map) out array size: " + result.length);
        System.out.println("    Duration in millis: " + Duration.between(time1, time2).toMillis());
        //System.out.println(Arrays.toString(result));
    }

    /*
     * Process uses a Set to arrive at distinct elements.
     */ 
    private static void process3(int [] arr) {
        LocalTime time1 = LocalTime.now();
        Set<Integer> set = new HashSet<>();
        int [] arrayWithDistictElements = new int [arr.length];
        int noOfDistinctElements = 0;
        for (int i = 0, j = 0; i < arr.length; i++) {
            if (set.add(arr [i])) {
                arrayWithDistictElements [j++] = arr [i];
                ++noOfDistinctElements;
            }
        }
        int [] result = Arrays.copyOf(arrayWithDistictElements, noOfDistinctElements);
        LocalTime time2 = LocalTime.now();
        System.out.println("Process 3 (using set) out array size: " + result.length);
        System.out.println("    Duration in millis: " + Duration.between(time1, time2).toMillis());
        //System.out.println(Arrays.toString(result));
    }
}


Test Results:

The equipment used: Intel Core i3 processor, Windows 7 64-bit, Java 8

Input array size: 10,000,000
[Test 1]
Process 1 (using streams) out array size: 9988498
    Duration in millis: 10649
Process 2 (using map) out array size: 9988498
    Duration in millis: 10294
Process 3 (using set) out array size: 9988498
    Duration in millis: 8982
[Test 2]
Process 1 (using streams) out array size: 9988331
    Duration in millis: 7839
Process 2 (using map) out array size: 9988331
    Duration in millis: 5567
Process 3 (using set) out array size: 9988331
    Duration in millis: 4155
[Test 3]
Process 1 (using streams) out array size: 9988286
    Duration in millis: 9138
Process 2 (using map) out array size: 9988286
    Duration in millis: 6799
Process 3 (using set) out array size: 9988286
    Duration in millis: 7155
[Test 4]
Process 1 (using streams) out array size: 9988431
    Duration in millis: 7908
Process 2 (using map) out array size: 9988431
    Duration in millis: 6909
Process 3 (using set) out array size: 9988431
    Duration in millis: 7205
[Test 5]
Process 1 (using streams) out array size: 9988334
    Duration in millis: 7971
Process 2 (using map) out array size: 9988334
    Duration in millis: 6910
Process 3 (using set) out array size: 9988334
    Duration in millis: 7196
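The durations above were measured with LocalTime.now(), which reads the wall clock. A possible alternative, sketched here on the assumption that a rough single-run measurement is all that is wanted, is System.nanoTime(), which the JDK documents as intended for measuring elapsed time. ElapsedTimer and timeMillis are hypothetical names, and the array size and seed are arbitrary.

```java
import java.util.Random;
import java.util.stream.IntStream;

final class ElapsedTimer {

    // Runs the task once and returns the elapsed time in milliseconds,
    // measured with System.nanoTime().
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Smaller input than the test above, chosen only to keep the demo quick.
        int[] arr = new Random(42).ints(1_000_000).toArray();
        int[][] result = new int[1][]; // holder so the lambda can store its output
        long ms = timeMillis(() -> result[0] = IntStream.of(arr).distinct().toArray());
        System.out.println("distinct count: " + result[0].length);
        System.out.println("    Duration in millis: " + ms);
    }
}
```

Single runs like these also tend to include JIT warm-up, which may be part of why Test 1 is slower than the later ones; a benchmark harness such as JMH would likely give more reliable comparisons.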
