
Remove duplicates from a large unsorted array and maintain the order

I have an unsorted array of integers whose values range from Integer.MIN_VALUE to Integer.MAX_VALUE. Any integer may appear multiple times in the array. I need to return an array with all duplicates removed while maintaining the order of the elements.

Example:

int[] input = {7,8,7,1,9,0,9,1,2,8};

The output should be {7,8,1,9,0,2}

I know this problem can be solved using a LinkedHashSet, but I need a solution that doesn't require significant buffer space.

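You can use the Java 8 Arrays.stream(input).distinct() method to get the distinct values from the array; the result keeps the input order. Note that distinct() still has to track the values it has already seen, so it does not avoid the buffer.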

public static void main(String[] args) {
    int[] input = {7,8,7,1,9,0,9,1,2,8};
    int[] output = Arrays.stream(input).distinct().toArray();
    System.out.println(Arrays.toString(output)); //[7, 8, 1, 9, 0, 2]
}

One clever approach is to use a LinkedHashSet to represent the input array. A LinkedHashSet maintains insertion order (linked-list behavior) but also ignores a key that is inserted again (set behavior). This means that, for example, the value 7 is inserted only once, the first time it occurs. This is the behavior we want.

LinkedHashSet<Integer> lhs = new LinkedHashSet<>();
int[] input = new int[] {7, 8, 7, 1, 9, 0, 9, 1, 2, 8};
for (int val : input) lhs.add(val);
int[] output = new int[lhs.size()];
int i = 0;
for (Integer val : lhs) {
    output[i++] = val;
}
System.out.println(Arrays.toString(output));

[7, 8, 1, 9, 0, 2]


What about doing it straightforwardly, like this?

    int[] input =  {7,8,7,1,9,0,9,1,2,8};
    ArrayList<Integer> list = new ArrayList<>();
    for(int i = 0;i<input.length;i++)
    {
        // contains() is a linear scan, so the whole loop is O(n*n)
        if(!list.contains(input[i]))
        {
            // add the value itself, not the index i
            list.add(input[i]);
        }
    }
    int[] output = new int[list.size()];
    for(int i = 0;i<list.size();i++)
    {
        output[i] = list.get(i);
    }

You could use a HashSet instead of a LinkedHashSet, but this still uses a buffer:

 private static int[] noDups(int[] arr) {
    Set<Integer> set = new HashSet<>();
    int nextIndex = 0;
    for (int i = 0; i < arr.length; ++i) {
        if (set.add(arr[i])) {
            arr[nextIndex++] = arr[i];
        }

    }

    return Arrays.copyOfRange(arr, 0, nextIndex);
}

Note that this alters the original array. If you don't want that, make a copy first and pass the copy instead: int[] copy = Arrays.copyOfRange(arr, 0, arr.length);
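As a rough sketch of that copy-first variant: the class name NoDupsCopyDemo and the method name noDupsPreservingInput are made up for illustration; the logic is the same HashSet-based loop as noDups, only applied to a defensive copy so the caller's array stays untouched.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class NoDupsCopyDemo {

    // Hypothetical helper: same idea as noDups, but it never writes to the caller's array.
    static int[] noDupsPreservingInput(int[] arr) {
        int[] copy = Arrays.copyOf(arr, arr.length); // defensive copy of the input
        Set<Integer> seen = new HashSet<>();
        int nextIndex = 0;
        for (int value : copy) {
            // add() returns false when the value is already in the set
            if (seen.add(value)) {
                copy[nextIndex++] = value; // nextIndex never overtakes the read position
            }
        }
        return Arrays.copyOf(copy, nextIndex); // trim to the distinct prefix
    }

    public static void main(String[] args) {
        int[] input = {7, 8, 7, 1, 9, 0, 9, 1, 2, 8};
        int[] output = noDupsPreservingInput(input);
        System.out.println(Arrays.toString(output)); // [7, 8, 1, 9, 0, 2]
        System.out.println(Arrays.toString(input));  // [7, 8, 7, 1, 9, 0, 9, 1, 2, 8]
    }
}
```

The price is one more array of the input's size on top of the set, so this only makes sense when the caller really needs the original left intact.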

Otherwise, if you want to do it in place without any additional buffer, you would have to scan the array with a nested loop, and the complexity would become O(n*n), which is pretty bad.
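For completeness, the in-place variant could look roughly like the sketch below. The names InPlaceDedupDemo and dedupInPlace are hypothetical. The method keeps the first occurrences packed at the front of the array and returns how many there are; it uses no set or map, at the cost of the quadratic scan.

```java
import java.util.Arrays;

class InPlaceDedupDemo {

    // Hypothetical helper: moves the first occurrence of each value to the front of arr
    // (order preserved) and returns the number of distinct values. O(n*n) time, O(1) extra space.
    static int dedupInPlace(int[] arr) {
        int uniqueCount = 0;
        for (int i = 0; i < arr.length; i++) {
            boolean seen = false;
            // Only the already-compacted prefix [0, uniqueCount) needs to be checked.
            for (int j = 0; j < uniqueCount; j++) {
                if (arr[j] == arr[i]) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                arr[uniqueCount++] = arr[i];
            }
        }
        return uniqueCount;
    }

    public static void main(String[] args) {
        int[] input = {7, 8, 7, 1, 9, 0, 9, 1, 2, 8};
        int n = dedupInPlace(input);
        // Trimming to a new array does allocate, but only the size of the result.
        System.out.println(Arrays.toString(Arrays.copyOf(input, n))); // [7, 8, 1, 9, 0, 2]
    }
}
```

With the 10,000,000-element input used in the tests further down, the quadratic inner loop would be far too slow, so this is only realistic for small arrays or when memory really is the hard constraint.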

I tried a couple of approaches, using a Map and a Set, to arrive at the distinct elements. As per the requirement, these do not use LinkedHashSet or LinkedHashMap, and the order is preserved.

I used this sample input array in both cases and got the expected output.
INPUT: int [] arr = new int [] {1, 99, 2, 82, 99, -20, 9, 2, 9, 45, -319, 1};
RESULT: [1, 99, 2, 82, -20, 9, 45, -319]

The code samples:

Using a Map:

int [] arrayWithDistictElements = new int [arr.length];
int noOfDistinctElements = 0;
HashMap<Integer, Integer> map = new HashMap<>();

for (int i = 0, j = 0; i < arr.length; i++) {

    if (map.put(arr [i], 1) == null) {

        arrayWithDistictElements [j++] = arr [i];
        ++noOfDistinctElements;
    }
}

int [] result = Arrays.copyOf(arrayWithDistictElements, noOfDistinctElements);


Using a Set:

Set<Integer> set = new HashSet<>();
int [] arrayWithDistictElements = new int [arr.length];
int noOfDistinctElements = 0;

for (int i = 0, j = 0; i < arr.length; i++) {

    if (set.add(arr [i])) {

        arrayWithDistictElements [j++] = arr [i];
        ++noOfDistinctElements;
    }
}

int [] result = Arrays.copyOf(arrayWithDistictElements, noOfDistinctElements);


The Code:

Here is the complete code for the tests I ran, followed by some results:

import java.util.*;
import java.util.stream.*;
import java.time.*;
import java.text.*;
public class UniqueArrayTester {

    private final static int ARRAY_SIZE = 10_000_000;
    private static Random r = new Random();

    public static void main(String [] args) {

        DecimalFormat formatter = new DecimalFormat("###,###,###,###");
        System.out.println("Input array size: " + formatter.format(ARRAY_SIZE));

        for (int i = 0; i < 5; i++) {

            // For testing with a small input and print the result use this as input:
            //int [] arr = new int [] {1, 99, 2, 82, 99, -20, 9, 2, 9, 45, -319, 1};
            //System.out.println(Arrays.toString(arr));

            System.out.println("[Test " + Integer.toString(i+1) + "]");
            int [] arr = getArray();
            process1(arr);
            process2(arr);
            process3(arr);
        }
    }

    private static int [] getArray() {  
        return IntStream.generate(() -> r.nextInt())
                        .limit(ARRAY_SIZE)
                        .toArray();
    }

    /*
     * Process uses Stream API.
     */
    private static void process1(int [] arr) {
        LocalTime time1 = LocalTime.now();
        int [] result = IntStream.of(arr).distinct().toArray();
        LocalTime time2 = LocalTime.now();
        System.out.println("Process 1 (using streams) out array size: " + result.length);
        System.out.println("    Duration in millis: " + Duration.between(time1, time2).toMillis());
        //System.out.println(Arrays.toString(result));
    }

    /*
     * Process uses a Map to arrive at distinct elements.
     */ 
    private static void process2(int [] arr) {
        LocalTime time1 = LocalTime.now();
        int [] arrayWithDistictElements = new int [arr.length];
        int noOfDistinctElements = 0;
        HashMap<Integer, Integer> map = new HashMap<>();
        for (int i = 0, j = 0; i < arr.length; i++) {
            if (map.put(arr [i], 1) == null) {
                arrayWithDistictElements [j++] = arr [i];
                ++noOfDistinctElements;
            }
        }
        int [] result = Arrays.copyOf(arrayWithDistictElements, noOfDistinctElements);
        LocalTime time2 = LocalTime.now();
        System.out.println("Process 2 (using map) out array size: " + result.length);
        System.out.println("    Duration in millis: " + Duration.between(time1, time2).toMillis());
        //System.out.println(Arrays.toString(result));
    }

    /*
     * Process uses a Set to arrive at distinct elements.
     */ 
    private static void process3(int [] arr) {
        LocalTime time1 = LocalTime.now();
        Set<Integer> set = new HashSet<>();
        int [] arrayWithDistictElements = new int [arr.length];
        int noOfDistinctElements = 0;
        for (int i = 0, j = 0; i < arr.length; i++) {
            if (set.add(arr [i])) {
                arrayWithDistictElements [j++] = arr [i];
                ++noOfDistinctElements;
            }
        }
        int [] result = Arrays.copyOf(arrayWithDistictElements, noOfDistinctElements);
        LocalTime time2 = LocalTime.now();
        System.out.println("Process 3 (using set) out array size: " + result.length);
        System.out.println("    Duration in millis: " + Duration.between(time1, time2).toMillis());
        //System.out.println(Arrays.toString(result));
    }
}


Test Results:

Equipment used: Intel Core i3 processor, Windows 7 64-bit, Java 8

Input array size: 10,000,000
[Test 1]
Process 1 (using streams) out array size: 9988498
    Duration in millis: 10649
Process 2 (using map) out array size: 9988498
    Duration in millis: 10294
Process 3 (using set) out array size: 9988498
    Duration in millis: 8982
[Test 2]
Process 1 (using streams) out array size: 9988331
    Duration in millis: 7839
Process 2 (using map) out array size: 9988331
    Duration in millis: 5567
Process 3 (using set) out array size: 9988331
    Duration in millis: 4155
[Test 3]
Process 1 (using streams) out array size: 9988286
    Duration in millis: 9138
Process 2 (using map) out array size: 9988286
    Duration in millis: 6799
Process 3 (using set) out array size: 9988286
    Duration in millis: 7155
[Test 4]
Process 1 (using streams) out array size: 9988431
    Duration in millis: 7908
Process 2 (using map) out array size: 9988431
    Duration in millis: 6909
Process 3 (using set) out array size: 9988431
    Duration in millis: 7205
[Test 5]
Process 1 (using streams) out array size: 9988334
    Duration in millis: 7971
Process 2 (using map) out array size: 9988334
    Duration in millis: 6910
Process 3 (using set) out array size: 9988334
    Duration in millis: 7196
