Remove duplicates from a large unsorted array and maintain the order
I have an unsorted array of integers whose values range from Integer.MIN_VALUE to Integer.MAX_VALUE. Any integer may appear in the array multiple times. I need to return an array with all duplicates removed while preserving the order of the remaining elements.
Example:
int[] input = {7,8,7,1,9,0,9,1,2,8};
The output should be {7,8,1,9,0,2}.
I know this problem can be solved using a LinkedHashSet, but I need a solution that doesn't involve significant buffer space.
You can use the Java 8 stream method distinct() on Arrays.stream(input) to get the distinct values from the array; the result keeps the input order:
public static void main(String[] args) {
int[] input = {7,8,7,1,9,0,9,1,2,8};
int[] output = Arrays.stream(input).distinct().toArray();
System.out.println(Arrays.toString(output)); //[7, 8, 1, 9, 0, 2]
}
One clever approach is to use a LinkedHashSet to represent the input array. A LinkedHashSet maintains insertion order (linked list behavior) but also ignores a key that is inserted again (map behavior). This means that, for example, the value 7 is inserted into the list/map only once, the first time it occurs. This is the behavior we want.
LinkedHashSet<Integer> lhs = new LinkedHashSet<>();
int[] input = new int[] {7, 8, 7, 1, 9, 0, 9, 1, 2, 8};
for (int val : input) lhs.add(val);
int[] output = new int[lhs.size()];
int i = 0;
for (Integer val : lhs) {
output[i++] = val;
}
System.out.println(Arrays.toString(output)); // prints [7, 8, 1, 9, 0, 2]
What about doing it straightforwardly, like this?
int[] input = {7,8,7,1,9,0,9,1,2,8};
ArrayList<Integer> list = new ArrayList<>();
for(int i = 0;i<input.length;i++)
{
if(!list.contains(input[i]))
{
list.add(input[i]); // add the value itself, not its index
}
}
int[] output = new int[list.size()];
for(int i = 0;i<list.size();i++)
{
output[i] = list.get(i);
}
You could use a HashSet instead of a LinkedHashSet, but this still uses a buffer:
private static int[] noDups(int[] arr) {
Set<Integer> set = new HashSet<>();
int nextIndex = 0;
for (int i = 0; i < arr.length; ++i) {
if (set.add(arr[i])) {
arr[nextIndex++] = arr[i];
}
}
return Arrays.copyOfRange(arr, 0, nextIndex);
}
Note that this alters the original array. If you don't want that, make a copy first and pass the copy in: int[] copy = Arrays.copyOfRange(arr, 0, arr.length);
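To make the side effect concrete, here is a small sketch that repeats the noDups method from above and runs it on a defensive copy. The class name NoDupsCopyDemo and the use of clone() for the copy are illustrative choices, not part of the original answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class NoDupsCopyDemo {
    // Same logic as noDups above: distinct values are compacted to the
    // front of arr, so the array passed in is modified.
    static int[] noDups(int[] arr) {
        Set<Integer> set = new HashSet<>();
        int nextIndex = 0;
        for (int i = 0; i < arr.length; ++i) {
            if (set.add(arr[i])) {
                arr[nextIndex++] = arr[i];
            }
        }
        return Arrays.copyOfRange(arr, 0, nextIndex);
    }

    public static void main(String[] args) {
        int[] original = {7, 8, 7, 1, 9, 0, 9, 1, 2, 8};
        int[] copy = original.clone(); // defensive copy; noDups will overwrite it
        int[] result = noDups(copy);
        System.out.println(Arrays.toString(result));   // [7, 8, 1, 9, 0, 2]
        System.out.println(Arrays.toString(copy));     // [7, 8, 1, 9, 0, 2, 9, 1, 2, 8]
        System.out.println(Arrays.toString(original)); // [7, 8, 7, 1, 9, 0, 9, 1, 2, 8]
    }
}
```

The copy costs another array of the same length, so it only makes sense when the caller still needs the original contents.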
Otherwise, if you want to do it in place without any additional buffer, you would have to iterate over the array with nested loops, and the complexity would become O(n*n), which is pretty bad.
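A minimal sketch of that nested-loop variant, for illustration only. The class and method names (InPlaceDedup, dedupInPlace) are made up here; the method compacts the first occurrences to the front of the array and returns how many there are.

```java
import java.util.Arrays;

class InPlaceDedup {
    // Removes duplicates in place, keeping first occurrences in order.
    // Returns the number of distinct values; they occupy arr[0..count).
    // Uses no extra collection, at the cost of O(n*n) comparisons.
    static int dedupInPlace(int[] arr) {
        int count = 0;
        for (int i = 0; i < arr.length; i++) {
            boolean seen = false;
            // Scan only the distinct prefix built so far.
            for (int j = 0; j < count; j++) {
                if (arr[j] == arr[i]) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                arr[count++] = arr[i]; // count <= i, so unread elements are never overwritten
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] input = {7, 8, 7, 1, 9, 0, 9, 1, 2, 8};
        int count = dedupInPlace(input);
        // Copying the prefix is only needed if a right-sized array must be returned.
        System.out.println(Arrays.toString(Arrays.copyOf(input, count))); // [7, 8, 1, 9, 0, 2]
    }
}
```

For arrays with millions of mostly distinct values, as in the question, the quadratic scan is likely to be impractical; the sketch mainly shows what the trade-off looks like.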
I tried a couple of approaches, using a Map and a Set, to arrive at the distinct elements. As per the requirement, these do not use LinkedHashSet or LinkedHashMap. The order is also preserved.
I used this sample input array in both cases and got the expected output.
INPUT: int [] arr = new int [] {1, 99, 2, 82, 99, -20, 9, 2, 9, 45, -319, 1};
RESULT: [1, 99, 2, 82, -20, 9, 45, -319]
The code samples:
Using a Map:
int [] arrayWithDistictElements = new int [arr.length];
int noOfDistinctElements = 0;
HashMap<Integer, Integer> map = new HashMap<>();
for (int i = 0, j = 0; i < arr.length; i++) {
if (map.put(arr [i], 1) == null) {
arrayWithDistictElements [j++] = arr [i];
++noOfDistinctElements;
}
}
int [] result = Arrays.copyOf(arrayWithDistictElements, noOfDistinctElements);
Using a Set:
Set<Integer> set = new HashSet<>();
int [] arrayWithDistictElements = new int [arr.length];
int noOfDistinctElements = 0;
for (int i = 0, j = 0; i < arr.length; i++) {
if (set.add(arr [i])) {
arrayWithDistictElements [j++] = arr [i];
++noOfDistinctElements;
}
}
int [] result = Arrays.copyOf(arrayWithDistictElements, noOfDistinctElements);
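Both samples box every int into an Integer before storing it. If that overhead is the concern, one possible variation is to keep the same "first time seen" test but back it with a hand-rolled open-addressing table of primitive ints. The sketch below is an assumption-laden illustration, not something from the original answers: the class PrimitiveDedup, its methods, and the hash mixing constant are all choices made here, and it still allocates a buffer proportional to the input.

```java
import java.util.Arrays;

class PrimitiveDedup {
    // Scrambles the bits so that clustered inputs spread across the table.
    private static int mix(int v) {
        int h = v * 0x9E3779B9;
        return h ^ (h >>> 16);
    }

    // Order-preserving dedup using linear probing over primitive arrays.
    static int[] distinct(int[] arr) {
        long needed = 2L * arr.length; // keep the load factor at or below 0.5
        if (needed > (1 << 30)) {
            throw new IllegalArgumentException("input too large for this sketch");
        }
        int capacity = 16;
        while (capacity < needed) {
            capacity <<= 1; // power of two, so (hash & mask) picks a slot
        }
        int mask = capacity - 1;
        int[] table = new int[capacity];
        boolean[] used = new boolean[capacity]; // marks occupied slots, so 0 is a valid value
        int[] out = new int[arr.length];
        int n = 0;
        for (int v : arr) {
            int slot = mix(v) & mask;
            // Probe until we find v or an empty slot; the table is never full.
            while (used[slot] && table[slot] != v) {
                slot = (slot + 1) & mask;
            }
            if (!used[slot]) { // first occurrence of v
                used[slot] = true;
                table[slot] = v;
                out[n++] = v;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] arr = {1, 99, 2, 82, 99, -20, 9, 2, 9, 45, -319, 1};
        System.out.println(Arrays.toString(distinct(arr))); // [1, 99, 2, 82, -20, 9, 45, -319]
    }
}
```

Whether this actually uses less memory or runs faster than HashSet for a given input would need to be measured; it was not part of the tests reported in this answer.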
The code:
Here is the complete code of the tests I ran, along with some results:
import java.util.*;
import java.util.stream.*;
import java.time.*;
import java.text.*;
public class UniqueArrayTester {
private final static int ARRAY_SIZE = 10_000_000;
private static Random r = new Random();
public static void main(String [] args) {
DecimalFormat formatter = new DecimalFormat("###,###,###,###");
System.out.println("Input array size: " + formatter.format(ARRAY_SIZE));
for (int i = 0; i < 5; i++) {
// For testing with a small input and print the result use this as input:
//int [] arr = new int [] {1, 99, 2, 82, 99, -20, 9, 2, 9, 45, -319, 1};
//System.out.println(Arrays.toString(arr));
System.out.println("[Test " + Integer.toString(i+1) + "]");
int [] arr = getArray();
process1(arr);
process2(arr);
process3(arr);
}
}
private static int [] getArray() {
return IntStream.generate(() -> r.nextInt())
.limit(ARRAY_SIZE)
.toArray();
}
/*
* Process uses Stream API.
*/
private static void process1(int [] arr) {
LocalTime time1 = LocalTime.now();
int [] result = IntStream.of(arr).distinct().toArray();
LocalTime time2 = LocalTime.now();
System.out.println("Process 1 (using streams) out array size: " + result.length);
System.out.println(" Duration in millis: " + Duration.between(time1, time2).toMillis());
//System.out.println(Arrays.toString(result));
}
/*
* Process uses a Map to arrive at distinct elements.
*/
private static void process2(int [] arr) {
LocalTime time1 = LocalTime.now();
int [] arrayWithDistictElements = new int [arr.length];
int noOfDistinctElements = 0;
HashMap<Integer, Integer> map = new HashMap<>();
for (int i = 0, j = 0; i < arr.length; i++) {
if (map.put(arr [i], 1) == null) {
arrayWithDistictElements [j++] = arr [i];
++noOfDistinctElements;
}
}
int [] result = Arrays.copyOf(arrayWithDistictElements, noOfDistinctElements);
LocalTime time2 = LocalTime.now();
System.out.println("Process 2 (using map) out array size: " + result.length);
System.out.println(" Duration in millis: " + Duration.between(time1, time2).toMillis());
//System.out.println(Arrays.toString(result));
}
/*
* Process uses a Set to arrive at distinct elements.
*/
private static void process3(int [] arr) {
LocalTime time1 = LocalTime.now();
Set<Integer> set = new HashSet<>();
int [] arrayWithDistictElements = new int [arr.length];
int noOfDistinctElements = 0;
for (int i = 0, j = 0; i < arr.length; i++) {
if (set.add(arr [i])) {
arrayWithDistictElements [j++] = arr [i];
++noOfDistinctElements;
}
}
int [] result = Arrays.copyOf(arrayWithDistictElements, noOfDistinctElements);
LocalTime time2 = LocalTime.now();
System.out.println("Process 3 (using set) out array size: " + result.length);
System.out.println(" Duration in millis: " + Duration.between(time1, time2).toMillis());
//System.out.println(Arrays.toString(result));
}
}
Test results:
The equipment used: Intel Core i3 processor, Windows 7 64-bit, Java 8
Input array size: 10,000,000
[Test 1]
Process 1 (using streams) out array size: 9988498
Duration in millis: 10649
Process 2 (using map) out array size: 9988498
Duration in millis: 10294
Process 3 (using set) out array size: 9988498
Duration in millis: 8982
[Test 2]
Process 1 (using streams) out array size: 9988331
Duration in millis: 7839
Process 2 (using map) out array size: 9988331
Duration in millis: 5567
Process 3 (using set) out array size: 9988331
Duration in millis: 4155
[Test 3]
Process 1 (using streams) out array size: 9988286
Duration in millis: 9138
Process 2 (using map) out array size: 9988286
Duration in millis: 6799
Process 3 (using set) out array size: 9988286
Duration in millis: 7155
[Test 4]
Process 1 (using streams) out array size: 9988431
Duration in millis: 7908
Process 2 (using map) out array size: 9988431
Duration in millis: 6909
Process 3 (using set) out array size: 9988431
Duration in millis: 7205
[Test 5]
Process 1 (using streams) out array size: 9988334
Duration in millis: 7971
Process 2 (using map) out array size: 9988334
Duration in millis: 6910
Process 3 (using set) out array size: 9988334
Duration in millis: 7196