
What is the best way to remove duplicates in an Array in Java?

I have an Array of Objects that need the duplicates removed/filtered. I was going to just override equals & hashCode on the Object elements and then stick them in a Set... but I figured I should at least poll stackoverflow to see if there was another way, perhaps some clever method of some other API?

I would agree with your approach to override hashCode() and equals() and use something that implements Set.

Doing so also makes it absolutely clear to any other developers that the non-duplicate characteristic is required.

Another reason - you get to choose the implementation that meets your needs best now, and you don't have to change your code to change the implementation in the future.
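
For instance, a minimal sketch (the Person class and its fields are hypothetical stand-ins for your own object type):

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical Person class standing in for your own object type
class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Two Persons are equal when their identifying fields match
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    // Must be consistent with equals(): equal objects share a hash code
    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }

    public static void main(String[] args) {
        Set<Person> people = new HashSet<Person>();
        people.add(new Person("Ann", 30));
        people.add(new Person("Ann", 30)); // duplicate, silently dropped
        System.out.println(people.size()); // prints 1
    }
}

Once equals() and hashCode() agree on the identifying fields, any Set implementation will discard the duplicates on add().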

I found this on the web:

Here are two methods that allow you to remove duplicates in an ArrayList. removeDuplicate does not maintain the order, whereas removeDuplicateWithOrder maintains the order with some performance overhead.

  1. The removeDuplicate method:

         /** List order not maintained **/
         public static void removeDuplicate(ArrayList arlList) {
             HashSet h = new HashSet(arlList);
             arlList.clear();
             arlList.addAll(h);
         }

  2. The removeDuplicateWithOrder method:

         /** List order maintained **/
         public static void removeDuplicateWithOrder(ArrayList arlList) {
             Set set = new HashSet();
             List newList = new ArrayList();
             for (Iterator iter = arlList.iterator(); iter.hasNext();) {
                 Object element = iter.next();
                 if (set.add(element))
                     newList.add(element);
             }
             arlList.clear();
             arlList.addAll(newList);
         }
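
For what it's worth, here is a small hypothetical caller, assuming the two helpers above (and their java.util imports) are declared in the same class:

import java.util.ArrayList;
import java.util.Arrays;

public class DedupDemo {
    public static void main(String[] args) {
        ArrayList list = new ArrayList(Arrays.asList("c", "a", "c", "b", "a"));
        removeDuplicateWithOrder(list);   // keeps the first occurrence of each
        System.out.println(list);         // prints [c, a, b]
    }

    // ... paste removeDuplicate and removeDuplicateWithOrder from above here ...
}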

Overriding equals and hashCode and creating a set was my first thought too. It's good practice to have some overridden version of these methods anyway in your inheritance hierarchy.

I think that if you use a LinkedHashSet you'll even preserve the order of unique elements...
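
A quick sketch of that behavior (the class name is mine):

import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class LinkedHashSetOrderDemo {
    public static void main(String[] args) {
        // Duplicates are dropped; first-seen insertion order is kept
        Set<String> unique = new LinkedHashSet<String>(
                Arrays.asList("b", "a", "b", "c", "a"));
        System.out.println(unique); // prints [b, a, c]
    }
}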

Use a List distinctList to record each element the first time the iterator encounters it, then return distinctList as the list with all duplicates removed:


 private <T> List<T> removeDups(List<T> list) {
        Set<T> tempSet = new HashSet<T>();
        List<T> distinctList = new ArrayList<T>();
        for (Iterator<T> it = list.iterator(); it.hasNext();) {
            T next = it.next();
            // Set.add() returns false for an element it already holds,
            // so each element reaches distinctList only once
            if (tempSet.add(next)) {
                distinctList.add(next);
            }
        }
        return distinctList;
   }

Basically, you want a LinkedHashSet<T> implementation that supports the List<T> interface for random access. Hence, this is what you need:

public class LinkedHashSetList<T> extends LinkedHashSet<T> implements List<T> {

// Implementations for List<T> methods here ...

}

The implementation of the List<T> methods would access and manipulate the underlying LinkedHashSet<T>. The trick is to have this class behave correctly when someone attempts to add duplicates via the List<T> add methods: throwing an exception or re-adding the item at a different index would both be options, and you can either pick one or make the behavior configurable by users of the class.
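
As an illustration, here is a sketch (mine, not a complete implementation) of the random-access get(int); it has to walk the set's iteration order, since LinkedHashSet exposes no index-based access:

// Sketch of one List<T> method only; not a complete implementation.
// LinkedHashSet has no index-based access, so get() walks the
// iteration order and costs O(n) per call.
@Override
public T get(int index) {
    int i = 0;
    for (T element : this) {
        if (i == index) {
            return element;
        }
        i++;
    }
    throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size());
}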

I'd like to reiterate the point made by Jason in the comments:

Why place yourself at that point at all?

Why use an array for a data structure that shouldn't hold duplicates at all?

Use a Set or a SortedSet (when the elements have a natural order as well) at all times to hold the elements. If you need to keep the insertion order, then you can use the LinkedHashSet, as has been pointed out.

Having to post-process some data structure is often a hint that you should have chosen a different one to begin with.

Of course the original post begs the question, "How did you get that array (that might contain duplicated entries) in the first place?"

Do you need the array (with duplicates) for other purposes, or could you simply use a Set from the beginning?

Alternately, if you need to know the number of occurrences of each value, you could use a Map<CustomObject, Integer> to track counts. Also, the Google Collections definition of the Multimap classes may be of use.
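
A rough sketch of the counting idea (the class and method names are mine; the element type must override equals() and hashCode() for duplicates to collapse):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OccurrenceCounter {
    // Tallies how often each distinct element appears
    public static <T> Map<T, Integer> countOccurrences(List<T> items) {
        Map<T, Integer> counts = new HashMap<T, Integer>();
        for (T item : items) {
            Integer seen = counts.get(item);
            counts.put(item, seen == null ? 1 : seen + 1);
        }
        // counts.keySet() holds the distinct values; counts.get(x) is x's tally
        return counts;
    }
}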

A Set is definitely your best bet. The only way to remove things from an array (without creating a new one) is to null them out, and then you end up with a lot of null-checks later.

Speaking from a general programming standpoint, you could always double-enumerate the collections and then compare the source and target.

And if your inner enumeration always starts one entry after the source, it's fairly efficient (pseudo code to follow):

foreach ( array as source )
{
    // keep track where we are in the array
    place++;
    // loop the array starting at the entry AFTER the current one we are comparing to
    for ( i=place+1; i < max(array); i++ )
    {
        if ( source === array[i] )
        {
            destroy(array[i]);
        }
    }
}

You could arguably add a break; statement after the destroy, but then you only remove the first duplicate; if that's all you will ever have, it would be a nice small optimization.
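
For completeness, a rough Java rendering of that pseudocode (the class and method names are mine); as cautioned earlier, duplicates are nulled out rather than removed, so the array keeps its length:

public class InPlaceDedup {
    // Nulls out every later occurrence of each element, leaving the
    // array's length unchanged (hence the null-checks mentioned above)
    static void nullOutDuplicates(Object[] array) {
        for (int place = 0; place < array.length; place++) {
            Object source = array[place];
            if (source == null) continue; // slot already cleared
            for (int i = place + 1; i < array.length; i++) {
                if (source.equals(array[i])) {
                    array[i] = null; // "destroy" the duplicate
                }
            }
        }
    }
}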
