Removing duplicates without overriding hash method

I have a List of objects, and I want to remove from it all the elements that have the same values in two of their attributes. I had thought about doing something like this:

List<Class1> myList;
....
Set<Class1> mySet = new HashSet<Class1>();
mySet.addAll(myList);

and overriding the hash method in Class1 so that it returns a number which depends only on the attributes I want to consider.

The problem is that I need to do a different filtering in another part of the application, so I can't override the hash method in this way (I would need two different hash methods).

What's the most efficient way of doing this filtering without overriding the hash method?

Thanks

Overriding hashCode and equals in Class1 (just to do this) is problematic. You end up with your class having an unnatural definition of equality, which may turn out to be wrong for other current and future uses of the class.

Review the Comparator interface and write a Comparator<Class1> implementation that compares instances of your Class1 based on your criteria, e.g. on those two attributes. Then instantiate a TreeSet<Class1> for duplicate detection using the TreeSet(Comparator) constructor.
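A minimal sketch of the Comparator + TreeSet idea, under stated assumptions: the fields attr1, attr2 and other on Class1 are hypothetical stand-ins (the question does not show the class), attr1 is assumed non-null, and TreeSetDedup and dedup are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the question's Class1; the field names are assumptions.
class Class1 {
    final String attr1; // assumed non-null in this sketch
    final int attr2;
    final String other; // not part of the duplicate criteria

    Class1(String attr1, int attr2, String other) {
        this.attr1 = attr1;
        this.attr2 = attr2;
        this.other = other;
    }
}

class TreeSetDedup {
    // Orders by attr1, then attr2. Two instances that compare as 0 count as duplicates.
    static final Comparator<Class1> BY_ATTR1_AND_ATTR2 =
            Comparator.comparing((Class1 c) -> c.attr1).thenComparingInt(c -> c.attr2);

    // Returns one element per distinct (attr1, attr2) pair, in comparator order.
    static List<Class1> dedup(List<Class1> input) {
        Set<Class1> unique = new TreeSet<>(BY_ATTR1_AND_ATTR2);
        // An element that compares equal to one already in the set is not added.
        unique.addAll(input);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<Class1> myList = Arrays.asList(
                new Class1("a", 1, "first"),
                new Class1("a", 1, "second"), // duplicate of the first by (attr1, attr2)
                new Class1("b", 2, "third"));
        System.out.println(dedup(myList).size()); // prints 2
    }
}
```

Because the comparator lives outside Class1, another part of the application can define a second comparator with different criteria. Note that the result comes back sorted by the comparator rather than in the original list order.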

EDIT

Comparing this approach with @Tom Hawtin's approach:

  • The two approaches use roughly comparable space overall. The treeset's internal nodes roughly balance the hashset's array and the wrappers that support the custom equals / hash methods.

  • The wrapper + hashset approach is O(N) in time (assuming good hashing) versus O(N log N) for the treeset approach. So that is the way to go if the input list is likely to be large.

  • The treeset approach wins in terms of the lines of code that need to be written.

Let your Class1 implement Comparable. Then use TreeSet as in your example (i.e. use the addAll method).

As an alternative to what Roman said, you can have a look at this SO question about filtering using Predicates. If you use Google Collections anyway, this might be a good fit.
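As a rough illustration of predicate-based filtering, here is a sketch that swaps in the JDK's own java.util.function.Predicate (Java 8+) instead of Google Collections, so it is not the API the linked question discusses. The Class1 fields and the names PredicateDedup and firstWithKey are hypothetical. The predicate keeps state, so this assumes a sequential stream only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for the question's Class1; the field names are assumptions.
class Class1 {
    final String attr1;
    final int attr2;
    final String other; // not part of the duplicate criteria

    Class1(String attr1, int attr2, String other) {
        this.attr1 = attr1;
        this.attr2 = attr2;
        this.other = other;
    }
}

class PredicateDedup {
    // Returns a stateful predicate that accepts an element only the first time its key is seen.
    // Not thread-safe: intended for sequential use only.
    static <T, K> Predicate<T> firstWithKey(Function<? super T, ? extends K> keyOf) {
        Set<K> seen = new HashSet<>();
        return element -> seen.add(keyOf.apply(element)); // add() is false for a repeated key
    }

    // Keeps the first element for each (attr1, attr2) pair, preserving list order.
    static List<Class1> dedup(List<Class1> input) {
        // A List of the two attributes serves as the key; List defines value-based equals/hashCode.
        Predicate<Class1> firstOccurrence =
                firstWithKey((Class1 c) -> Arrays.<Object>asList(c.attr1, c.attr2));
        return input.stream().filter(firstOccurrence).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Class1> myList = Arrays.asList(
                new Class1("a", 1, "first"),
                new Class1("a", 1, "second"),
                new Class1("b", 2, "third"));
        System.out.println(dedup(myList).size()); // prints 2
    }
}
```

A different key function passed to firstWithKey gives the second kind of filtering the question mentions, again without touching Class1.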

I would suggest introducing a class for the concept of the parts of Class1 that you want to consider significant in this context. Then use a HashSet or HashMap.
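One way this suggestion might look, as a sketch rather than the answerer's actual code: a small key class carries only the significant parts and defines equals and hashCode for them, and a map keyed by it keeps the first Class1 seen per key. The names Attr1Attr2Key and KeyDedup and the Class1 fields are hypothetical, and LinkedHashMap is used here in place of a plain HashMap only to preserve the input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the question's Class1; the field names are assumptions.
class Class1 {
    final String attr1;
    final int attr2;
    final String other; // not part of the duplicate criteria

    Class1(String attr1, int attr2, String other) {
        this.attr1 = attr1;
        this.attr2 = attr2;
        this.other = other;
    }
}

// Hypothetical value class holding only the parts of Class1 that matter in this context.
final class Attr1Attr2Key {
    private final String attr1;
    private final int attr2;

    Attr1Attr2Key(Class1 source) {
        this.attr1 = source.attr1;
        this.attr2 = source.attr2;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Attr1Attr2Key)) return false;
        Attr1Attr2Key that = (Attr1Attr2Key) o;
        return attr2 == that.attr2 && Objects.equals(attr1, that.attr1);
    }

    @Override
    public int hashCode() {
        return Objects.hash(attr1, attr2);
    }
}

class KeyDedup {
    // Keeps the first Class1 seen for each key, in input order.
    static List<Class1> dedup(List<Class1> input) {
        Map<Attr1Attr2Key, Class1> firstByKey = new LinkedHashMap<>();
        for (Class1 c : input) {
            firstByKey.putIfAbsent(new Attr1Attr2Key(c), c); // later duplicates are ignored
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<Class1> myList = Arrays.asList(
                new Class1("a", 1, "first"),
                new Class1("b", 2, "second"),
                new Class1("a", 1, "third"));
        System.out.println(dedup(myList).size()); // prints 2
    }
}
```

A second key class with different fields would cover the other filtering in the application, and Class1 keeps its own notion of equality.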

Sometimes programmers make things too complicated trying to use all the nice features of a language, and the answers to this question are an example. Overriding anything on the class is overkill. What you need is this:

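class MyClass {
  Object attr1;
  Object attr2;

  // The HashSet below compares keys with equals/hashCode, so this helper
  // (not Class1) defines them over the two attributes.
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof MyClass)) return false;
    MyClass that = (MyClass) o;
    return java.util.Objects.equals(attr1, that.attr1)
        && java.util.Objects.equals(attr2, that.attr2);
  }

  @Override
  public int hashCode() {
    return java.util.Objects.hash(attr1, attr2);
  }
}

List<Class1> list;
Set<Class1> set=....
Set<MyClass> tempset = new HashSet<MyClass>();

for (Class1 c:list) {
  MyClass myc = new MyClass();
  myc.attr1 = c.attr1;
  myc.attr2 = c.attr2;

  // keep c only if no earlier element had the same two attribute values
  if (!tempset.contains(myc)) {
    tempset.add(myc);
    set.add(c);
  }
}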

Feel free to fix up minor irregularities. There will be some issues depending on what you mean by equality for the attributes (and obvious changes if the attributes are primitive). Sometimes we need to write code, not just use the built-in libraries.
