
Java: Equalator? (removing duplicates from a collection of objects)

I have a bunch of objects of a class Puzzle. I have overridden equals() and hashCode(). When it comes time to present the solutions to the user, I'd like to filter out all the Puzzles that are "similar" (by the standard I have defined), so the user only sees one of each.

Similarity is transitive.

Example:

Result of computations:
A    (similar to D)
B    (similar to C)
C
D

In this case, only A or D and B or C would be presented to the user, but never two similar Puzzles. Two similar puzzles are equally valid; it is only important that they are not both shown to the user.

To accomplish this, I wanted to use an ADT that prohibits duplicates. However, I don't want to change the equals() and hashCode() methods to return a value about similarity instead. Is there some Equalator, like Comparator, that I can use in this case? Or is there another way I should be doing this?

The class I'm working on is a Puzzle that maintains a grid of letters (like Scrabble). If a Puzzle contains the same words but in a different orientation, it is considered to be similar. So the following puzzle:

                                    (2, 2): A           
                                    (2, 1): C           
                                    (2, 0): T

would be similar to:

                    (1, 2): A           
                    (1, 1): C           
                    (1, 0): T      

I'd use a wrapper class that overrides equals and hashCode accordingly.

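private static class Wrapper {
    // Instance field: a static field could not be assigned per wrapper in the constructor.
    private final Puzzle puzzle;
    public Wrapper(Puzzle puzzle) {
        this.puzzle = puzzle;
    }
    @Override
    public boolean equals(Object object) {
        // ... return true iff object is a Wrapper whose puzzle is "similar" to this.puzzle
    }
    @Override
    public int hashCode() {
        // ... must return the same value for any two similar puzzles
    }
}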
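To make the idea concrete, here is a minimal runnable sketch. The `Puzzle` class in it is a hypothetical stand-in (letters keyed by (x, y) cell), and "similar" is assumed to mean "same letters in the same relative layout", i.e. equal up to translation, which matches the A/C/T example in the question. Handling rotations or reflections as well would mean normalizing each transformed copy and picking one canonical form; that part is not implemented here. Unlike the nested `private static` class above, `Wrapper` is top-level so the sketch compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the question's Puzzle: letters keyed by (x, y) cell.
final class Puzzle {
    final Map<List<Integer>, Character> cells = new HashMap<>();

    Puzzle put(int x, int y, char letter) {
        cells.put(Arrays.asList(x, y), letter);
        return this;
    }
}

// Treats two puzzles as equal when one is a translated copy of the other (assumption).
final class Wrapper {
    final Puzzle puzzle;
    private final Set<String> shape; // cells shifted so the minimum x and y are 0

    Wrapper(Puzzle puzzle) {
        this.puzzle = puzzle;
        this.shape = normalize(puzzle);
    }

    private static Set<String> normalize(Puzzle p) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        for (List<Integer> cell : p.cells.keySet()) {
            minX = Math.min(minX, cell.get(0));
            minY = Math.min(minY, cell.get(1));
        }
        Set<String> result = new HashSet<>();
        for (Map.Entry<List<Integer>, Character> e : p.cells.entrySet()) {
            int x = e.getKey().get(0) - minX;
            int y = e.getKey().get(1) - minY;
            result.add(x + "," + y + "=" + e.getValue());
        }
        return result;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Wrapper && shape.equals(((Wrapper) o).shape);
    }

    @Override
    public int hashCode() {
        return shape.hashCode(); // equal shapes give equal hashes, as HashSet requires
    }

    // Keeps the first puzzle of each similarity class, in input order.
    static List<Puzzle> distinct(List<Puzzle> puzzles) {
        Set<Wrapper> seen = new HashSet<>();
        List<Puzzle> result = new ArrayList<>();
        for (Puzzle p : puzzles) {
            if (seen.add(new Wrapper(p))) result.add(p);
        }
        return result;
    }
}
```

Because the comparison key is computed once in the constructor, `equals` and `hashCode` stay cheap even when the set compares many wrappers.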

Then you wrap all your puzzles, put them in a map keyed by wrapper, and get them out again:

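// needs java.util.ArrayList, Collection, HashMap, Map
public Collection<Collection<Puzzle>> method(Collection<Puzzle> puzzles) {
    // Similar puzzles produce equal Wrappers, so they share one map entry.
    Map<Wrapper, Collection<Puzzle>> map = new HashMap<Wrapper, Collection<Puzzle>>();
    for (Puzzle each : puzzles) {
        Wrapper wrapper = new Wrapper(each);
        Collection<Puzzle> coll = map.get(wrapper);
        if (coll == null) map.put(wrapper, coll = new ArrayList<Puzzle>());
        coll.add(each);
    }
    // Each value is one group of mutually similar puzzles; show one per group.
    return map.values();
}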
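On Java 8 or later, the same grouping can be written with `Collectors.groupingBy`. The sketch below is generic and assumes a hypothetical `key` function that maps each item to a canonical value (for puzzles, something like a normalized cell set), so items with equal keys land in the same group. Taking the first element of each group yields the one-per-class list to show the user.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class Grouping {
    // Items with equal keys end up in the same group; LinkedHashMap keeps
    // groups in the order their first member appeared.
    static <T, K> Collection<List<T>> groupBy(Collection<T> items, Function<T, K> key) {
        Map<K, List<T>> groups = items.stream()
                .collect(Collectors.groupingBy(key, LinkedHashMap::new, Collectors.toList()));
        return groups.values();
    }

    // The first member of each group, i.e. one item per similarity class.
    static <T, K> List<T> representatives(Collection<T> items, Function<T, K> key) {
        List<T> result = new ArrayList<>();
        for (List<T> group : groupBy(items, key)) {
            result.add(group.get(0));
        }
        return result;
    }
}
```

For example, grouping words by their lower-cased first letter keeps one word per letter, in first-seen order.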

Okay, you have a way of measuring similarity between objects. If that measure behaves like a distance (non-negative, symmetric, zero only for identical objects, and obeying the triangle inequality), the objects form a metric space.

The question is: is your space also a Euclidean space, like ordinary three-dimensional space, or the integers, or something like that? If it is, then you could use a binary space partition in however many dimensions you've got.

(The question is, basically: is there a homomorphism between your objects and n-dimensional real vectors? If so, then you can use techniques for measuring the closeness of points in n-dimensional space.)
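As a simpler relative of the binary space partition mentioned above (a uniform grid, sometimes called spatial hashing, not a BSP), here is a sketch for 2-D points under Euclidean distance; the `double[]` point representation and the 2-D restriction are assumptions. With the cell size equal to the rejection threshold, any kept point strictly closer than the threshold to a new point must lie in the same cell or one of its 8 neighbours, so each insertion only inspects nearby points instead of every point kept so far.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Threshold-based duplicate rejection for 2-D points using a uniform grid.
final class GridRejector {
    private final double threshold; // also the cell size; must be > 0
    private final Map<Long, List<double[]>> cells = new HashMap<>();
    private final List<double[]> kept = new ArrayList<>();

    GridRejector(double threshold) {
        this.threshold = threshold;
    }

    // Packs two cell indices into one map key (exact while each fits in an int).
    private static long cellKey(long cx, long cy) {
        return (cx << 32) ^ (cy & 0xffffffffL);
    }

    // Adds p unless a kept point lies strictly closer than threshold.
    // Such a point can only be in p's cell or one of its 8 neighbours.
    boolean add(double[] p) {
        long cx = (long) Math.floor(p[0] / threshold);
        long cy = (long) Math.floor(p[1] / threshold);
        for (long dx = -1; dx <= 1; dx++) {
            for (long dy = -1; dy <= 1; dy++) {
                List<double[]> bucket = cells.get(cellKey(cx + dx, cy + dy));
                if (bucket == null) continue;
                for (double[] q : bucket) {
                    if (Math.hypot(p[0] - q[0], p[1] - q[1]) < threshold) return false;
                }
            }
        }
        cells.computeIfAbsent(cellKey(cx, cy), k -> new ArrayList<>()).add(p);
        kept.add(p);
        return true;
    }

    List<double[]> kept() {
        return kept;
    }
}
```

The same idea extends to n dimensions by checking the 3^n neighbouring cells, which is why tree-based partitions tend to win once n grows.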

Now, if it's not a Euclidean space, then you've got a bigger problem. An example of a non-Euclidean space that programmers might be most familiar with is strings under the Levenshtein distance.

If your problem is like checking how similar a string is to each string in a list of already existing strings, then I don't know of any algorithm that does that for the whole set in less than O(n^2) comparisons. Maybe there are some out there.
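For the Levenshtein case, here is a sketch of the standard dynamic-programming edit distance together with the quadratic filter described in these paragraphs: each string is compared with every string already kept and dropped if it is within `maxDistance` edits of one of them. The parameter name `maxDistance` and the keep-the-first-seen policy are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class EditDistanceFilter {
    // Classic two-row dynamic programming: O(|a| * |b|) time, O(|b|) space.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                // deletion, insertion, substitution (or match)
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Keeps a string only if it is more than maxDistance edits from every kept string.
    // O(n^2) distance computations in the worst case.
    static List<String> filter(List<String> input, int maxDistance) {
        List<String> kept = new ArrayList<>();
        outer:
        for (String s : input) {
            for (String k : kept) {
                if (levenshtein(s, k) <= maxDistance) continue outer;
            }
            kept.add(s);
        }
        return kept;
    }
}
```

With `maxDistance = 1`, the input `cat, cart, dog, cot` keeps only `cat` and `dog`.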


But another important question is: how much time do you have? How many objects? If you have time, or if your data set is small enough that an O(n^2) algorithm is practical, then for each new object you just iterate through the objects you have already accepted and check whether its distance to any of them is below a certain threshold. If so, reject it.

Just extend AbstractCollection and override the add method. Use an ArrayList or whatever as the backing store. Your code would look something like this:

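import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Iterator;

abstract class SimilarityRejector<T> extends AbstractCollection<T> {
    private final ArrayList<T> base = new ArrayList<T>();
    private final double threshold;

    public SimilarityRejector(double threshold) {
        this.threshold = threshold;
    }

    // Distance between two elements; a smaller value means "more similar".
    protected abstract double similarityComparison(T a, T b);

    @Override
    public boolean add(T t) {
        // Reject t if it is closer than the threshold to anything already kept.
        for (T compare : base) {
            if (similarityComparison(t, compare) < threshold) return false;
        }
        return base.add(t);
    }

    @Override
    public Iterator<T> iterator() {
        return base.iterator();
    }

    @Override
    public int size() {
        return base.size();
    }
}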

etc. Obviously T would need to be a subclass of some class that you can perform a comparison on. If you have a Euclidean metric, then you can use a space partition rather than comparing against every other item.
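In the simplest Euclidean case, where each object reduces to a single number (an assumption), the partitioning idea collapses to sorting: after sorting, a value only needs to be compared with the last value kept, giving O(n log n) instead of O(n^2). The sketch below uses the same rule as the threshold check above (reject when the distance is below the threshold); note that it keeps a different, equally valid subset than filtering in insertion order would.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SortedRejector {
    // Returns values whose pairwise distance is at least threshold,
    // choosing greedily in ascending order.
    static List<Double> spreadOut(double[] values, double threshold) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        List<Double> kept = new ArrayList<>();
        for (double v : sorted) {
            // kept is ascending, so its last element is the nearest kept value below v.
            if (kept.isEmpty() || v - kept.get(kept.size() - 1) >= threshold) {
                kept.add(v);
            }
        }
        return kept;
    }
}
```

For `{5.0, 1.0, 1.4, 2.1, 5.5}` with threshold `1.0`, the sorted scan keeps `1.0`, `2.1` and `5.0`.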

  1. Create a TreeSet using your Comparator
  2. Add all elements into the set
  3. All duplicates are stripped out
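The three steps above can be sketched as follows. It relies on the comparator being a consistent total order that returns 0 exactly for the items that should count as duplicates; a comparator that returns 0 for merely "close" items that are not transitively equivalent gives order-dependent results. `TreeSet.add` leaves an existing equal element in place, so the first of each duplicate group survives, and the result is sorted by the comparator rather than kept in input order. The example uses the JDK's `String.CASE_INSENSITIVE_ORDER`.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

final class TreeSetDedup {
    // Elements the comparator reports as 0 are treated as duplicates;
    // the first one added stays, later ones are rejected.
    static <T> Set<T> distinct(Collection<? extends T> items, Comparator<? super T> comparator) {
        Set<T> set = new TreeSet<>(comparator); // step 1
        set.addAll(items);                      // steps 2 and 3
        return set;
    }
}
```

For instance, `distinct(Arrays.asList("b", "B", "a", "A"), String.CASE_INSENSITIVE_ORDER)` yields a set iterating as `a, b`.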

IMHO, the most elegant way was described by Gili (TreeSet with a custom Comparator).

But if you'd like to do it yourself, this seems to be the easiest and clearest solution:

/**
 * Returns the distinct values of the input list, keeping the first occurrence of each (removes duplicates)
 * @param items items to process
 * @param comparator comparator to recognize equal items
 * @return new collection with unique values
 */
public static <T> Collection<T> distinctItems(List<T> items, Comparator<T> comparator) {
    List<T> result = new ArrayList<>();

    for (int i = 0; i < items.size(); i++) {
        T item = items.get(i);

        boolean exists = false;
        for (int j = 0; j < result.size(); j++) {
            if (comparator.compare(result.get(j), item) == 0) {
                exists = true;
                break;
            }
        }

        if (!exists) {
            result.add(item);
        }
    }

    return result;
}
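If the comparator is a proper total order (not just "returns 0 for duplicates"), the pairwise scan can be replaced by a `TreeSet` of already-seen items. The sketch below keeps the same behaviour as `distinctItems` (first occurrence wins, input order preserved) but needs O(n log n) comparisons instead of O(n^2). The class name `OrderedDistinct` is illustrative only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class OrderedDistinct {
    // First occurrence wins and input order is kept; membership is checked
    // in a TreeSet ordered by the same comparator.
    static <T> List<T> distinctItems(List<T> items, Comparator<? super T> comparator) {
        Set<T> seen = new TreeSet<>(comparator);
        List<T> result = new ArrayList<>();
        for (T item : items) {
            if (seen.add(item)) { // false if a comparator-equal item was already seen
                result.add(item);
            }
        }
        return result;
    }
}
```

With `Comparator.comparingInt(String::length)`, the list `pear, fig, kiwi, plum, date` reduces to `pear, fig`.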

Normally "similarity" is not a transitive relationship. So the first step would be to think of this in terms of equivalence rather than similarity. Equivalence is reflexive, symmetric and transitive.
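When the underlying pairwise "similar" test is not transitive, one way to obtain an equivalence relation from it is its transitive closure: two items belong to the same class whenever a chain of similar pairs connects them. A union-find (disjoint-set) structure computes this; the sketch assumes a caller-supplied `BiPredicate` and makes O(n^2) predicate calls. Note that closure can merge items that are not directly similar (A similar to B and B similar to C puts A and C in one class).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

final class SimilarityClasses {
    // Returns the equivalence classes of the transitive closure of `similar`.
    static <T> List<List<T>> classes(List<T> items, BiPredicate<? super T, ? super T> similar) {
        int n = items.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (similar.test(items.get(i), items.get(j))) {
                    parent[find(parent, i)] = find(parent, j); // union the two classes
                }
            }
        }
        // Group items by their root, in order of first appearance.
        Map<Integer, List<T>> groups = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            groups.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(items.get(i));
        }
        return new ArrayList<>(groups.values());
    }

    // Find the root of x, halving the path as we go.
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }
}
```

Showing the first item of each returned class then gives one representative per equivalence class.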

An easy approach here is to define a puzzle wrapper whose equals() and hashCode() methods are implemented according to the equivalence relation in question.

Once you have that, drop the wrapped objects into a java.util.Set and that filters out duplicates.
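The JDK has no `Equalator` counterpart to `Comparator`; Guava's `com.google.common.base.Equivalence` (with its `wrap` method) fills that role if adding the dependency is acceptable. Below is a dependency-free sketch of the same pattern, where `Equalator` and `EqualatorWrapper` are hypothetical names. The wrapper delegates `equals` and `hashCode` to the equalator, so an ordinary `LinkedHashSet` does the filtering and keeps input order.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical "Equalator": an external equivalence, analogous to Comparator.
interface Equalator<T> {
    boolean equal(T a, T b);
    int hash(T t); // must agree with equal(): equal items need equal hashes
}

final class EqualatorWrapper<T> {
    final T value;
    private final Equalator<? super T> eq;

    EqualatorWrapper(T value, Equalator<? super T> eq) {
        this.value = value;
        this.eq = eq;
    }

    @Override
    @SuppressWarnings("unchecked")
    public boolean equals(Object o) {
        if (!(o instanceof EqualatorWrapper)) return false;
        // Assumes all wrappers in one Set share the same equalator and element type.
        return eq.equal(value, ((EqualatorWrapper<T>) o).value);
    }

    @Override
    public int hashCode() {
        return eq.hash(value);
    }

    // Wraps every item, lets the Set drop equivalents, then unwraps.
    static <E> List<E> distinct(Collection<E> items, Equalator<? super E> eq) {
        Set<EqualatorWrapper<E>> set = new LinkedHashSet<>();
        for (E item : items) set.add(new EqualatorWrapper<>(item, eq));
        List<E> result = new ArrayList<>();
        for (EqualatorWrapper<E> w : set) result.add(w.value);
        return result;
    }
}
```

Keeping the equivalence outside the element class means Puzzle's own equals() and hashCode() stay untouched, which is what the question asked for.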
