
Java TreeSet with length Comparator bug?

I have the code below, which creates a TreeSet using a comparator based on string length.

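import java.util.Arrays;
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

public class TreeSetComparator {
    public static void main(String[] args) {
        // Orders strings by length only
        SortedSet<String> sortedSet = new TreeSet<>(Comparator.comparing(String::length));
        sortedSet.addAll(Arrays.asList("aa", "bb", "aa"));
        System.out.println(sortedSet);
    }
}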

To my surprise, the output of the above is

[aa]

while I would expect either

[aa, bb]

or

[bb, aa]

The "bb" element disappears, which seems to contradict the SortedSet contract. The comparator is supposed to only order the elements, not determine their uniqueness, which is normally determined by equals.
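For contrast, here is a minimal sketch showing that a HashSet, which detects duplicates with equals and hashCode, keeps both strings, while the length-ordered TreeSet keeps only one. The class name EqualsVsComparator is made up for illustration, and the sketch uses Comparator.comparingInt rather than Comparator.comparing (both order strings by length).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class EqualsVsComparator {
    static final List<String> INPUT = Arrays.asList("aa", "bb", "aa");

    // HashSet detects duplicates with hashCode() and equals()
    static Set<String> hashSet() {
        return new HashSet<>(INPUT);
    }

    // TreeSet detects duplicates with its comparator (string length only here)
    static Set<String> lengthTreeSet() {
        Set<String> set = new TreeSet<>(Comparator.comparingInt(String::length));
        set.addAll(INPUT);
        return set;
    }

    public static void main(String[] args) {
        System.out.println(hashSet().size());       // 2: "aa" and "bb" are not equal
        System.out.println(lengthTreeSet().size()); // 1: "bb" has the same length as "aa"
    }
}
```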

On the other hand, if I extend the comparator so that it always returns non-zero for unequal items, as below, only then do I get the expected result.

    SortedSet<String> sortedSet = new TreeSet<>(Comparator.comparing(String::length).reversed().thenComparing(String::toString));
    sortedSet.addAll(Arrays.asList("aa", "bb", "aa"));
    System.out.println(sortedSet);

The output is now [aa, bb], as I would expect.
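A variant of the same fix, offered here as an alternative rather than anything from the question, keeps the length ordering and breaks ties with String's natural ordering instead of reversing and comparing via toString. Because String.compareTo returns 0 only for equal strings, the combined comparator returns 0 only when equals would return true. The class and constant names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

public class LengthThenNatural {
    // Primary key: length; tie-breaker: String's natural (lexicographic) order
    static final Comparator<String> BY_LENGTH_THEN_NATURAL =
            Comparator.comparingInt(String::length)
                      .thenComparing(Comparator.naturalOrder());

    static SortedSet<String> build() {
        SortedSet<String> set = new TreeSet<>(BY_LENGTH_THEN_NATURAL);
        set.addAll(Arrays.asList("aa", "bb", "aa"));
        return set;
    }

    public static void main(String[] args) {
        System.out.println(build()); // [aa, bb]
    }
}
```

Unlike the question's comparator, this one does not call reversed(), so shorter strings come first.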

Is this a bug in the TreeSet implementation?

My environment is as follows:

mvn --version
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T19:33:14+01:00)
Maven home: /home/aaaa/.sdkman/candidates/maven/current
Java version: 10.0.2, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-10-jdk
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "4.14.60-1-manjaro", arch: "amd64", family: "unix"

UPDATE

Here is a related post, along with suggestions on how to fix the issue in a future version of Java: https://yesday.github.io/blog/2018/java-gotchas-sorted-set-ignores-the-equals-method.html

This is not a bug, at least not in TreeSet.

From the TreeSet javadoc:

Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

Because "aa" and "bb" both have length 2, the comparator's compare method deems them equal, and therefore so does the TreeSet.
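The following sketch (hypothetical class name, Java 8 or later) makes this concrete: the comparator returns 0 for "aa" and "bb", so TreeSet.add rejects "bb" as a duplicate, and contains("bb") reports true even though "bb" was never stored.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class LengthComparatorDemo {
    // Orders strings by length only, like the comparator in the question
    static final Comparator<String> BY_LENGTH = Comparator.comparingInt(String::length);

    public static void main(String[] args) {
        // Both strings have length 2, so the comparator reports them as equal
        System.out.println(BY_LENGTH.compare("aa", "bb")); // 0
        System.out.println("aa".equals("bb"));             // false

        TreeSet<String> set = new TreeSet<>(BY_LENGTH);
        System.out.println(set.add("aa"));      // true: the set was empty
        System.out.println(set.add("bb"));      // false: treated as a duplicate of "aa"
        System.out.println(set.contains("bb")); // true: lookup also uses the comparator
        System.out.println(set);                // [aa]
    }
}
```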

By definition, consistent with equals means:

The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.
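This definition can be checked mechanically on a finite sample. The helper below is a hypothetical sketch, not a JDK API; a sample can only show that a comparator is inconsistent with equals, not prove that it is consistent for all inputs. It uses List.of, so it assumes Java 9 or later.

```java
import java.util.Comparator;
import java.util.List;

public class ConsistencyCheck {
    // True if, for every pair (e1, e2) drawn from the sample,
    // c.compare(e1, e2) == 0 has the same boolean value as e1.equals(e2).
    static <T> boolean isConsistentWithEquals(Comparator<? super T> c, List<T> sample) {
        for (T e1 : sample) {
            for (T e2 : sample) {
                if ((c.compare(e1, e2) == 0) != e1.equals(e2)) {
                    return false; // found a counterexample pair
                }
            }
        }
        return true; // no counterexample within this sample
    }

    public static void main(String[] args) {
        List<String> sample = List.of("a", "aa", "bb", "abc");
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        Comparator<String> byLengthThenNatural =
                byLength.thenComparing(Comparator.naturalOrder());

        System.out.println(isConsistentWithEquals(byLength, sample));            // false
        System.out.println(isConsistentWithEquals(byLengthThenNatural, sample)); // true
    }
}
```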

It looks like the API designers assume that the comparator uses the same notion of equality as the equals method. From the SortedSet API:

Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
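One concrete way such a set "fails to obey the general contract of the Set interface" is that Set.equals stops being symmetric. In the sketch below (hypothetical class and method names), TreeSet inherits AbstractSet.equals, which compares sizes and then calls containsAll; the TreeSet therefore answers membership questions with its comparator, while the HashSet uses equals.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class BrokenSetContract {
    // Builds a TreeSet whose ordering (length only) is inconsistent with equals
    static Set<String> lengthSet(String... items) {
        Set<String> set = new TreeSet<>(Comparator.comparingInt(String::length));
        set.addAll(Arrays.asList(items));
        return set;
    }

    public static void main(String[] args) {
        Set<String> byLength = lengthSet("aa");
        Set<String> plain = new HashSet<>(Arrays.asList("bb"));

        // Same size, and the TreeSet's contains("bb") matches "aa" by length
        System.out.println(byLength.equals(plain)); // true
        // The HashSet's contains("aa") uses equals, so there is no match
        System.out.println(plain.equals(byLength)); // false
    }
}
```

The hashCode contract is broken as well: the two sets' hash codes are "aa".hashCode() and "bb".hashCode(), which differ, even though byLength.equals(plain) returns true.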

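Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original source URL. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM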