
Using the Java 8 Streams API, can sorted() be relied upon when calling Collectors.toSet()?

This is the implementation of the toSet() method of the java.util.stream.Collectors class (in JDK 8):

public static <T>
Collector<T, ?, Set<T>> toSet() {
    return new CollectorImpl<>((Supplier<Set<T>>) HashSet::new, Set::add,
                               (left, right) -> { left.addAll(right); return left; },
                               CH_UNORDERED_ID);
}

As we can see, it uses a HashSet and calls add. The HashSet documentation says: "It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time."
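The lack of ordering is also stated in the API itself: Collectors.toSet() is documented as an unordered collector, and the set returned by its characteristics() method includes Collector.Characteristics.UNORDERED. A minimal check (the class and method names are illustrative only):

```java
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ToSetCharacteristics {
    // Returns the characteristics advertised by the collector from Collectors.toSet().
    static Set<Collector.Characteristics> toSetCharacteristics() {
        return Collectors.<String>toSet().characteristics();
    }

    public static void main(String[] args) {
        Set<Collector.Characteristics> cs = toSetCharacteristics();
        // UNORDERED tells the stream framework that this collector does not
        // commit to preserving the stream's encounter order.
        System.out.println("UNORDERED? " + cs.contains(Collector.Characteristics.UNORDERED));
    }
}
```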

In the following code, a List of Strings is streamed, sorted and collected into a Set:

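import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        Set<String> strings = Arrays.asList("c", "a", "b")
                .stream()
                .sorted()                       // encounter order is now a, b, c
                .collect(Collectors.toSet());   // ...but toSet() collects into an unordered set
        System.out.println(strings.getClass());
        System.out.println(strings);
    }
}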

This produces the output:

class java.util.HashSet

[a, b, c]

The output is sorted. What I think is happening here is that, although the HashSet documentation states that ordering is not something it provides, the implementation happens to add the elements in order. I suppose this could change in future versions or vary between JVMs, and that a wiser approach would be something like Collectors.toCollection(TreeSet::new).
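For comparison, here is the question's pipeline rewritten with Collectors.toCollection(TreeSet::new). A TreeSet keeps its elements in natural order by contract, so the sorted() step is no longer needed. A minimal sketch; the class and method names are illustrative only:

```java
import java.util.Arrays;
import java.util.TreeSet;
import java.util.stream.Collectors;

class SortedCollect {
    // Collects into a TreeSet, whose iteration order is the natural ordering
    // of its elements, independent of the stream's encounter order.
    static TreeSet<String> collectSorted(String... items) {
        return Arrays.stream(items)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(collectSorted("c", "a", "b")); // prints [a, b, c]
    }
}
```

Because the ordering comes from the TreeSet contract rather than from hash codes, this holds for any element type with a consistent natural ordering, not just one-letter strings.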

Can sorted() be relied upon when calling Collectors.toSet()?

Additionally, what exactly does "it does not guarantee that the order will remain constant over time" mean? (I suppose add, remove, or the resizing of the underlying array?)
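One concrete case of the order not staying constant is resizing. The sketch below assumes OpenJDK's HashMap internals (default capacity 16, load factor 0.75, bucket chosen by masking the hash), which are implementation details and not part of the HashSet contract. It inserts "q" and "b", prints their relative iteration order, then adds eleven more elements so the table grows, and prints the order again. On OpenJDK the two lines typically differ ("q" before "b", then "b" before "q") even though neither element was touched. The class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ResizeDemo {
    // Returns the watched elements in the set's current iteration order.
    static List<String> orderOf(Set<String> set, List<String> watched) {
        List<String> out = new ArrayList<>();
        for (String s : set) {
            if (watched.contains(s)) {
                out.add(s);
            }
        }
        return out;
    }

    // Adds "c".."m" (11 elements). Assuming OpenJDK defaults (capacity 16,
    // load factor 0.75), the 13th element exceeds the threshold of 12 and
    // the backing table is resized.
    static void growPastThreshold(Set<String> set) {
        for (char c = 'c'; c <= 'm'; c++) {
            set.add(String.valueOf(c));
        }
    }

    public static void main(String[] args) {
        List<String> watched = Arrays.asList("q", "b");
        Set<String> set = new HashSet<>();
        set.add("q");
        set.add("b");
        System.out.println("before resize: " + orderOf(set, watched));
        growPastThreshold(set);
        System.out.println("after resize:  " + orderOf(set, watched));
    }
}
```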

The answer is no. Once you have added the items to a Set, you cannot rely on any order. From the JDK source code (HashSet.java):

/**
 * Returns an iterator over the elements in this set.  The elements
 * are returned in no particular order.
 *
 * @return an Iterator over the elements in this set
 * @see ConcurrentModificationException
 */
public Iterator<E> iterator() {
    return map.keySet().iterator();
}

Now, in previous versions of the JDK, even though an order wasn't guaranteed, you'd usually get the items in an order determined by either the order of creation of the objects or the order of invocation of hashCode() on the objects (unless the class of the objects overrides hashCode(), in which case you get the order dictated by that hashCode()). As @Holgar mentions in the comments below, in HotSpot it's the latter. And you can't even count on that, because the sequential number is not the only ingredient in the hashCode generator, so there are exceptions to this as well.

I recently heard a talk by Stuart Marks (the person responsible for rewriting a major part of the Collections framework in Java 9), and he said that they've added randomization to the iteration order of Sets created by the new set factories (such as Set.of) in Java 9. If you want to hear the session, the part where he talks about sets starts here. Good talk, highly recommended by the way!

So even if you used to count on the iteration order of Sets, once you move to Java 9 you should stop doing so.

All that said, if you need an order, consider a LinkedHashSet (which keeps insertion order) or a SortedSet implementation such as TreeSet (which keeps its elements sorted).
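The difference between these two kinds of ordered set shows up when the stream is sorted by something other than natural order. A LinkedHashSet preserves the stream's encounter order, so a preceding sorted() survives; a TreeSet imposes its own ordering and ignores it. A small sketch with illustrative names:

```java
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OrderedSets {
    // LinkedHashSet keeps insertion (= encounter) order, so the reverse sort is preserved.
    static Set<String> keepEncounterOrder(Stream<String> s) {
        return s.sorted(Comparator.reverseOrder())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // TreeSet re-orders by natural ordering, discarding the reverse sort.
    static Set<String> naturalOrder(Stream<String> s) {
        return s.sorted(Comparator.reverseOrder())
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(keepEncounterOrder(Stream.of("a", "c", "b"))); // prints [c, b, a]
        System.out.println(naturalOrder(Stream.of("a", "c", "b")));       // prints [a, b, c]
    }
}
```

So if the stream's sort order is what you want to keep, LinkedHashSet is the natural fit; if the set should always be sorted by a fixed ordering, TreeSet (optionally constructed with a Comparator) is.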

To answer that question, you have to know a bit about how HashSet is implemented. As the name suggests, a HashSet is implemented using a hash table. Basically, a hash table is an array indexed by element hashes. A hash function (in Java, an object's hash is computed by object.hashCode()) is basically a function that meets a few criteria:

  • it is (relatively) quick to compute for a given element
  • two objects that are .equals() to each other have identical hashes
  • there is a low probability that different items have the same hash
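The "array indexed by element hashes" idea can be sketched as follows. bucketIndex is a hypothetical helper modeled on how OpenJDK's HashMap (which backs HashSet) picks a bucket: it mixes the high bits of hashCode() into the low bits and masks with a power-of-two table size. Other implementations may differ, so treat it as an illustration, not a specification:

```java
class BucketIndex {
    // Hypothetical helper modeled on OpenJDK's HashMap: spread the high bits of
    // hashCode() into the low bits, then keep the low bits with a mask.
    // tableSize is assumed to be a power of two, as in HashMap.
    static int bucketIndex(Object key, int tableSize) {
        int h = key.hashCode();
        int spread = h ^ (h >>> 16);
        return spread & (tableSize - 1);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "b", "c", "p", "q"}) {
            System.out.println(s + " hash=" + s.hashCode()
                    + " bucket(16)=" + bucketIndex(s, 16));
        }
    }
}
```

With 16 buckets, "a", "b" and "c" (hashes 97, 98, 99) land in buckets 1, 2 and 3, so iterating bucket by bucket happens to give sorted order; "p" (hash 112) wraps around to bucket 0, and "q" (hash 113) shares bucket 1 with "a".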

So, when you meet a HashSet that is "sorted" (understood as "the iterator returns the elements in their natural order"), this is due to a couple of coincidences:

  • the natural order of the elements matches the natural order of their hashCodes
  • the hash codes are close enough together that each element lands in its own bucket without wrapping around the end of the table (no collisions)

If you look at the hashCode() method of the String class, you will see that for one-letter strings the hash code is simply the letter's char value (for ordinary letters, its Unicode code point). So in this specific case, as long as the letters map to distinct, consecutive buckets in the hash table, the elements will come out sorted. However, this is a huge coincidence and

  • will not hold for any other sort order
  • will not hold for classes whose hashCodes do not follow their natural ordering
  • will not hold for hash tables with collisions

and, moreover, this has nothing to do with the fact that sorted() was called on the stream; it is simply due to the way hashCode() is implemented and, therefore, the ordering of the hash table. So the simple answer to the question is "no".
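A quick way to check the last point is to collect the same letters with Collectors.toSet() with and without sorted(), and with a reversed sort. The sketch below only prints the resulting iteration orders. On a typical OpenJDK build the first three lines tend to match each other regardless of the sort, and the sorted stream of "p", "b", "a" tends not to come out as [a, b, p], but none of this is guaranteed, which is exactly the point. The class and method names are made up for the example:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SortedIsIrrelevant {
    // Collects with toSet() and snapshots the set's iteration order into a list.
    static List<String> viaToSet(Stream<String> s) {
        return new ArrayList<>(s.collect(Collectors.toSet()));
    }

    public static void main(String[] args) {
        // Same elements, three different encounter orders.
        System.out.println(viaToSet(Stream.of("c", "a", "b")));
        System.out.println(viaToSet(Stream.of("c", "a", "b").sorted()));
        System.out.println(viaToSet(Stream.of("c", "a", "b").sorted(Comparator.reverseOrder())));
        // "p" (hash 112) and "a" (hash 97): bucket order need not match sorted order.
        System.out.println(viaToSet(Stream.of("p", "b", "a").sorted()));
    }
}
```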

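Disclaimer: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM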