Using the Java 8 Streams API, can sorted() be relied upon when calling Collectors.toSet()?
This is the implementation of the java.util.stream.Collectors class's toSet() method:
public static <T>
Collector<T, ?, Set<T>> toSet() {
    return new CollectorImpl<>((Supplier<Set<T>>) HashSet::new, Set::add,
                               (left, right) -> { left.addAll(right); return left; },
                               CH_UNORDERED_ID);
}
As we can see, it uses a HashSet and calls add. From the HashSet documentation: "It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time."
In the following code, a List of String is streamed, sorted and collected into a Set:
public static void main(String[] args) {
    Set<String> strings = Arrays.asList("c", "a", "b")
            .stream()
            .sorted()
            .collect(Collectors.toSet());
    System.out.println(strings.getClass());
    System.out.println(strings);
}
This provides the output:
class java.util.HashSet
[a, b, c]
The output is sorted. What I think is happening here is that, although the contract in the HashSet documentation specifies that ordering is not something it provides, the implementation happens to add in order. I suppose this could change in future versions or vary between JVMs, and that a wiser approach would be to do something like Collectors.toCollection(TreeSet::new).
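For reference, the TreeSet variant mentioned above might look like the following minimal sketch. The class name SortedCollectExample and the helper collectSorted are made up for illustration. Because TreeSet orders its elements by contract, the sorted() call is no longer needed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration.
public class SortedCollectExample {

    // Collects into a TreeSet, which keeps its elements in natural order by contract.
    public static TreeSet<String> collectSorted(List<String> input) {
        return input.stream()
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        TreeSet<String> strings = collectSorted(Arrays.asList("c", "a", "b"));
        System.out.println(strings.getClass()); // class java.util.TreeSet
        System.out.println(strings);            // [a, b, c]
    }
}
```

Here the ordering comes from the collection's documented behaviour rather than from how a hash table happens to lay out its buckets.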
Can sorted() be relied upon when calling Collectors.toSet()?
Additionally, what exactly does "it does not guarantee that the order will remain constant over time" mean? (I suppose add, remove, the resizing of the underlying array?)
The answer is no. Once you have added the items to a Set, you cannot rely on any order. From the JDK source code (HashSet.java):
/**
 * Returns an iterator over the elements in this set. The elements
 * are returned in no particular order.
 *
 * @return an Iterator over the elements in this set
 * @see ConcurrentModificationException
 */
public Iterator<E> iterator() {
    return map.keySet().iterator();
}
Now, in previous versions of the JDK, even though an order wasn't guaranteed, you'd usually get the items in the same order of insertion (unless the class of the objects implements hashCode(), in which case you'll get the order that is dictated by hashCode()): that is, either the order of creation of the objects or the order of invocation of hashCode() on the objects. As @Holgar mentions in the comments below, in HotSpot it's the latter. And you can't even count on that, because there are exceptions to this as well: the sequential number is not the only ingredient in the hashCode generator.
I recently heard a talk by Stuart Marks (the guy who's responsible for a rewrite of a major part of Collections in Java 9), and he said that they've added randomization to the iteration order of Sets (those created by the new set factories) in Java 9. If you want to hear the session, the part where he talks about sets starts here - good talk, highly recommended, by the way!
So even if you used to count on the iteration order of Sets, once you move to Java 9 you should stop doing so.
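Even without the Java 9 randomization, the order of a plain HashSet can shift as the set grows. The sketch below takes two snapshots of the same set's iteration order. It assumes the documented HashSet defaults (initial capacity 16, load factor 0.75), and the class name HashSetOrderOverTime is made up for illustration. On a build with those defaults I would expect 16 to precede 1 in the first snapshot and to follow it in the second, but nothing in the specification promises either outcome.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name, used only for this illustration.
public class HashSetOrderOverTime {

    // Takes two snapshots of one set's iteration order: before and after growing it.
    public static List<List<Integer>> snapshots() {
        Set<Integer> set = new HashSet<>();
        set.add(16);
        set.add(1);
        List<Integer> before = new ArrayList<>(set);

        // Adding more elements may trigger a resize of the backing table,
        // which is allowed to move existing elements to different buckets.
        for (int i = 2; i <= 12; i++) {
            set.add(i);
        }
        List<Integer> after = new ArrayList<>(set);

        List<List<Integer>> result = new ArrayList<>();
        result.add(before);
        result.add(after);
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> s = snapshots();
        // Compare where 16 sits relative to 1 in the two lines.
        // The exact output is implementation-specific.
        System.out.println("before: " + s.get(0));
        System.out.println("after:  " + s.get(1));
    }
}
```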
All that said, if you need order you should consider using a SortedSet, LinkedHashSet or TreeSet.
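As a rough illustration of the LinkedHashSet option: it iterates in insertion order, so it keeps whatever order sorted(...) produced, including a non-natural one. The class name OrderedSetChoices and the helper sortedDescending are made up for this sketch.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration.
public class OrderedSetChoices {

    // Sorts in reverse natural order, then collects into a LinkedHashSet,
    // which iterates in the order the elements were inserted.
    public static Set<String> sortedDescending(List<String> input) {
        return input.stream()
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        System.out.println(sortedDescending(Arrays.asList("b", "c", "a"))); // [c, b, a]
    }
}
```

A TreeSet would instead re-sort the elements itself, so it suits cases where the set should stay ordered as elements are added later; a LinkedHashSet only remembers the order in which they arrived.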
To answer that question, you have to know a bit about how HashSet is implemented. As the name suggests, a HashSet is implemented using a hash table. Basically, a hash table is an array that is indexed by element hashes. A hash function (in Java, an object's hash is calculated by object.hashCode()) is basically a function that meets a few criteria:
- two objects that .equals() each other have identical hashes
So, when you meet a HashSet that is "sorted" (which is understood as "the iterator preserves the natural order of elements"), this is due to a couple of coincidences:
- the natural order of the elements respects the natural order of their hashCodes
If you look into the String class's hashCode() method, you will see that for one-letter strings the hash code corresponds to the Unicode index (code point) of the letter, so in this specific case, as long as the hash table is small enough, the elements will be sorted. However, this is a huge coincidence and, moreover, it has nothing to do with the fact that sorted() was called on the stream: it's simply due to the way hashCode() is implemented, and therefore to the ordering of the hash table. Therefore, the simple answer to the question is "no".
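A small experiment can make the coincidence visible. The sketch below assumes that toSet() is backed by a HashSet with a 16-slot table, as in the source quoted in the question, and the class name SortedThenToSet is made up for illustration. Under that assumption "p" (112 = 7 * 16) would land in slot 0 and "a" (97 = 6 * 16 + 1) in slot 1, so I would expect the second set to iterate as p, a despite the sorted() call. The exact output is not guaranteed on any given JVM.

```java
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration.
public class SortedThenToSet {

    // Sorts the stream and then collects it with toSet(), as in the question.
    public static Set<String> collect(String... values) {
        return Stream.of(values).sorted().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // For a one-character string, String.hashCode() is the char value itself.
        System.out.println("a".hashCode()); // 97
        System.out.println("p".hashCode()); // 112

        // Iteration order below depends on the hash table layout, not on sorted().
        System.out.println(collect("c", "a", "b"));
        System.out.println(collect("p", "a"));
    }
}
```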