简体   繁体   English

HashSet与JDK 7/8的顺序和区别

[英]HashSet order and difference with JDK 7 / 8

This is a two part question: 这是一个两部分问题:

  1. Does HashSet implement some hidden ordering mechanic or it just, to quote the documentation: It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. HashSet是否实现了一些隐藏的排序机制,或者只是引用文档: It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. tells me that the order MIGHT change sometimes in the future and/or depending on memory usage? 告诉我,未来有时可能会改变订单和/或取决于内存使用情况?
  2. Why am I getting completely different 'ordering' (dare I say) when I switch between JDKs? 当我在JDK之间切换时,为什么我会得到完全不同的“排序”(我敢说)?

Take this example: 举个例子:

for (int i = 0; i < 1000; i++) {
        Set<String> stringSet = new HashSet<>();
        stringSet.add("qwe");
        stringSet.add("rtz");
        stringSet.add("123");
        stringSet.add("qwea");
        stringSet.add("12334rasefasd");
        stringSet.add("asdxasd");
        stringSet.add("arfskt6734");
        stringSet.add("123121");
        stringSet.add("");
        stringSet.add("qwr");
        stringSet.add("rtzz");
        stringSet.add("1234");
        stringSet.add("qwes");
        stringSet.add("1234rasefasd");
        stringSet.add("asdxasdq");
        stringSet.add("arfskt6743");
        stringSet.add("123121 ");
        stringSet.add(" ");
        System.out.println(stringSet);
    }

will, no matter how many times I run it, produce the following output: 无论我运行多少次,都会产生以下输出:

JDK 7: [, , 123, qwea, asdxasdq, qwe, qwr, 123121 , arfskt6743, 1234rasefasd, qwes, rtz, rtzz, 1234, 12334rasefasd, asdxasd, arfskt6734, 123121] JDK 7: [, , 123, qwea, asdxasdq, qwe, qwr, 123121 , arfskt6743, 1234rasefasd, qwes, rtz, rtzz, 1234, 12334rasefasd, asdxasd, arfskt6734, 123121]

JDK 8: [, , qwes, arfskt6743, asdxasdq, 123121, 123121 , arfskt6734, qwr, 123, 1234, qwea, rtzz, rtz, 12334rasefasd, 1234rasefasd, qwe, asdxasd] JDK 8: [, , qwes, arfskt6743, asdxasdq, 123121, 123121 , arfskt6734, qwr, 123, 1234, qwea, rtzz, rtz, 12334rasefasd, 1234rasefasd, qwe, asdxasd]

Obviously, the empty string and the whitespace-only-string are leading the way both times, but the rest of it is completely different. 显然,空字符串和仅空白字符串两次都是引导方式,但其余部分完全不同。

According to the update on collections changes page 根据集合更改页面上的更新

The alternative String hash function added in 7u6 has been removed from JDK 8, along with the jdk.map.althashing.threshold system property. 7u6中添加的备用String散列函数已从JDK 8中删除,同时还有jdk.map.althashing.threshold系统属性。 Instead, hash bins containing a large number of colliding keys improve performance by storing their entries in a balanced tree instead of a linked list. 相反,包含大量冲突键的哈希箱通过将其条目存储在平衡树而不是链表中来提高性能。 This JDK 8 change applies only to HashMap, LinkedHashMap, and ConcurrentHashMap. 此JDK 8更改仅适用于HashMap,LinkedHashMap和ConcurrentHashMap。

In rare situations, this change could introduce a change to the iteration order of HashMap and HashSet. 在极少数情况下,此更改可能会引入对HashMap和HashSet的迭代顺序的更改。 A particular iteration order is not specified for HashMap objects - any code that depends on iteration order should be fixed. 没有为HashMap对象指定特定的迭代顺序 - 任何依赖于迭代顺序的代码都应该被修复。


So, Basically 所以,基本上

The Algorithm for hashing your set has changed to improve performance . 用于散列集合的算法已更改以提高性能 It changed to a balanced tree instead of a linked list. 它变为平衡树而不是链表。

This kind of change might change the iteration order of your set, and it's established that this kind of behavior should be fixed by you , if you depend upon it. 这种更改可能会改变您的集合的迭代顺序 ,并且已确定您应该修复此类行为 ,如果您依赖它。

You're seeing a better implementation of your set, which might look like it's ordered, but it's pure coincidence. 你看到了一个更好的集合实现,它可能看起来像是有序的,但它纯属巧合。

I would recommend you to not rely on the iteration order for sets, as the order is not a guarantee. 我建议你不要依赖集的迭代顺序,因为顺序不是保证。

@Edit @编辑

Another concept is also important, as stated by user Holger, 如用户Holger所述,另一个概念也很重要,

The “alternative String hash function” of Java 7 was not used by default. 默认情况下不使用Java 7的“替代字符串散列函数”。 Further, the balanced tree only applies to bucket collision scenarios. 此外,平衡树仅适用于桶冲突场景。 Still, as a consequence of this improvement, another, unmentioned change has been made. 尽管如此,由于这种改进,还有另一个未提及的变化。 The mapping of an object's hashcode to an array position undergoes a transformation which has been simplified from Java 7 to Java 8 对象的哈希码到数组位置的映射经历了从Java 7简化到Java 8的转换

The internal storage of a HashSet is defined by an algorithm. HashSet的内部存储由算法定义。 It is not randomized. 它不是随机的。

The specification (API) does not specify any particular algorithm, other than it being hash-based. 除了基于散列之外,规范(API)没有指定任何特定算法。 The implementation can choose any algorithm it wants, and is free to choose a different one in future versions. 实现可以选择它想要的任何算法,并且可以在将来的版本中自由选择不同的算法。

However, being algorithm-based, it means that for any particular version of the implementation (Oracle vs IBM, 7 vs 8, ...), adding a particular set of values will always produce the same result, ie ordering. 但是,基于算法,​​这意味着对于任何特定版本的实现(Oracle vs IBM,7 vs 8,...),添加一组特定的值将始终产生相同的结果,即排序。

The ordering is consistent for a specific version, but it is undefined and subject to change without notice in future versions and/or different implementations, so you should never rely on order. 排序是针对特定版本一致的 ,但它是不确定的,并可能在未来的版本和/或不同的实现更改,恕不另行通知,所以你应该依赖于顺序。

To have something more exciting, change your example code to 要获得更令人兴奋的内容,请将示例代码更改为

public static void main(String... args) {
  System.out.println(System.getProperty("java.version"));
  List<String> strings=Arrays.asList("qwe", "rtz", "123", "qwea",
      "12334rasefasd", "asdxasd", "arfskt6734", "123121", "", "qwr",
      "rtzz", "1234", "qwes", "1234rasefasd", "asdxasdq", "arfskt6743",
      "123121 ", " ");

  for (int i = 5; i < 26; i++) {
      Set<String> stringSet = new HashSet<>(1<<i);
      stringSet.addAll(strings);
      System.out.println(stringSet);
  }
}

This still adds the same strings in the same order to a HashSet , but the HashSet has been initialized with different capacities. 这仍然以相同的顺序将相同的字符串添加到HashSet ,但是HashSet已经初始化为具有不同的容量。
The result is 结果是

1.7.0_51
[,  , qwea, 123, asdxasdq, qwe, qwr, 123121 , arfskt6743, 1234rasefasd, qwes, rtzz, rtz, 1234, 12334rasefasd, arfskt6734, asdxasd, 123121]
[, qwr, arfskt6743, rtzz, 12334rasefasd,  , qwea, 123, asdxasdq, qwe, 123121 , 1234rasefasd, qwes, rtz, 1234, arfskt6734, asdxasd, 123121]
[, qwr, arfskt6743, rtzz, 12334rasefasd,  , 123, rtz, 1234, arfskt6734, asdxasd, 123121, qwea, asdxasdq, qwe, 123121 , 1234rasefasd, qwes]
[, qwr, arfskt6743, rtzz, 12334rasefasd,  , arfskt6734, asdxasd, 123121, 123121 , 1234rasefasd, 123, rtz, 1234, qwea, asdxasdq, qwe, qwes]
[, rtzz,  , 123121, 123121 , 1234rasefasd, 123, rtz, qwea, asdxasdq, qwe, qwes, qwr, arfskt6743, 12334rasefasd, arfskt6734, asdxasd, 1234]
[, rtzz,  , 123121, 123, asdxasdq, 123121 , 1234rasefasd, rtz, qwea, qwe, qwes, qwr, arfskt6743, 12334rasefasd, arfskt6734, asdxasd, 1234]
[,  , 123121, asdxasdq, 1234rasefasd, rtz, qwea, qwes, arfskt6743, 12334rasefasd, arfskt6734, asdxasd, rtzz, 123, 123121 , qwe, qwr, 1234]
[,  , 123121, 1234rasefasd, rtz, qwea, qwes, asdxasd, rtzz, 123, 1234, asdxasdq, arfskt6743, 12334rasefasd, arfskt6734, 123121 , qwe, qwr]
[,  , rtz, asdxasd, rtzz, arfskt6743, 12334rasefasd, arfskt6734, 123121 , qwe, qwr, 123121, 1234rasefasd, qwea, qwes, 123, 1234, asdxasdq]
[,  , arfskt6743, 12334rasefasd, arfskt6734, qwea, qwes, 1234, asdxasdq, rtz, asdxasd, rtzz, 123121 , qwe, qwr, 123121, 1234rasefasd, 123]
[,  , qwea, qwes, rtz, asdxasd, rtzz, 123121 , qwe, qwr, 123, arfskt6743, 12334rasefasd, arfskt6734, 1234, asdxasdq, 123121, 1234rasefasd]
[,  , qwea, qwes, asdxasd, 1234, asdxasdq, 123121, 1234rasefasd, rtz, rtzz, 123121 , qwe, qwr, 123, arfskt6743, 12334rasefasd, arfskt6734]
[,  , qwea, qwes, asdxasd, 1234, 1234rasefasd, rtzz, 123121 , 123, arfskt6743, arfskt6734, asdxasdq, 123121, rtz, qwe, qwr, 12334rasefasd]
[,  , 1234rasefasd, 123, asdxasdq, rtz, qwe, qwr, 12334rasefasd, qwea, qwes, asdxasd, 1234, rtzz, 123121 , arfskt6743, arfskt6734, 123121]
[,  , 123, asdxasdq, rtz, qwe, qwr, 12334rasefasd, 123121 , 123121, 1234rasefasd, qwea, qwes, asdxasd, 1234, rtzz, arfskt6743, arfskt6734]
[,  , 123, asdxasdq, rtz, qwe, qwr, 12334rasefasd, 1234rasefasd, qwea, qwes, asdxasd, 1234, rtzz, 123121 , 123121, arfskt6743, arfskt6734]
[,  , 123, rtz, qwe, qwr, 12334rasefasd, asdxasd, asdxasdq, 1234rasefasd, qwea, qwes, 1234, rtzz, 123121 , 123121, arfskt6743, arfskt6734]
[,  , 123, rtz, qwe, qwr, asdxasd, asdxasdq, 1234, arfskt6743, arfskt6734, 12334rasefasd, 1234rasefasd, qwea, qwes, rtzz, 123121 , 123121]
[,  , 123, rtz, qwe, qwr, asdxasdq, 1234, 12334rasefasd, 1234rasefasd, qwea, qwes, rtzz, 123121 , 123121, asdxasd, arfskt6743, arfskt6734]
[,  , 123, rtz, qwe, qwr, 1234, 12334rasefasd, qwea, qwes, rtzz, 123121 , asdxasdq, 1234rasefasd, 123121, asdxasd, arfskt6743, arfskt6734]
[,  , 123, rtz, qwe, qwr, 1234, qwea, qwes, rtzz, 12334rasefasd, 123121 , asdxasdq, 1234rasefasd, 123121, asdxasd, arfskt6743, arfskt6734]
1.8.0_111
[,  , qwes, arfskt6743, asdxasdq, 123121, 123121 , arfskt6734, qwr, 123, 1234, qwea, rtzz, rtz, 12334rasefasd, 1234rasefasd, qwe, asdxasd]
[, 123121, arfskt6734, qwr, 1234, asdxasd,  , qwes, arfskt6743, asdxasdq, 123121 , 123, qwea, rtzz, rtz, 12334rasefasd, 1234rasefasd, qwe]
[, arfskt6734, qwr,  , arfskt6743, asdxasdq, 123, rtzz, 123121, 1234, asdxasd, qwes, 123121 , qwea, rtz, 12334rasefasd, 1234rasefasd, qwe]
[, qwr,  , 123, rtzz, 1234, asdxasd, qwes, 123121 , qwea, rtz, 1234rasefasd, arfskt6734, arfskt6743, asdxasdq, 123121, 12334rasefasd, qwe]
[,  , 123, 1234, rtz, arfskt6734, arfskt6743, asdxasdq, 123121, 12334rasefasd, qwe, qwr, rtzz, asdxasd, qwes, 123121 , qwea, 1234rasefasd]
[,  , 1234, asdxasdq, 123121, 12334rasefasd, rtzz, asdxasd, qwes, 123121 , qwea, 123, rtz, arfskt6734, arfskt6743, qwe, qwr, 1234rasefasd]
[,  , 1234, 12334rasefasd, qwes, qwea, rtz, asdxasdq, 123121, rtzz, asdxasd, 123121 , 123, arfskt6734, arfskt6743, qwe, qwr, 1234rasefasd]
[,  , asdxasdq, rtzz, 123121 , arfskt6734, arfskt6743, qwe, qwr, 1234, 12334rasefasd, qwes, qwea, rtz, 123121, asdxasd, 123, 1234rasefasd]
[,  , asdxasdq, 123121 , arfskt6734, arfskt6743, 1234, 12334rasefasd, qwes, qwea, 123121, asdxasd, 1234rasefasd, rtzz, qwe, qwr, rtz, 123]
[,  , asdxasdq, 1234, rtzz, 123121 , arfskt6734, arfskt6743, 12334rasefasd, qwes, qwea, 123121, asdxasd, 1234rasefasd, qwe, qwr, rtz, 123]
[,  , 1234, rtzz, 123121 , arfskt6734, arfskt6743, 12334rasefasd, qwes, qwea, 123121, asdxasd, qwe, qwr, rtz, 123, asdxasdq, 1234rasefasd]
[,  , 1234, 123121 , arfskt6734, arfskt6743, qwes, qwea, asdxasd, rtzz, 12334rasefasd, 123121, qwe, qwr, rtz, 123, asdxasdq, 1234rasefasd]
[,  , arfskt6734, arfskt6743, asdxasd, 123, asdxasdq, 1234rasefasd, 1234, 123121 , qwes, qwea, rtzz, 12334rasefasd, 123121, qwe, qwr, rtz]
[,  , arfskt6734, arfskt6743, 123, asdxasdq, 123121 , qwes, qwea, rtzz, 12334rasefasd, 123121, qwe, qwr, rtz, asdxasd, 1234rasefasd, 1234]
[,  , 123, 123121 , qwe, qwr, rtz, asdxasd, 1234rasefasd, arfskt6734, arfskt6743, asdxasdq, qwes, qwea, rtzz, 12334rasefasd, 123121, 1234]
[,  , 123, qwe, qwr, rtz, asdxasd, 1234rasefasd, arfskt6734, arfskt6743, qwes, qwea, rtzz, 123121, 1234, 123121 , asdxasdq, 12334rasefasd]
[,  , 123, qwe, qwr, rtz, arfskt6734, arfskt6743, 123121 , asdxasdq, asdxasd, 1234rasefasd, qwes, qwea, rtzz, 123121, 1234, 12334rasefasd]
[,  , 123, qwe, qwr, rtz, arfskt6734, arfskt6743, 123121 , 1234, 12334rasefasd, asdxasdq, asdxasd, 1234rasefasd, qwes, qwea, rtzz, 123121]
[,  , 123, qwe, qwr, rtz, arfskt6734, arfskt6743, 1234, asdxasdq, asdxasd, 1234rasefasd, qwes, qwea, rtzz, 123121 , 12334rasefasd, 123121]
[,  , 123, qwe, qwr, rtz, 1234, asdxasdq, asdxasd, qwes, qwea, rtzz, 123121 , 12334rasefasd, 123121, arfskt6734, arfskt6743, 1234rasefasd]
[,  , 123, qwe, qwr, rtz, 1234, qwes, qwea, rtzz, 123121 , 123121, arfskt6734, arfskt6743, asdxasdq, asdxasd, 12334rasefasd, 1234rasefasd]

This demonstrates that the iteration order is not only depending on the particular implementation, but also on the history of the HashSet . 这表明迭代顺序不仅取决于特定的实现,还取决于HashSet的历史。 A higher capacity might also be the result of being previously bigger but having elements removed. 更高的容量也可能是因为之前更大但删除了元素。

While the hash code determines which array position to use for an element, there might also be collisions, causing elements to share an entry. 虽然哈希码确定要用于元素的阵列位置,但也可能存在冲突,导致元素共享条目。 Within that entry the collision might get resolved through a linked list, in which case, the order within this bucket reflects the insertion order, so it also depends on the set's history, or might get resolved by using a balanced tree since Java 8, which will reflect the order of either, the hashcodes or the element's natural order, depending on whether this as a true hash collision or just a bucket collision . 在该条目中,冲突可能通过链表解析,在这种情况下,此存储桶中的顺序反映了插入顺序,因此它还取决于集合的历史记录,或者可能通过使用自Java 8以来的平衡树来解决,将反映其中一个,即哈希码或元素的自然顺序的顺序,具体取决于它是真正的哈希冲突还是只是桶冲突

But Java 8's HashSet will only use the tree, if there are a certain number of collisions at a bucket, otherwise, it also uses a linked list. 但是,如果存储桶中存在一定数量的冲突,则Java 8的HashSet将仅使用树,否则,它还使用链表。 To avoid switching back and forth between these variants, it uses different thresholds for converting to a tree and for converting back to a linked list. 为了避免在这些变体之间来回切换,它使用不同的阈值来转换为树并转换回链接列表。 So, if the number of collision is in-between these threshold, it will again depend on the set's history, ie whether there were more elements previously, which form and hence which order it will have. 因此,如果碰撞的数量介于这些阈值之间,它将再次取决于集合的历史,即先前是否存在更多元素,哪些形式以及因此具有哪个顺序。


Note that Java 7's “alternative String hash function” was disabled by default and the collision resolution addresses a corner case. 请注意,默认情况下禁用Java 7的“替代字符串散列函数”,并且冲突解决方案解决了一个极端情况。 Still, as you can see from the output, there is almost always a notable difference in the iteration order. 但是,正如您从输出中看到的那样,迭代顺序几乎总是存在显着差异。

The reason is that now that collisions are handled more efficiently, the attempts to avoid collisions have been reduced. 原因在于,现在可以更有效地处理冲突,减少了避免冲突的尝试。 In Java 7 , hash codes underwent the following transformation before getting mapped to array positions: Java 7中 ,哈希码在映射到数组位置之前经历了以下转换:

 h ^= (h >>> 20) ^ (h >>> 12);
 return h ^ (h >>> 7) ^ (h >>> 4);

In contrast, Java 8 uses the following transformation: 相比之下, Java 8使用以下转换:

return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);

This has an immediate impact on the iteration order, even if no collisions occurred. 即使没有发生冲突,这也会对迭代顺序产生直接影响。

Ans 1 : Yes Hashset doesn't maintain insertion order but after that if you iterate it you will get same order every time. 答案1:是的Hashset不维护插入顺序,但在此之后如果你迭代它,你每次都会得到相同的顺序。

Ans 2: iteration result may differ with java versions because it depends on hashcode implementation of that version. 回答2:迭代结果可能与java版本不同,因为它依赖于该版本的哈希码实现。 but Hashset provides one surety that iteration order will never get change means after insertion of elements if you iterate it every time you will get same order in that java version. 但是Hashset提供了一个保证,迭代顺序永远不会得到更改意味着在插入元素之后,如果每次在java版本中获得相同的顺序时迭代它。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM