What causes the slightly unpredictable ordering of the iterator() for the java.util.HashSet and HashMap.keySet() classes?

Six years ago, I burned several days trying to hunt down where my perfectly deterministic framework was responding randomly. After meticulously combing through the entire framework to ensure that it was all using the same instance of Random, I kept chasing by single-stepping the code. It was highly repetitive, iterative, self-calling code. Worse, the damn effect would only show up after a huge number of iterations had completed. After 6+ hours, I was at my wits' end when I discovered a line in the Javadoc for HashSet.iterator() indicating that it doesn't guarantee the order in which it returns elements. I then went through my entire code base and replaced all instances of HashSet with LinkedHashSet. And lo and behold, my framework sprang right to deterministic life! ARGH!

I have now just experienced this same FREAKIN effect again (at least it was only 3 hours this time). For whatever reason, I missed the small detail that HashMap happens to BEHAVE THE SAME WAY for its keySet().

Here's an SO thread on this subject, although the discussion never quite answers my question: Iteration order of HashSet

So, I am curious as to why this might occur. Both times I had a huge single-threaded Java application crawling through exactly the same instantiation/insertion space with exactly the same JVM parameters (multiple runs from the same batch file) on the same computer with almost nothing else running. What could possibly perturb the JVM such that HashSet and HashMap would, after an enormous number of iterations, behave unpredictably (not inconsistently, as the Javadoc says not to depend upon the order)?

Any ideas around this, from either the source code (the implementation of these classes in java.util) or from your knowledge of the JVM (perhaps some GC effect where internal Java classes get non-zeroed memory when allocating internal memory spaces)?

Short Answer

There's a tradeoff. If you want amortized constant-time O(1) access to elements, the techniques to date rely upon a randomized scheme like hashing. If you want ordered access to elements, the best engineering tradeoff gives you only O(log n) performance. For your case, perhaps this doesn't matter, but the difference between constant time and logarithmic time becomes significant even with relatively small structures.

So yes, you can go look at the code and inspect it carefully, but it boils down to a rather practical theoretical fact. Now is a good time to brush the dust off that copy of Cormen (or Googly Bookiness here) that's propping up the drooping corner of your house's foundation and take a look at Chapters 11 (Hash Tables) and 13 (Red-Black Trees). These will fill you in on the JDK's implementation of HashMap and TreeMap, respectively.

Long Answer

You don't want a Map or Set to return ordered lists of keys/members. That's not what they're for. Map and Set structures, like the underlying mathematical concepts, are not ordered, and they provide different performance. The objective of these data structures (as @thejh points out) is efficient amortized insert, contains, and get time, not maintaining ordering. You can look into how a hashed data structure is maintained to know what the tradeoffs are. Take a look at the Wikipedia entries on Hash Functions and Hash Tables (ironically, note that the Wiki entry for "unordered map" redirects to the latter) or a computer science / data structures text.

Remember: Don't depend on properties of ADTs (and specifically collections) such as ordering, immutability, thread safety, or anything else unless you look carefully at what the contract is. Note that for Map, the Javadoc says clearly:

The order of a map is defined as the order in which the iterators on the map's collection views return their elements. Some map implementations, like the TreeMap class, make specific guarantees as to their order; others, like the HashMap class, do not.

And Set.iterator() has something similar:

Returns an iterator over the elements in this set. The elements are returned in no particular order (unless this set is an instance of some class that provides a guarantee).

If you want an ordered view of these, use one of the following approaches:

  • If it's just a Set, maybe you really want a SortedSet such as a TreeSet.
  • Use a TreeMap, which allows either natural ordering of keys or a specific ordering via a Comparator.
  • Abstract your data structure, which probably is an application-specific thing anyway if this is the behavior you want, and maintain both a SortedSet of keys as well as a Map, which will perform better in amortized time.
  • Get the Map.keySet() (or just the Set you're interested in) and put it into a SortedSet such as TreeSet, either using the natural ordering or a specific Comparator.
  • Iterate over the Map.Entry<K,V> elements in sorted order. Note that Map.Entry is not Comparable, so new TreeSet(map.entrySet()) without a Comparator fails at runtime; copying into a TreeMap works instead. E.g. for (final Map.Entry<K,V> entry : new TreeMap<K,V>(map).entrySet()) { } to efficiently access both keys and values.
  • If you are only doing this once in a while, you could just get an array of values out of your structure and use Arrays.sort(), which has a different performance profile (space and time).
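A minimal sketch of two of the approaches above (a TreeSet over the keys, and a TreeMap copy for ordered entries). The class name SortedViews, the helper names, and the sample data are made up for illustration; only the java.util calls are real API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical demo class; names and sample data are illustrative only.
public class SortedViews {

    // Copy the keys into a TreeSet to get them in natural (ascending) order.
    static List<String> sortedKeys(Map<String, Integer> map) {
        return new ArrayList<>(new TreeSet<>(map.keySet()));
    }

    // Copy the whole map into a TreeMap to iterate entries in key order.
    static List<String> sortedEntries(Map<String, Integer> map) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : new TreeMap<>(map).entrySet()) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("pear", 3);
        map.put("apple", 1);
        map.put("fig", 2);

        System.out.println(sortedKeys(map));    // [apple, fig, pear]
        System.out.println(sortedEntries(map)); // [apple=1, fig=2, pear=3]
    }
}
```

Both helpers pay an O(n log n) copy each time they are called, so this suits occasional ordered reads; if ordered iteration is the common case, holding the data in a TreeMap from the start is probably the simpler choice.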

Links to the Source

If you would like to look at the source for j.u.HashSet and j.u.HashMap, they are available on GrepCode. Note that a HashSet is just sugar for a HashMap. Why not always use the sorted versions? Well, as I alluded to above, the performance differs, and that matters in some applications. See the related SO question here. You can also see some concrete performance numbers at the bottom here (I haven't looked closely to verify these are accurate, but they happen to substantiate my point, so I'll blithely pass along the link. :-)

I've struck this before, where the order wasn't important, but did affect the results.

The multi-threaded nature of Java means that repeated runs with exactly the same inputs can be affected by slight timing differences in (for example) how long it takes to allocate a new block of memory, which sometimes requires paging the previous contents out to disk and at other times does not. Some other thread not using that page may proceed, and you could end up with a different order of object creation once System objects are taken into account.

That can affect the Object.hashCode() result for the equivalent object in different runs of the JVM.

For me, I decided to add the small overhead of using a LinkedHashMap, in order to be able to reproduce the results of the tests I was running.

http://download.oracle.com/javase/1.4.2/docs/api/java/lang/Object.html#hashCode() says:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

So maybe the internal address changes?

This also means that you could probably fix it without giving up speed by writing your own hashCode() method for everything that should act as a key.
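A minimal sketch of that idea, assuming a key made of a non-null name and an index. The NodeKey class and its fields are hypothetical, not taken from the question. Because the hash is computed only from field values (String.hashCode() is specified by its Javadoc), it comes out the same on every run, unlike the default Object.hashCode().

```java
// Hypothetical key type for illustration; not from the original post.
public final class NodeKey {
    private final String name; // assumed non-null in this sketch
    private final int index;

    public NodeKey(String name, int index) {
        this.name = name;
        this.index = index;
    }

    // equals() must agree with hashCode(): equal keys have equal hashes.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NodeKey)) return false;
        NodeKey other = (NodeKey) o;
        return index == other.index && name.equals(other.name);
    }

    // Derived only from field values, so it is identical across JVM runs.
    @Override
    public int hashCode() {
        return 31 * name.hashCode() + index;
    }

    public static void main(String[] args) {
        NodeKey a = new NodeKey("n", 7);
        NodeKey b = new NodeKey("n", 7);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

A stable hashCode() removes one source of run-to-run variation, but the iteration order of a HashMap is still not part of its contract and can change with capacity or JDK version.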

You should NEVER depend on the order of a hash map.

If you want a Map with a deterministic ordering, I suggest you use a SortedMap/SortedSet like TreeMap/TreeSet, or use LinkedHashMap/LinkedHashSet. I use the latter often, not because the program needs the ordering, but because it's easier to read logs and debug the state of the map, i.e. when you add a key, it goes to the end every time.
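A small sketch contrasting the two kinds of deterministic ordering mentioned here. The class name OrderedSets, the helper names, and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class; names and sample values are illustrative only.
public class OrderedSets {

    // LinkedHashSet iterates in first-insertion order.
    static List<Integer> insertionOrder(List<Integer> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    // TreeSet iterates in sorted (natural) order.
    static List<Integer> sortedOrder(List<Integer> values) {
        return new ArrayList<>(new TreeSet<>(values));
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(30, 10, 20, 10);
        System.out.println(insertionOrder(values)); // [30, 10, 20]
        System.out.println(sortedOrder(values));    // [10, 20, 30]
    }
}
```

Re-adding an element that is already present (the second 10) does not move it in a LinkedHashSet, which is what makes the logged state easy to follow.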

You can create two HashMap/HashSet instances with the same elements but get different orders depending on the capacity of the collection. Subtle differences in how your code runs can trigger a different final bucket count and therefore a different order.

For example:

import java.util.Arrays;
import java.util.HashSet;

public class HashSetOrder {
    public static void main(String... args) {
        // Same elements, same insertion order; only the initial capacity
        // and load factor differ.
        printInts(new HashSet<Integer>(8, 2));
        printInts(new HashSet<Integer>(16, 1));
        printInts(new HashSet<Integer>(32, 1));
        printInts(new HashSet<Integer>(64, 1));
    }

    private static void printInts(HashSet<Integer> integers) {
        integers.addAll(Arrays.asList(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100));
        System.out.println(integers);
    }
}

prints (on the answerer's JVM; the exact orders vary between Java versions)

[0, 50, 100, 70, 40, 10, 80, 20, 90, 60, 30]
[0, 50, 100, 70, 80, 20, 40, 10, 90, 60, 30]
[0, 100, 70, 40, 10, 50, 80, 20, 90, 60, 30]
[0, 70, 10, 80, 20, 90, 30, 100, 40, 50, 60]

Here you have HashSets with the same values added in the same order, resulting in different iterator orders. You may not be playing with the constructor, but your application could cause a different bucket count indirectly.
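To see why capacity alone can change the order, here is a simplified model rather than the real HashMap code. It assumes a power-of-two table where the bucket index is hash & (capacity - 1), and iteration that walks buckets by index with insertion order inside each bucket. Real implementations add hash mixing, resizing, and version-specific details, so this will not reproduce the exact output above. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a hash table's iteration order.
public class BucketOrder {

    // Place each value in bucket (hash & (capacity - 1)), then read the
    // buckets back in index order. capacity is assumed to be a power of two.
    static List<Integer> iterationOrder(int[] values, int capacity) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < capacity; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int v : values) {
            buckets.get(Integer.hashCode(v) & (capacity - 1)).add(v);
        }
        List<Integer> order = new ArrayList<>();
        for (List<Integer> bucket : buckets) {
            order.addAll(bucket);
        }
        return order;
    }

    public static void main(String[] args) {
        int[] values = {0, 10, 20, 30, 40, 50};
        System.out.println(iterationOrder(values, 8));  // [0, 40, 10, 50, 20, 30]
        System.out.println(iterationOrder(values, 16)); // [0, 50, 20, 40, 10, 30]
    }
}
```

The same six values land in different buckets once the mask widens from 3 bits to 4, so the walk over the table visits them in a different sequence even though nothing about the elements or their insertion order changed.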
