简体   繁体   中英

How does the order of elements in a HashSet work?

I understand that the order of elements in a HashSet is supposed to be arbitrary. But out of curiosity, could anyone tell me exactly how the order is determined?

I noticed that when I insert two elements (say A and B), the order would come out A, B , then re-executing the same code again would give me B, A , then re-excecuting it the third time would give me A, B .

I mean, that's kind of un-deterministic, and a bit weird.

The order is determined by the Hashing algorithm used within the Hash Map/Set, the exact settings of that Map and the Hashcodes of the objects.

If your objects have consistent hashcodes over multiple runs (Strings for example) and are placed in the same order into a map with the same settings then in general they would come out in the same order each time. If they don't then they won't.

The source for HashMap can be seen here: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashMap.java

In fact an interesting quote from that source is:

This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

So not only may the order be different each time your program runs, but in fact the API itself makes no guarantee that the order will remain constant even inside one run of the program!

"Un-deterministic and a bit weird" is a good description of ordering of a HashMap - and is actually pretty much what the docs say. If you want ordering use either LinkedHashMap or TreeMap . If you don't want ordering then don't worry about it, by having the ordering being effectively random HashMap is giving you extremely fast responses from the methods who's behavior it does guarantee!

In principle there are two contributing factors:

  1. Hashcode of your keys might be non-deterministic, this will be the case when you use default hashCode implementation, which relies on memory location

  2. HashSet itself can be non deterministic, take a look at HashMap.initHashSeedAsNeeded (HashSet uses HashMap in standard Oracle SDK as underlying datastructure), depending on some factors it can use sun.misc.Hashing.randomHashSeed(this) to initialize hashSeed field which is then used when computing hashCode of a key

Randomization can be important to achieve probabilistic performance guaranties. This is what javadoc says for hashSeed:

/** * A randomizing value associated with this instance that is applied to
* hash code of keys to make hash collisions harder to find. If 0 then
* alternative hashing is disabled. */

The order will not change (in practice) unless you add / remove something to your HashSet .

The order is based on the internal hashtable buckets. And that depends on both the hashCode() of an object and the size of the hashtable.

Simplified example:

A's hashcode is 10, B's hashCode is 11. The hastable has size 2. The mapping from hash code to position in hashtable would be purely based on the last bit, ie even hashcodes go into table[0], odd ones into table[1].

table[0] = { A }
table[1] = { B }

Iterating over those values would most likely be A, B now. And that result should be reproducible each time as long as table size stays the same.

Adding a third element C with hashCode 12 would (when not resizing the table) add it to bucket #0 as well.

table[0] = { A, C }
table[1] = { B }

So your iteration would be A, C, B. Or depending in whether you inserted A before C: C, A, B

Adding elements will in practice resize the table and re-hash using an adjusted mapping. Eg table size would be doubled and the last 2 bits could be used to determine the bucket

table[0] = { C }
table[1] = {   }
table[2] = { A }
table[3] = { B }

And the order would have changed completely by adding just 1 element.

Only HashSet keeps and garatuees no order, even no arbitrary order ( Why can hashCode() return the same value for different objects in Java? )! Dont force an order there! Serialize and Deserialize them and the original order will be destroyed.

Use LinkedHashSet instead of HashSet.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM