简体   繁体   中英

Why does a HashSet sort single alphabetic characters?

So what I know is that a HashSet has no real sorting capabilities like a SortedSet, however I stumbled upon this :

When I run the following code :

 public static void main(String[] args) {
    Set<String> collection = new HashSet<String>(2000);
    String[] data = {"a", "c", "g", "f", "b", "f", "b", "d","q","r","d","m"};
    for(String input: data)
    {
        collection.add(input);
    }
    System.out.println("Output: " + collection);
}

I get the following output : Output: [a, b, c, d, f, g, m, q, r]

Which is alphabetically sorted. Why is that? Since a HashSet is not a sorted set.

So I tried with a string of characters instead of a single character :

public static void main(String[] args) {
    Set<String> collection = new HashSet<String>(2000);
    String[] data = {"atjre", "crj", "gertj", "fertj", "berj"};
    for(String input: data)
    {
        collection.add(input);
    }
    System.out.println("Output: " + collection);
}

And i get the following output : Output: [crj, atjre, fertj, gertj, berj]

Now they are not sorted anymore, any explanations for this? Or is this just a random coincidence?

HashSet implements Set interface. It means that there is no guarantee of order of elements.

This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. Source

Over the time after you adding, deleting few times you can see the difference.

However, "no guarantee of ordering" does not imply "guaranteed random ordering". Exact answer of your question is,

The hashcode -method of the String class also comes into play here, for single character String s the hashcode will just be the int value of the one char in the String . And since char 's int values are ordered alphabetically, so will the computed hashes of single char String s.

As per the Java docs: https://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html

It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.

What I think you are experiencing here is a hash-function distribution anomaly. The hash-function is used internally to give your strings an integer index. For 1-long strings there isn't much complexity. As you make your strings longer, your hash function has more to work with.

This stems back to the whole idea of a hash function: take a set of possible values, and map them as evenly as possible to a set of smaller values. It just so happens that the hash function has those strings mapped as it does. You would probably see the same thing with consecutive numbers. And you start to see them un-ordered once more data is introduced.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM