简体   繁体   中英

String IdentityHashMap vs HashMap performance

Identity HashMap is special implementation in Java which compares the objects reference instead of equals() and also uses identityHashCode() instead of hashCode() . In addition, it uses linear-probe hash table instead of Entry list .

Map<String, String> map = new HashMap<>(); 
Map<String, String> iMap = new IdentityHashMap<>();

Does that mean for the String keys IdentifyHashMap will be usually faster if tune correctly ?

See this example:

public class Dictionary {

    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("/usr/share/dict/words"));

        String line;
        ArrayList<String> list = new ArrayList<String>();

        while ((line = br.readLine()) != null) {
            list.add(line);
        }
        System.out.println("list.size() = " + list.size());
        Map<String, Integer> iMap = new IdentityHashMap<>(list.size());
        Map<String, Integer> hashMap = new HashMap<>(list.size());

        long iMapTime = 0, hashMapTime = 0;

        long time;
        for (int i = 0; i < list.size(); i++) {
            time = System.currentTimeMillis();
            iMap.put(list.get(i), i);
            time = System.currentTimeMillis() - time;
            iMapTime += time;
            time = System.currentTimeMillis();
            hashMap.put(list.get(i), i);
            time = System.currentTimeMillis() - time;
            hashMapTime += time;
        }

        System.out.println("iMapTime = " + iMapTime + " hashMapTime = " + hashMapTime);
    }

}

Tried very basic performance check. I am reading dictionary words (235K) & pushing into the both maps. It prints:

list.size() = 235886
iMapTime = 101 hashMapTime = 617 

I think this is very good improvment to ignore, unless I am doing something wrong here.

You will see significantly faster performance on IdentityHashMap, however that comes at a substantial cost.

You must be absolutely sure that you will never ever have objects added to the map that have the same value but different identities.

That's hard to guarantee both now and for the future, and a lot of people make mistaken assumptions.

For example

String t1 = "test";
String t2 = "test";

t1==t2 will return true.

String t1 = "test";
String t2 = new String("test");

t1==t2 will return false.

Overall my recommendation is that unless you absolutely critically need the performance boost and know exactly what you are doing and heavily lock down and comment access to the class then by using IdentityHashMap you are opening yourself up to massive risks of very hard to track down bugs in the future.

How does IdentityHashMap<String,?> work?

To make IdentityHashMap<String,?> work for arbitrary strings, you'll have to String.intern() both the keys you put() and potential keys you pass to get() . (Or use an equivalent mechanism.)

Note: unlike stated in @m3th0dman's answer, you don't need to intern() the values.

Either way, interning a string ultimately requires looking it up in some kind of hash table of already interned strings. So unless you had to intern your strings for some other reason anyway (and thus already paid the cost), you won't get much of an actual performance boost out of this.

So why does the test show that you can?

Where your test is unrealistic is that you keep the exact list of keys you used with put() and you iterate across them one by one in list order. Note (the same could be achieved by inserting the elements into a LinkedHashMap and simply calling iterator() on its entry set.

What's the point of IdentityHashMap then?

There are scenarios where it is guaranteed (or practically guaranteed) that object identity is the same as equals() . Imagine trying to implement your own ThreadLocal class for example, you'll probably write something like this:

public final class ThreadLocal<T> {
   private final IdentityHashMap<Thread,T> valueMap;
   ...
   public T get() {
       return valueMap.get( Thread.currentThread() );
   }
}

Because you know that threads have no notion of equality beyond identity. Same goes if your map keys are enum values and so on.

Technically you can do something like this to make sure you have the same instance of the string representation:

public class StringIdentityHashMap extends IdentityHashMap<String, String>
{
    @Override
    public String put(String key, String value)
    {
        return super.put(key.intern(), value.intern());
    }

    @Override
    public void putAll(Map<? extends String, ? extends String> m)
    {
        m.entrySet().forEach(entry -> put(entry.getKey().intern(), entry.getValue().intern()));
    }

    @Override 
    public String get(Object key)
    {
        if (!(key instanceof String)) {
            throw new IllegalArgumentException();
        }
        return super.get(((String) key).intern());
    }

    //implement the rest of the methods in the same way
}

But this won't help you very much since intern() calls equals() to make sure the given String exists or not in the String pool so you end up with the performance of the typical HashMap .

This, however will only help you to improve memory and not CPU. There is no way to achieve better CPU usage and to be sure your program is correct (without possible using some internal knowledge of JVM which might change) because Strings can be in String pool or not and you cannot know if they are in without (not implicitly) calling equals() .

Interestingly, IdentityHashMap can be SLOWER. I am using Class objects as keys, and seeing a ~50% performance INCREASE with HashMap over IdentityHashMap.

IdentityHashMap and HashMap are different internally, so if the equals() method of your keys is really fast, HashMap seems better.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM