简体   繁体   中英

Using HashMap in multithreaded environment

I was going through an interview question on JavaRevisited and I'm having difficulty understanding this question :

What's wrong with using a HashMap in a multithreaded environment? When get() method go into an infinite loop?

In my opinion, it's not a problem to use HashMap inside a multi-threaded environment, as long as our application is not accessing/reading threads which are modifying the created HashMap , rather than simply accessing the HashMap.

So, as I see it, there's not a problem as long as in the application we are just accessing the HashMap in a multi-threaded environment.

Please let me know if my understanding is correct.

What's wrong using HashMap in multithreaded environment? When get() method go to infinite loop?

It is a bug to have multiple threads use a non-synchronized collection (really any mutable class) in an unprotected manner. Certain if each thread had their own HashMap instance then this is not an issue. It is a problem if multiple threads are adding to the same HashMap instance without it being synchronized . Even if just 1 thread is modifying a HashMap and other threads are reading from that same map without synchronization, you will run into problems.

If you need to use the same hash table object in multiple threads then you should consider using ConcurrentHashMap , wrapping each of the accesses to the HashMap in a synchronized {} block, or making use of the Collections.synchronizedMap(new HashMap<...>()) construct.

Chances are that the get() goes to an infinite loop because one of the threads has only a partially updated view of the HashMap in memory and there must be some sort of object reference loop. That's the peril of using an unsynchronized collection with multiple threads.

So in my understanding, it's not a problem as long as in the application we are just accessing the HashMap in a multi-threaded environment?

If by "accessing" you mean "reading", then this is true with qualifications . You must make sure:

  • All of the updates to the HashMap are completed before the threads are instantiated and the thread that creates the map also forks the threads
  • The threads are only using the HashMap in read-only mode – either get() or iteration without remove
  • There are no threads updating the map

If any of these conditions are not true then you will need to use a synchronized map instead.

This is a classical question. ArrayList and HashMap are not synchronized, while Vector and HashTable are. You should therefore use HashTable unless you are very careful defining mutexes yourself.

In other words, the methods in eg HashTable will ensure that no other thread is working with the HashTable at any given time. If you use a HashMap, you'd have to do that manually by ensuring that you synchronize on HashMap before you call the method.

Update: checkout @Gray's comment. It looks like wrapping HashMap with Collections.synchronizedMap(new HashMap()) is the way to go now.

EDIT: other posters have answered way better than I did. My answer, however, generated an interesting discussion on the use of the soon to be deprecated Vector, Stack, Hashtable and Dictionary classes, so I'm leaving the question here, as a head to the comments below. Thanks guys!

We know that HashMap is a non-synchronized collection whereas its synchronized counter-part is HashTable . So, when you are accessing the collection in a multithreaded environment and all threads are accessing a single instance of collection, then it's safer to use HashTable for various obvious reasons eg to avoid dirty reads and to maintain data consistency. In the worst case, this multithreaded environment can result in an infinite loop as well.

Yes, it is true. HashMap.get() can cause an infinite loop. Let us see how??

If you look at the source code HashMap.get(Object key) method, it looks like this:

 public Object get(Object key) {
    Object k = maskNull(key);
    int hash = hash(k);
    int i = indexFor(hash, table.length);
    Entry e = table[i];
    while (true) {
        if (e == null)
            return e;
        if (e.hash == hash &amp;&amp; eq(k, e.key))
            return e.value;
        e = e.next;
    }
}

while(true){...} can always be a victim of an infinite loop at runtime in a multithreaded environment, IF, somehow e.next can point to itself. This will result in an infinite loop. But, how e.next will point to itself?

This can happen in void transfer(Entry[] newTable) method, which is invoked at the time the HashMap resizing is done.

    do {
        Entry next = e.next;
        int i = indexFor(e.hash, newCapacity);
        e.next = newTable[i];
    newTable[i] = e;
        e
= next;
} while (e != null);

This piece of code is prone to produce the above condition if resizing happens and at the same time, other threads tried to modify the map instance.

The only way to avoid this scenario is to use synchronization in code, or better, use the synchronized collection.

I guess that they meant to access to a shared copy of HashMap . Shared mutable state .

Since it is not synchronized every thread will grab its copy from the main memory, modify, and overwrite it.

HashMap with one entry <n, 1>

thread 1 grab the copy

thread 2 grab the copy

thread 1 modify <n, 2>

thread 2 modify <n, 3>

thread 1 is done, and stores the copy in the main memory

now memory is <n, 2>

thread 2 is done and stores the copy

now memory is <n, 3>

The state thread 1 is lost

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM