
Using HashMap in a multithreaded environment

I was going through an interview question on JavaRevisited and I'm having difficulty understanding this question:

What's wrong with using a HashMap in a multithreaded environment? When does the get() method go into an infinite loop?

In my opinion, it's not a problem to use a HashMap inside a multithreaded environment, as long as our application's threads are only accessing (reading) the created HashMap rather than modifying it.

So, as I see it, there's no problem as long as the application is only accessing the HashMap in a multithreaded environment.

Please let me know if my understanding is correct.

What's wrong with using a HashMap in a multithreaded environment? When does the get() method go into an infinite loop?

It is a bug to have multiple threads use a non-synchronized collection (really any mutable class) in an unprotected manner. Certainly, if each thread had its own HashMap instance then this would not be an issue. It is a problem if multiple threads are adding to the same HashMap instance without it being synchronized. Even if just one thread is modifying a HashMap and other threads are reading from that same map without synchronization, you will run into problems.

If you need to use the same hash table object in multiple threads, then you should consider using ConcurrentHashMap, wrapping each of the accesses to the HashMap in a synchronized {} block, or making use of the Collections.synchronizedMap(new HashMap<...>()) construct.
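As a rough illustration of those three options, here is a sketch in which several threads write disjoint keys into one shared map, using each option in turn. The class and method names (SharedMapOptions, runWriters and so on) are made up for this example; ConcurrentHashMap, Collections.synchronizedMap and the synchronized block are the real JDK facilities named above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntConsumer;

public class SharedMapOptions {
    // Starts `threads` threads; thread t writes the keys [t * perThread, (t + 1) * perThread).
    static void runWriters(int threads, int perThread, IntConsumer writer) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int k = base; k < base + perThread; k++) writer.accept(k);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join() also makes the writers' updates visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
    }

    // Option 1: ConcurrentHashMap is thread-safe by design.
    static int withConcurrentHashMap(int threads, int perThread) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        runWriters(threads, perThread, k -> map.put(k, k));
        return map.size();
    }

    // Option 2: a plain HashMap where every access is guarded by the same lock.
    static int withSynchronizedBlock(int threads, int perThread) {
        Map<Integer, Integer> map = new HashMap<>();
        runWriters(threads, perThread, k -> {
            synchronized (map) {
                map.put(k, k);
            }
        });
        synchronized (map) {
            return map.size();
        }
    }

    // Option 3: the Collections.synchronizedMap wrapper locks on every method call.
    static int withSynchronizedMap(int threads, int perThread) {
        Map<Integer, Integer> map = Collections.synchronizedMap(new HashMap<>());
        runWriters(threads, perThread, k -> map.put(k, k));
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(withConcurrentHashMap(4, 10_000)); // 40000
        System.out.println(withSynchronizedBlock(4, 10_000)); // 40000
        System.out.println(withSynchronizedMap(4, 10_000));   // 40000
    }
}
```

With correct locking each variant ends with exactly threads × perThread entries; a bare HashMap used the same way can come out with the wrong size, or worse.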

Chances are that get() goes into an infinite loop because one of the threads has only a partially updated view of the HashMap in memory, and there must be some sort of object reference loop. That's the peril of using an unsynchronized collection with multiple threads.

So, in my understanding, it's not a problem as long as the application is just accessing the HashMap in a multithreaded environment?

If by "accessing" you mean "reading", then this is true, with qualifications. You must make sure that:

  • All of the updates to the HashMap are completed before the threads are instantiated, and the thread that creates the map also forks the threads
  • The threads are only using the HashMap in read-only mode: either get() or iteration without remove
  • There are no threads updating the map
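Those conditions amount to: build the map completely, then start the readers, then only read. A minimal sketch under those assumptions (the class and method names are hypothetical). It relies on the Java memory model rule that Thread.start() happens-before every action of the started thread; the Collections.unmodifiableMap wrapper is an optional extra so that an accidental write fails fast instead of corrupting the map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class ReadOnlySharing {
    // Builds the map fully in the calling thread, then starts readers that only call get().
    static long sumFromReaders(int size, int readers) {
        Map<Integer, Integer> built = new HashMap<>();
        for (int i = 0; i < size; i++) built.put(i, i); // all writes happen before any start()

        // Optional: any put/remove through this view throws UnsupportedOperationException.
        Map<Integer, Integer> readOnly = Collections.unmodifiableMap(built);

        AtomicLong total = new AtomicLong();
        Thread[] threads = new Thread[readers];
        for (int r = 0; r < readers; r++) {
            threads[r] = new Thread(() -> {
                long local = 0;
                for (int i = 0; i < size; i++) local += readOnly.get(i); // read-only access
                total.addAndGet(local);
            });
            threads[r].start(); // start() publishes the fully built map to the new thread
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for readers", e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(sumFromReaders(1000, 4)); // 4 * (0 + 1 + ... + 999) = 1998000
    }
}
```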

If any of these conditions does not hold, then you will need to use a synchronized map instead.

This is a classic question. ArrayList and HashMap are not synchronized, while Vector and Hashtable are. You should therefore use Hashtable unless you are very careful about defining mutexes yourself.

In other words, the methods in, e.g., Hashtable ensure that no other thread is working with the Hashtable at any given time. If you use a HashMap, you have to do that manually by synchronizing on the HashMap before you call the method.

Update: check out @Gray's comment. It looks like wrapping the HashMap with Collections.synchronizedMap(new HashMap()) is the way to go now.
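One detail worth knowing about the Collections.synchronizedMap wrapper: its Javadoc requires callers to synchronize on the returned map themselves while iterating over any of its collection views, because each individual method call is locked but a whole traversal is not. A small sketch of that pattern (the class and method names are illustrative):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedMapIteration {
    // Sums the values of a map obtained from Collections.synchronizedMap.
    static int sumValues(Map<String, Integer> syncMap) {
        int sum = 0;
        // get()/put() lock the wrapper automatically, but an iteration spans many
        // calls, so hold the wrapper's own lock for the whole loop.
        synchronized (syncMap) {
            for (int v : syncMap.values()) sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = Collections.synchronizedMap(new HashMap<>());
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        System.out.println(sumValues(map)); // 6
    }
}
```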

EDIT: other posters have answered way better than I did. My answer, however, generated an interesting discussion on the use of the legacy Vector, Stack, Hashtable and Dictionary classes, so I'm leaving it here as a lead-in to the comments below. Thanks, guys!

We know that HashMap is a non-synchronized collection, whereas its synchronized counterpart is Hashtable. So when you are accessing the collection in a multithreaded environment and all threads are accessing a single instance of the collection, it's safer to use Hashtable for various obvious reasons, e.g. to avoid dirty reads and to maintain data consistency. In the worst case, sharing an unsynchronized HashMap in a multithreaded environment can even result in an infinite loop.

Yes, it is true: HashMap.get() can cause an infinite loop. Let us see how.

If you look at the source code of the HashMap.get(Object key) method in an older JDK, it looks like this:

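    // get() from an older (pre-Java 8) JDK HashMap; maskNull, hash, indexFor and eq
    // are private helpers of that class, so this excerpt does not compile on its own
    public Object get(Object key) {
        Object k = maskNull(key);             // null keys are stored under a sentinel
        int hash = hash(k);
        int i = indexFor(hash, table.length); // bucket index
        Entry e = table[i];
        while (true) {                        // walk the bucket's linked list
            if (e == null)
                return e;                     // end of chain: key not present
            if (e.hash == hash && eq(k, e.key))
                return e.value;
            e = e.next;                       // never ends if the chain contains a cycle
        }
    }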

The while (true) {...} loop can become an infinite loop at runtime in a multithreaded environment if, somehow, e.next points to e itself (or, more generally, if the entries of a bucket form a cycle). But how can e.next end up pointing to itself?
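To see why a cycle is fatal, here is a simplified, hypothetical version of that bucket walk with a step limit added so the demo terminates. The Entry class is a stand-in, not the JDK's, and it compares keys with equals() only (the real code also compares hashes). What it shows: a lookup for a key that sits in the cycle still returns, but a lookup for a missing key never reaches null.

```java
public class SelfLoopLookup {
    // Hypothetical stand-in for the old HashMap.Entry.
    static final class Entry {
        final Object key;
        final Object value;
        Entry next;

        Entry(Object key, Object value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Same shape as the old get() loop, plus a step limit.
    // Returns the value, null if the chain ends, or throws if the limit is reached.
    static Object lookup(Entry head, Object key, int maxSteps) {
        Entry e = head;
        for (int steps = 0; steps < maxSteps; steps++) {
            if (e == null) return null;
            if (e.key.equals(key)) return e.value;
            e = e.next;
        }
        throw new IllegalStateException("no result after " + maxSteps + " steps: the chain has a cycle");
    }

    public static void main(String[] args) {
        Entry a = new Entry("A", 1, null);
        a.next = a; // corrupted bucket: e.next points to e itself
        System.out.println(lookup(a, "A", 1_000)); // 1, found before the cycle matters
        try {
            lookup(a, "missing", 1_000); // the original while (true) would spin forever here
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```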

This can happen in the void transfer(Entry[] newTable) method, which is invoked when the HashMap is resized.

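    do {
        Entry next = e.next;                   // remember the rest of the old chain
        int i = indexFor(e.hash, newCapacity); // bucket in the new table
        e.next = newTable[i];                  // insert e at the head of the new bucket,
        newTable[i] = e;                       // so entries sharing a bucket end up reversed
        e = next;
    } while (e != null);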

This piece of code is prone to producing the above condition if a resize happens while other threads are modifying the same map instance (for example, when two threads trigger a resize at the same time).
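The interleaving usually cited for this bug can be replayed deterministically in a single thread. The sketch below assumes a bucket holding A -> B whose entries both land in the same bucket of the new table, and uses a hypothetical Entry class; the loop body is the transfer loop quoted above. Thread 1 reads next = B and is suspended, thread 2 finishes its own transfer (reversing the chain to B -> A), and thread 1 then resumes with stale pointers.

```java
public class TransferRaceDemo {
    // Hypothetical stand-in for the old HashMap.Entry; only the chain link matters here.
    static final class Entry {
        final String key;
        Entry next;

        Entry(String key, Entry next) {
            this.key = key;
            this.next = next;
        }
    }

    // The old transfer loop for one bucket, assuming every entry lands in slot 0
    // of the new table. Head insertion reverses the chain; returns the new head.
    static Entry transferBucket(Entry e) {
        Entry[] newTable = new Entry[1];
        while (e != null) {
            Entry next = e.next;
            e.next = newTable[0];
            newTable[0] = e;
            e = next;
        }
        return newTable[0];
    }

    // Replays one bad interleaving of two resizing threads; returns true if a cycle formed.
    static boolean simulateInterleaving() {
        Entry b = new Entry("B", null);
        Entry a = new Entry("A", b); // old bucket: A -> B

        // Thread 1 executes "Entry next = e.next;" and is then descheduled.
        Entry e = a;
        Entry next = e.next; // next == B

        // Thread 2 runs its whole transfer: its new bucket is B -> A, so B.next == A.
        transferBucket(a);

        // Thread 1 resumes with its stale e and next, filling its own new table.
        Entry[] table1 = new Entry[1];
        while (true) {
            e.next = table1[0];
            table1[0] = e;
            e = next;
            if (e == null) break;
            next = e.next; // reads links already rewritten by thread 2
        }
        return a.next == b && b.next == a; // A -> B -> A: a cycle
    }

    public static void main(String[] args) {
        System.out.println(simulateInterleaving()); // true
    }
}
```

Once A and B point at each other, any get() on that bucket for a key that is not A or B loops forever, which is exactly the while (true) problem described earlier.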

The only way to avoid this scenario is to use synchronization in your code or, better, to use a synchronized collection.

I guess that they meant accessing a shared instance of HashMap, i.e. shared mutable state.

Since it is not synchronized, every thread may grab its own copy from main memory, modify it, and write it back. For example:

Start: the HashMap has one entry <n, 1>

  • Thread 1 grabs a copy
  • Thread 2 grabs a copy
  • Thread 1 modifies its copy to <n, 2>
  • Thread 2 modifies its copy to <n, 3>
  • Thread 1 is done and stores its copy in main memory; memory is now <n, 2>
  • Thread 2 is done and stores its copy; memory is now <n, 3>

The update made by thread 1 is lost.
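A closely related lost update is easy to produce with get() followed by put() on the same key, since two threads can read the same old value. A sketch of the atomic alternative, assuming the goal is to count increments under a key "n" (the class and method names are illustrative): ConcurrentHashMap.merge performs the whole read-modify-write atomically, so no increment is lost.

```java
import java.util.concurrent.ConcurrentHashMap;

public class LostUpdateFix {
    // Racy pattern (do NOT use on a shared map):
    //   int old = map.get("n"); map.put("n", old + 1);
    // Two threads can read the same old value, so one increment disappears.

    // Atomic alternative: merge() performs the read-modify-write as one step.
    static int countWithMerge(int threads, int incrementsPerThread) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("n", 0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    map.merge("n", 1, Integer::sum); // atomic per ConcurrentHashMap's contract
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return map.get("n");
    }

    public static void main(String[] args) {
        System.out.println(countWithMerge(4, 10_000)); // 40000
    }
}
```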

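Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.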

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM