HashMap concurrency issue

I have a HashMap that, for speed reasons, I would like to use without locking. Will updating it and accessing it at the same time cause any issues, assuming I don't mind stale data?

My accesses are gets, not iteration, and deletes are part of the updates.

Yes, it will cause major problems. One example is what can happen when adding a value to the hash map: the insertion can cause a rehash of the table, and if that occurs while another thread is walking a collision list (a hash table "bucket"), that thread could erroneously fail to find a key that exists in the map. HashMap is explicitly unsafe for concurrent use.

Use ConcurrentHashMap instead.
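As a rough illustration of the access pattern in the question (one thread putting and removing while another only calls get), here is a minimal sketch. The class name `LockFreeReadsDemo` and the key naming are made up for the example; only `ConcurrentHashMap` itself comes from the answer above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo class: one writer and one reader share a ConcurrentHashMap.
final class LockFreeReadsDemo {

    // Runs the writer and the reader to completion and returns the final map size.
    static int run(int keys) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();

        Thread writer = new Thread(() -> {
            for (int i = 0; i < keys; i++) {
                map.put("key" + i, i);        // may trigger internal resizing
                if (i % 2 == 1) {
                    map.remove("key" + i);    // deletes are part of the updates
                }
            }
        });

        Thread reader = new Thread(() -> {
            for (int i = 0; i < keys; i++) {
                // Retrieval does not lock; the result is null or a value
                // that some put() stored, possibly stale but never corrupt.
                map.get("key" + i);
            }
        });

        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(run(10_000));
    }
}
```

After both threads finish, only the even-numbered keys remain, because the writer removes every odd key right after inserting it.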

The importance of synchronising or using ConcurrentHashMap cannot be overstated.

Until a couple of years ago I was under the misguided impression that I could get away with synchronising only the put and remove operations on a HashMap. This is of course very dangerous and actually results in an infinite loop in HashMap.get() on some JDKs (early 1.5, I think).

What I did a couple of years ago (and what really shouldn't be done):

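import java.util.HashMap;
import java.util.Map;

// Anti-pattern: only the writer is synchronized. Do not copy this.
public class MyCache {
    private Map<String,Object> map = new HashMap<String,Object>();

    public synchronized void put(String key, Object value){
        map.put(key,value);
    }

    public Object get(String key){
        // unsynchronized read: can cause an infinite loop in some JDKs!!
        return map.get(key);
    }
}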

EDIT: thought I'd add an example of what not to do (see above).
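For contrast, a minimal sketch of the same cache backed by a ConcurrentHashMap, so that neither put nor get needs the `synchronized` keyword. The class name `SafeCache` is hypothetical. Unlike HashMap, ConcurrentHashMap rejects null keys and null values.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical replacement for the cache above: the map handles thread safety itself.
final class SafeCache {
    private final ConcurrentMap<String, Object> map = new ConcurrentHashMap<>();

    void put(String key, Object value) {
        map.put(key, value);   // throws NullPointerException on a null key or value
    }

    Object get(String key) {
        return map.get(key);   // safe to call while other threads put or remove
    }

    void remove(String key) {
        map.remove(key);
    }
}
```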

When in doubt, check the class's Javadocs:

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

Map m = Collections.synchronizedMap(new HashMap(...));

(emphasis not mine)

So, given that you said your threads will be deleting mappings from the Map, the answer is that yes, it will definitely cause issues, and yes, it is definitely unsafe.
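A typed version of the Javadoc's one-liner might look like the sketch below. The class name `SynchronizedMapDemo`, the method name, and the "page" key are assumptions made for illustration. The map is wrapped at creation time, so no unsynchronized reference to the underlying HashMap ever escapes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: several threads bump one counter through a synchronized wrapper.
final class SynchronizedMapDemo {

    static int countConcurrently(int threads, int incrementsPerThread) {
        // Wrapped at creation time: the raw HashMap is never reachable.
        Map<String, Integer> hits = Collections.synchronizedMap(new HashMap<>());

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // On Java 8+ the wrapper runs merge() under its own lock,
                    // so the read-modify-write is a single atomic step.
                    hits.merge("page", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits.getOrDefault("page", 0);
    }
}
```

Every call, including get, takes the wrapper's lock, so this trades some speed for safety. That is the cost the question was hoping to avoid.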

Yes. Very Bad Things will happen. For example, your thread might get stuck in an infinite loop.

Use either ConcurrentHashMap or NonBlockingHashMap.

HashMap will not satisfy the conditions you describe. Since the process of updating a map is not atomic, you may encounter the map in an invalid state. Multiple writes might leave it in a corrupted state. ConcurrentHashMap (Java 1.5 or later) does what you want.

If by "at the same time" you mean from multiple threads, then yes, you need to lock access to it (or use ConcurrentHashMap or a similar class that does the locking for you).
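If you go the explicit-locking route, one possible shape is a read-write lock around a plain HashMap, so that concurrent gets do not block each other while puts and removes are exclusive. This is only a sketch; the class name `LockedCache` is invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper: many readers may hold the read lock at once,
// while a writer holds the write lock exclusively.
final class LockedCache {
    private final Map<String, Object> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    Object get(String key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    void put(String key, Object value) {
        lock.writeLock().lock();
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    void remove(String key) {
        lock.writeLock().lock();
        try {
            map.remove(key);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```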

No, there will be no issues if you do the following:

  1. Place your data into the HashMap on the first load, from a single thread, before any multithreading occurs. This is because adding a new key alters the modcount (put returns null), whereas replacing the value of an existing key does not (put returns the old value). Modcount is what makes iterators fail-fast. If you're only using get, though, nothing is iterated over, so it's fine.

  2. Keep the same keys throughout your application. Once the application starts and loads its data, no other keys may be added to this map. This way a get will return either stale data or freshly inserted data, and there will be no issues.
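The two steps above can be sketched as follows, with one addition that is not part of the answer: each value sits in an AtomicReference, so that a replacement made by one thread is guaranteed to become visible to the others. The class name `FixedKeyCache` and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical fixed-key cache: the key set is frozen at construction,
// and later updates only swap the value held in each slot.
final class FixedKeyCache {
    private final Map<String, AtomicReference<String>> slots;

    // Step 1: load everything from a single thread before sharing the cache.
    FixedKeyCache(Map<String, String> initial) {
        Map<String, AtomicReference<String>> m = new HashMap<>();
        for (Map.Entry<String, String> e : initial.entrySet()) {
            m.put(e.getKey(), new AtomicReference<>(e.getValue()));
        }
        this.slots = Collections.unmodifiableMap(m);
    }

    String get(String key) {
        AtomicReference<String> slot = slots.get(key);
        return slot == null ? null : slot.get();
    }

    // Step 2: no new keys. Returns false instead of adding an unknown key.
    boolean update(String key, String value) {
        AtomicReference<String> slot = slots.get(key);
        if (slot == null) {
            return false;
        }
        slot.set(value);   // never a structural modification of the map
        return true;
    }
}
```

Because the underlying HashMap is never modified after construction, concurrent gets only ever read a fixed table. All mutation happens inside the AtomicReference slots.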

As others mentioned, use a ConcurrentHashMap or synchronize the map when updating it.

Here and elsewhere I read that no, you must not access it from multiple threads, but no one says what really happens.

So, here is what I saw today (which is why I'm on this old question) in an application that has been running in production since March: two concurrent puts on the same HashSet (and therefore on its underlying HashMap) caused CPU overload (near 100%) and a memory increase of 3 GB, which the GC later brought back down. We had to restart the app.
